Building a Production-Scale Recommendation Engine with Rust
Building a recommendation engine that can handle millions of similarity comparisons while maintaining sub-100ms response times is no small feat. In this article, I'll share my journey of creating a production-scale anime recommendation engine using Rust, and the architectural decisions that made it possible.
The Challenge
When I set out to build this system, I faced several key requirements:
- Process 649 million similarity pairs across 37,000 anime titles
- Maintain response times under 100 milliseconds
- Handle concurrent requests efficiently
- Keep memory usage reasonable despite the large dataset
Why Rust?
Rust was the natural choice for this project for several reasons:
- Zero-cost abstractions: High-level code without runtime overhead
- Memory safety: No null pointer exceptions or data races
- Performance: Comparable to C/C++ with better safety guarantees
- Concurrency: Fearless concurrency with compile-time guarantees
Architecture Overview
The system follows a microservices architecture with three main components:
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│    React    │────▶│   FastAPI    │────▶│    Rust     │
│  Frontend   │     │   Backend    │     │   Engine    │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Redis     │
                    │    Cache     │
                    └──────────────┘
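To make the picture a bit more concrete, here is a minimal sketch of how the Rust engine might expose recommendations to the FastAPI backend over HTTP. The axum framework, the /recommend/:anime_id route, and the stubbed handler body are assumptions for illustration; the real service may use a different framework, transport, and response shape.

```rust
// Sketch only: assumes axum 0.7-style routing; the real engine's interface may differ.
use axum::{extract::Path, routing::get, Json, Router};
use serde::Serialize;

#[derive(Serialize)]
struct Recommendation {
    anime_id: u32,
    score: f32,
}

// Hypothetical handler: return the most similar titles for one anime.
async fn recommend(Path(anime_id): Path<u32>) -> Json<Vec<Recommendation>> {
    // In the real engine this would consult the streamed / memory-mapped
    // similarity data rather than returning a stub.
    Json(vec![Recommendation { anime_id, score: 1.0 }])
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/recommend/:anime_id", get(recommend));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```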
Streaming JSON Architecture
One of the key innovations was implementing a streaming JSON parser. Instead of loading all 649 million similarity pairs into memory, the Rust engine streams data directly from disk, processing only what's needed for each request.
Key Insight: By using streaming JSON parsing, we reduced memory usage from over 12GB to less than 500MB while maintaining sub-100ms response times.
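To show the streaming idea in miniature, here is a rough sketch using serde_json's StreamDeserializer. The file layout (a plain sequence of JSON objects), the field names a, b, and score, and the filtering logic are assumptions for illustration, not the engine's exact on-disk format.

```rust
// Sketch: stream similarity pairs from disk instead of loading them all.
// Assumes a file containing a sequence of JSON objects like
// {"a": 1, "b": 2, "score": 0.93} -- the real on-disk format may differ.
use serde::Deserialize;
use serde_json::Deserializer;
use std::fs::File;
use std::io::BufReader;

#[derive(Deserialize)]
struct SimilarityPair {
    a: u32,
    b: u32,
    score: f32,
}

fn top_matches_for(path: &str, anime_id: u32, k: usize) -> std::io::Result<Vec<SimilarityPair>> {
    let reader = BufReader::new(File::open(path)?);
    // The stream yields one pair at a time, so memory stays flat
    // no matter how many pairs the file contains.
    let stream = Deserializer::from_reader(reader).into_iter::<SimilarityPair>();

    let mut matches: Vec<SimilarityPair> = stream
        .filter_map(Result::ok)
        .filter(|p| p.a == anime_id || p.b == anime_id)
        .collect();

    matches.sort_by(|x, y| y.score.partial_cmp(&x.score).unwrap());
    matches.truncate(k);
    Ok(matches)
}
```

Because only one record is deserialized at a time, peak memory is bounded by the small result set rather than by the size of the file.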
Optimization Techniques
Several optimization techniques contributed to the final performance:
1. Memory-Mapped Files
Memory-mapping the similarity files lets the operating system's page cache decide what stays in RAM, reducing explicit I/O overhead.
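In case it helps, here is roughly what the memory-mapped path looks like. The memmap2 crate, the file name, and the fixed 12-byte binary record layout are assumptions for illustration; the engine's actual data layout differs.

```rust
// Sketch: memory-map a binary file of fixed-size similarity records and let
// the OS page cache handle caching. The 12-byte record layout
// (two u32 ids + one f32 score) is an assumption for illustration.
use memmap2::Mmap;
use std::fs::File;

const RECORD_SIZE: usize = 12; // 4 + 4 + 4 bytes

fn record_at(mmap: &Mmap, index: usize) -> (u32, u32, f32) {
    let offset = index * RECORD_SIZE;
    let rec = &mmap[offset..offset + RECORD_SIZE];
    let a = u32::from_le_bytes(rec[0..4].try_into().unwrap());
    let b = u32::from_le_bytes(rec[4..8].try_into().unwrap());
    let score = f32::from_le_bytes(rec[8..12].try_into().unwrap());
    (a, b, score)
}

fn main() -> std::io::Result<()> {
    let file = File::open("similarities.bin")?;
    // Safety: the file must not be truncated or modified while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    let total = mmap.len() / RECORD_SIZE;
    println!("mapped {} records without reading them into memory", total);
    let (a, b, score) = record_at(&mmap, 0);
    println!("first record: {} ~ {} = {}", a, b, score);
    Ok(())
}
```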
2. Redis Caching
Popular recommendations are cached in Redis, providing instant responses for frequently requested anime.
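A sketch of the cache-aside pattern this relies on, assuming the redis crate; the rec:{id} key scheme, the one-hour TTL, and the JSON string payload are illustrative values, not the service's exact configuration.

```rust
// Sketch: cache serialized recommendation lists in Redis with a TTL.
// The key scheme ("rec:{id}"), the 1-hour TTL, and the JSON payload format
// are assumptions for illustration.
use redis::Commands;

fn cached_recommendations(
    conn: &mut redis::Connection,
    anime_id: u32,
    compute: impl Fn(u32) -> String, // returns recommendations as a JSON string
) -> redis::RedisResult<String> {
    let key = format!("rec:{}", anime_id);

    // Fast path: serve straight from the cache if the entry exists.
    if let Some(hit) = conn.get::<_, Option<String>>(&key)? {
        return Ok(hit);
    }

    // Slow path: compute, then store with a TTL so popular titles stay warm.
    let fresh = compute(anime_id);
    conn.set_ex::<_, _, ()>(&key, &fresh, 3600)?;
    Ok(fresh)
}
```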
3. Parallel Processing
Rust's rayon library enables parallel processing of similarity calculations without the complexity of manual thread management.
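As a sketch of what rayon buys here, this is how one query vector might be scored against every candidate in parallel. The dense-vector representation and cosine similarity are assumptions for illustration, not the engine's exact scoring code.

```rust
// Sketch: parallelize similarity scoring with rayon's par_iter.
// The dense-vector representation and cosine similarity are assumptions;
// the real engine may score pairs differently.
use rayon::prelude::*;

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Score one query vector against every candidate in parallel.
fn score_against_all(query: &[f32], candidates: &[Vec<f32>]) -> Vec<f32> {
    candidates
        .par_iter() // rayon splits the work across its thread pool automatically
        .map(|candidate| cosine_similarity(query, candidate))
        .collect()
}
```

Switching from iter() to par_iter() is the whole change; rayon handles work splitting and thread management, which is what makes this approach attractive compared to hand-rolled threads.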
Lessons Learned
Building this system taught me several valuable lessons:
- Profile first, optimize later: Many assumptions about bottlenecks were wrong
- Architecture matters: The streaming approach was more impactful than micro-optimizations
- Rust's learning curve pays off: Initial development was slower, but maintenance is easier
- Real-world testing is essential: Synthetic benchmarks don't capture all edge cases
Conclusion
Building a production-scale recommendation engine with Rust demonstrated that performance and developer experience don't have to be mutually exclusive. By choosing the right architecture and leveraging Rust's strengths, we achieved an 18x performance improvement while maintaining code quality and safety.
If you're interested in the technical details or want to see the code, check out the full project page or visit the GitHub repository.