Enterprise-Scale Anime Recommendation Engine
Processing 649M similarity pairs across 37k anime with 5GB on-disk matrix and sub-100ms API responses
🦀 Pure Rust Engine Breakthrough
Achieved the seemingly impossible: processing 649,141,288 similarity pairs with a streaming JSON architecture. The V2 engine eliminates Python-Rust data transfer bottlenecks entirely, delivering a 36x performance improvement while keeping memory usage constant through streaming output.
V2 System Architecture
This enterprise-grade recommendation system represents a breakthrough in performance engineering. The V2 architecture combines a Pure Rust similarity engine with a FastAPI backend and React 19 frontend, processing 649 million similarity calculations across 37,030 anime entries.
┌──────────────────┐    ┌───────────────────┐    ┌───────────────────┐
│  Data Pipeline   │    │   Rust Engine     │    │  Real-time API    │
│    (Python)      │───▶│   (Pure Rust)     │───▶│    (FastAPI)      │
│                  │    │                   │    │                   │
│ • 37k anime      │    │ • 649M pairs      │    │ • <100ms hot      │
│ • Unified data   │    │ • Streaming JSON  │    │ • On-demand       │
│ • is_clean flag  │    │ • 5GB matrix      │    │ • 500x efficiency │
└──────────────────┘    └───────────────────┘    └───────────────────┘
The core innovation is the "On-Demand Reading + OS File Caching" principle: heavy computation happens offline in Rust, while the real-time API reads tiny slices of the pre-computed 5GB matrix as needed. The first request loads data into OS cache, making subsequent requests lightning fast.
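A minimal Python sketch of this slice-read pattern, using `numpy.memmap` over a small stand-in matrix (the file name, shape, and `top_k_similar` helper are illustrative, not the project's actual API):

```python
import os
import tempfile

import numpy as np

ROWS, COLS = 1000, 1000  # illustrative; the real matrix covers 37k anime
PATH = os.path.join(tempfile.gettempdir(), "sim_matrix_demo.bin")

# Offline step: persist a dense float32 similarity matrix to disk.
np.random.rand(ROWS, COLS).astype(np.float32).tofile(PATH)

# Online step: map the file without loading it. Only the touched pages
# enter the OS page cache, so the first read "warms" the cache and
# later reads of the same rows are served straight from RAM.
mm = np.memmap(PATH, dtype=np.float32, mode="r", shape=(ROWS, COLS))

def top_k_similar(anime_idx: int, k: int = 10) -> np.ndarray:
    """Read one row slice (~4 KB here) and return the k most similar indices."""
    row = np.array(mm[anime_idx])  # copies a single row, never the full matrix
    row[anime_idx] = -1.0          # exclude the anime itself
    return np.argsort(row)[::-1][:k]
```

The same seek-and-slice access works against a multi-gigabyte file: memory cost is bounded by the rows actually touched, not by the matrix size.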
V2 Key Innovations
- Pure Rust Engine v1.2.0: Streaming JSON writer handles 649M pairs without memory explosion
- Multi-Feature Algorithm: Weighted similarity combining tags (60%), score (20%), year (15%), type (5%)
- "Filter Candidates First": 20x faster queries by scoring 200-1000 candidates instead of all 37k
- Hybrid Request Classification: Clean/full tiers for optimal performance/quality trade-offs
- Enterprise Security: CSRF protection, structured logging, distributed rate limiting
- Docker Containerization: Microservices-ready deployment with Redis integration
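The clean/full tier split can be sketched as a small request classifier. The names (`RequestTier`, `classify_request`) and routing thresholds below are illustrative assumptions, not the production logic:

```python
from dataclasses import dataclass, field
from enum import Enum

class RequestTier(Enum):
    CLEAN = "clean"  # pre-filtered is_clean subset: fastest path
    FULL = "full"    # full 37k catalogue: highest recall

@dataclass
class RecommendationRequest:
    anime_ids: list[int]
    filters: dict = field(default_factory=dict)
    include_adult: bool = False

def classify_request(req: RecommendationRequest) -> RequestTier:
    """Route simple requests to the cheap clean tier; anything that
    needs the whole catalogue falls through to the full tier."""
    if req.include_adult or len(req.filters) > 2:
        return RequestTier.FULL
    return RequestTier.CLEAN
```

The point of the split is that the common case pays only for the small pre-filtered subset, while complex queries still get full-catalogue quality.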
Performance Journey: From Failure to Breakthrough
Evolution Timeline
| Implementation | Runtime | Outcome |
|---|---|---|
| Python Original | 30+ minutes | ❌ OOM Failures |
| Python-Rust Hybrid | 18x slowdown | ❌ Data Transfer Overhead |
| Pure Rust v1.0 | <2 minutes | ✅ Success (1k anime) |
| Pure Rust v1.2 | 16 minutes | ✅ BREAKTHROUGH (649M pairs) |
Technology Stack
🦀 Rust Engine
- Pure Rust 1.87.0
- Rayon Parallel Processing
- Streaming JSON Writer
- Minimal Dependencies
⚡ Backend
- FastAPI + Python
- Dependency Injection
- Structured Logging
- Rate Limiting
🎨 Frontend
- React 19
- TypeScript
- Styled Components
- Modern UI/UX
🐳 Infrastructure
- Docker Containerization
- Redis Integration
- HDF5 Data Format
- Enterprise Security
Code Highlights
V2 Multi-Feature Similarity (Rust)
/// Combines tag, score, year, and type similarities
fn compute_v2_multifeature_similarity(
anime_a: &AnimeData,
anime_b: &AnimeData,
_index: &InvertedIndex
) -> f32 {
// Feature weights based on V2 specification
const TAG_WEIGHT: f32 = 0.60;
const SCORE_WEIGHT: f32 = 0.20;
const YEAR_WEIGHT: f32 = 0.15;
const TYPE_WEIGHT: f32 = 0.05;
// Weighted combination of multiple features
let final_similarity =
(TAG_WEIGHT * compute_weighted_tag_similarity(anime_a, anime_b)) +
(SCORE_WEIGHT * (1.0 - (anime_a.normalized_score - anime_b.normalized_score).abs())) +
(YEAR_WEIGHT * (1.0 - (anime_a.normalized_year - anime_b.normalized_year).abs())) +
(TYPE_WEIGHT * anime_a.type_category.similarity(&anime_b.type_category));
final_similarity.clamp(0.0, 1.0) // Ensure bounds [0, 1]
}
"Filter Candidates First" Algorithm (Python)
# Phase 1: Aggregate candidate IDs (up to 1,000 of 37k)
candidate_ids = self._aggregate_candidate_ids(
engine_inputs, algorithm, 1000
)
# Phase 2: Pre-filter by criteria (50-200 qualified)
qualified_candidates = self._pre_filter_candidates(
candidate_ids, filters, exclude_ids,
engine_inputs, request_type
)
# Phase 3: Compute similarities (targeted vs full matrix)
candidate_scores = self._compute_targeted_similarities(
qualified_candidates, engine_inputs, algorithm
)
# Result: 20x performance improvement for complex queries
Engineering Challenges Solved
Memory Explosion Crisis: The original approach failed when attempting to serialize 649M similarity pairs in one shot. The breakthrough was the streaming JSON architecture: output is written piece by piece through a constant 128KB buffer, so memory usage stays flat no matter how many pairs are emitted.
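The production writer is Rust, but the streaming idea fits in a few lines of Python (the record layout and helper names here are illustrative):

```python
import io
import json

def stream_pairs_to_json(pairs, fh):
    """Serialize an iterable of (id_a, id_b, score) tuples as one JSON
    array, emitting one element at a time so memory stays O(1)."""
    fh.write("[")
    first = True
    for id_a, id_b, score in pairs:
        if not first:
            fh.write(",")
        fh.write(json.dumps({"a": id_a, "b": id_b, "sim": round(score, 4)}))
        first = False
    fh.write("]")

def fake_pairs(n):
    """Lazy pair stream: nothing is buffered up front."""
    for i in range(n):
        yield (i, i + 1, 0.5)

out = io.StringIO()
stream_pairs_to_json(fake_pairs(3), out)
```

Because the source is a generator and the sink is written incrementally, peak memory is independent of the pair count, which is exactly why the same pattern scales from 3 pairs to 649 million.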
Data Transfer Bottleneck: The Python-Rust hybrid suffered an 18x performance penalty from serialization overhead. Pure Rust eliminated this entirely by handling data loading, computation, and output in a single runtime.
Real-time Performance: With a 5GB matrix, loading everything into memory was not an option. On-demand reading combined with OS file caching achieves sub-100ms responses after the first "warming" request.
Production Deployment
The system is fully containerized with Docker Compose, featuring:
- Backend: FastAPI with dependency injection and structured logging
- Frontend: React 19 with production-optimized Nginx serving
- Database: Redis for rate limiting and caching
- Security: CSRF protection, security headers, comprehensive error handling
- Monitoring: Request tracking, performance metrics, health checks