Enterprise-Scale Anime Recommendation Engine

Processing 649M similarity pairs across 37k anime with 5GB on-disk matrix and sub-100ms API responses

37,030 Anime Coverage
649M+ Similarity Pairs
<100ms Hot API Response
500x Memory Efficiency
5GB Matrix Size (HDF5)
36x Performance Improvement

🦀 Pure Rust Engine Breakthrough

Achieved the seemingly impossible: processing 649,141,288 similarity pairs with a streaming JSON architecture. The V2 engine eliminates Python-Rust data transfer bottlenecks entirely, delivering a 36x performance improvement while maintaining constant memory usage through streaming output.

V2 System Architecture

This enterprise-grade recommendation system represents a breakthrough in performance engineering. The V2 architecture combines a Pure Rust similarity engine with a FastAPI backend and React 19 frontend, processing 649 million similarity calculations across 37,030 anime entries.

┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  Data Pipeline  │    │   Rust Engine    │    │  Real-time API   │
│    (Python)     │───▶│   (Pure Rust)    │───▶│    (FastAPI)     │
│                 │    │                  │    │                  │
│ • 37k anime     │    │ • 649M pairs     │    │ • <100ms hot     │
│ • Unified data  │    │ • Streaming JSON │    │ • On-demand      │
│ • is_clean flag │    │ • 5GB matrix     │    │ • 500x efficiency│
└─────────────────┘    └──────────────────┘    └──────────────────┘

The core innovation is the "On-Demand Reading + OS File Caching" principle: heavy computation happens offline in Rust, while the real-time API reads tiny slices of the pre-computed 5GB matrix as needed. The first request loads data into OS cache, making subsequent requests lightning fast.
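
As a rough sketch of that read path (the file path, dataset name, and helper below are illustrative assumptions, not the project's actual code), a recommendation request only slices one row out of the HDF5 matrix: assuming float32 storage, roughly 150 KB (37,030 × 4 bytes) instead of the full 5GB.

# Sketch of the on-demand read principle (illustrative names; assumes a 2-D float32 dataset).
import h5py

MATRIX_PATH = "data/similarity_matrix.h5"   # assumed path to the pre-computed 5GB matrix

def top_similar(anime_index: int, k: int = 20) -> list[tuple[int, float]]:
    with h5py.File(MATRIX_PATH, "r") as f:
        # Reads a single row (~150 KB), never the whole matrix.
        row = f["similarity"][anime_index, :]
    ranked = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
    return [(i, float(s)) for i, s in ranked if i != anime_index][:k]

The first such read pulls the touched pages into the OS cache, which is what makes repeat requests fast.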

V2 Key Innovations

  • Pure Rust Engine v1.2.0: Streaming JSON writer handles 649M pairs without memory explosion
  • Multi-Feature Algorithm: Weighted similarity combining tags (60%), score (20%), year (15%), type (5%)
  • "Filter Candidates First": 20x query performance by processing 200-1000 candidates vs 37k total
  • Hybrid Request Classification: Clean/full tiers for optimal performance/quality trade-offs (sketched after this list)
  • Enterprise Security: CSRF protection, structured logging, distributed rate limiting
  • Docker Containerization: Microservices-ready deployment with Redis integration
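
The clean/full classification is not covered by the code highlights below, so here is a minimal, hypothetical sketch of the idea (request fields and tier names are assumptions): clean requests are answered from the smaller is_clean subset, while full requests fall back to the complete catalogue.

# Hypothetical sketch of clean/full tier selection (field and tier names are assumed).
from dataclasses import dataclass

@dataclass
class RecommendationRequest:
    anime_id: int
    include_full_catalogue: bool = False   # assumed request flag

def classify_request(request: RecommendationRequest) -> str:
    """Pick the data tier: 'clean' (smaller, faster) or 'full' (complete 37k catalogue)."""
    return "full" if request.include_full_catalogue else "clean"

Whichever tier is chosen then determines which pre-computed slice of the matrix the API reads.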

Performance Journey: From Failure to Breakthrough

Evolution Timeline

Implementation        Runtime        Outcome
Python Original       30+ minutes    ❌ OOM Failures
Python-Rust Hybrid    18x penalty    ❌ Data Transfer Overhead
Pure Rust v1.0        <2 minutes     ✅ Success (1k anime)
Pure Rust v1.2        16 minutes     ✅ BREAKTHROUGH (649M pairs)

Technology Stack

🦀 Rust Engine

  • Pure Rust 1.87.0
  • Rayon Parallel Processing
  • Streaming JSON Writer
  • Minimal Dependencies

⚡ Backend

  • FastAPI + Python
  • Dependency Injection
  • Structured Logging
  • Rate Limiting

🌐 Frontend

  • React 19
  • TypeScript
  • Styled Components
  • Modern UI/UX

🐳 Infrastructure

  • Docker Containerization
  • Redis Integration
  • HDF5 Data Format
  • Enterprise Security

Code Highlights

V2 Multi-Feature Similarity (Rust)

/// V2: Multi-feature weighted similarity calculation
/// Combines tag, score, year, and type similarities
fn compute_v2_multifeature_similarity(
    anime_a: &AnimeData,
    anime_b: &AnimeData,
    _index: &InvertedIndex
) -> f32 {
    // Feature weights based on V2 specification
    const TAG_WEIGHT: f32 = 0.60;
    const SCORE_WEIGHT: f32 = 0.20;
    const YEAR_WEIGHT: f32 = 0.15;
    const TYPE_WEIGHT: f32 = 0.05;

    // Weighted combination of multiple features
    let final_similarity =
        (TAG_WEIGHT * compute_weighted_tag_similarity(anime_a, anime_b)) +
        (SCORE_WEIGHT * (1.0 - (anime_a.normalized_score - anime_b.normalized_score).abs())) +
        (YEAR_WEIGHT * (1.0 - (anime_a.normalized_year - anime_b.normalized_year).abs())) +
        (TYPE_WEIGHT * anime_a.type_category.similarity(&anime_b.type_category));

    final_similarity.min(1.0).max(0.0) // Ensure bounds [0,1]
}

"Filter Candidates First" Algorithm (Python)

# Phase 1: Aggregate candidates (200-1000 vs 37k total)
candidate_ids = self._aggregate_candidate_ids(
    engine_inputs, algorithm, 1000
)

# Phase 2: Pre-filter by criteria (50-200 qualified)
qualified_candidates = self._pre_filter_candidates(
    candidate_ids, filters, exclude_ids,
    engine_inputs, request_type
)

# Phase 3: Compute similarities (targeted vs full matrix)
candidate_scores = self._compute_targeted_similarities(
    qualified_candidates, engine_inputs, algorithm
)

# Result: 20x performance improvement for complex queries

Engineering Challenges Solved

Memory Explosion Crisis: The original approach failed when attempting to serialize 649M similarity pairs in one shot. The breakthrough came with a streaming JSON architecture: writing output piece by piece with constant 128KB memory usage.
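
The production writer is part of the Rust engine; as a contrast between the failing and the streaming approach, here is the principle sketched in Python (function and field names are illustrative): each pair is serialized and written as soon as it is produced, so memory use stays at the size of the write buffer no matter how many pairs there are.

# Streaming principle, sketched in Python (the real writer lives in the Rust engine).
# Anti-pattern: collect all 649M pairs in a list, then serialize the whole thing (OOM).
# Streaming fix: write one pair at a time through a small fixed-size buffer.
import json

def write_pairs_streaming(pairs, path, buffer_bytes=128 * 1024):
    """Write an arbitrarily large iterable of (id_a, id_b, score) tuples as a JSON array."""
    with open(path, "w", buffering=buffer_bytes) as out:
        out.write("[")
        for i, (id_a, id_b, score) in enumerate(pairs):
            if i:
                out.write(",")
            out.write(json.dumps({"a": id_a, "b": id_b, "score": round(score, 4)}))
        out.write("]")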

Data Transfer Bottleneck: The Python-Rust hybrid suffered an 18x performance penalty due to serialization overhead. Pure Rust eliminated this entirely by handling data loading, computation, and output in a single runtime.

Real-time Performance: With 5GB matrices, loading everything into memory was impossible. The on-demand reading approach combined with OS file caching achieves sub-100ms responses after the first "warming" request.
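
The warming effect is easy to observe by timing the same row read twice (same assumed path and dataset name as the earlier sketch):

# Rough timing of a cold vs. warm row read (illustrative names, as above).
import time
import h5py

def timed_row_read(path: str, row: int) -> float:
    with h5py.File(path, "r") as f:
        start = time.perf_counter()
        _ = f["similarity"][row, :]
        return (time.perf_counter() - start) * 1000.0   # milliseconds

cold = timed_row_read("data/similarity_matrix.h5", 1234)   # first read hits disk
warm = timed_row_read("data/similarity_matrix.h5", 1234)   # repeat read hits the OS cache
print(f"cold: {cold:.1f} ms  warm: {warm:.1f} ms")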

Production Deployment

The system is fully containerized with Docker Compose, featuring:

  • Backend: FastAPI with dependency injection and structured logging
  • Frontend: React 19 with production-optimized Nginx serving
  • Database: Redis for rate limiting and caching
  • Security: CSRF protection, security headers, comprehensive error handling
  • Monitoring: Request tracking, performance metrics, health checks