AI Models Overview
This section documents the AI models used at each stage of the Kingisoovitaja gift recommendation pipeline, their selection rationale, and performance characteristics.
Model Strategy
The system uses a multi-model approach that optimizes for:
- Speed: Sub-second response times
- Cost: Efficient resource utilization
- Quality: Accurate recommendations
- Reliability: Consistent outputs
Pipeline Phases
Model Comparison
| Phase | Model | Provider | Latency | Cost | Type |
|---|---|---|---|---|---|
| Phase 0 | LLaMA 4 Scout 17B | Groq | fast | Low | LLM |
| Phase 1-2 | Rule-based | - | <50ms | Zero | Deterministic |
| Phase 3 | LLaMA 4 Scout 17B | Groq | ~150ms | Low | LLM |
| Phase 4 | Heuristic | - | <10ms | Zero | Deterministic |
| Phase 5 | GPT-5.1 | OpenAI | very fast | Medium | LLM |
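Taken together, the table suggests a simple per-phase model registry. The snippet below is only an illustrative sketch of such a mapping in Python; the `PHASE_MODELS` name, keys, and structure are assumptions, not the actual Kingisoovitaja configuration.

```python
# Illustrative per-phase model registry (structure and names are assumptions).
PHASE_MODELS = {
    "phase_0_context":   {"type": "llm",        "provider": "groq",   "model": "llama-4-scout-17b-16e-instruct"},
    "phase_1_2_search":  {"type": "rule_based", "provider": None,     "model": None},
    "phase_3_rerank":    {"type": "llm",        "provider": "groq",   "model": "llama-4-scout-17b-16e-instruct"},
    "phase_4_diversity": {"type": "heuristic",  "provider": None,     "model": None},
    "phase_5_generate":  {"type": "llm",        "provider": "openai", "model": "gpt-5.1-chat-latest"},
}
```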
Cost Analysis
Per 1000 Requests:
| Component | Model | Estimated Cost |
|---|---|---|
| Context Extraction | LLaMA 4 Scout | $0.50 |
| Reranking | LLaMA 4 Scout | $0.30 |
| Generation | GPT-5.1 | $2.00 |
| Total | | $2.80 |
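For capacity planning, the per-1000-request figures convert directly into per-request and monthly estimates. The arithmetic sketch below assumes a placeholder volume of 10,000 requests per day.

```python
# Cost arithmetic from the table above; the daily volume is a placeholder.
COST_PER_1K = {"context_extraction": 0.50, "reranking": 0.30, "generation": 2.00}

cost_per_request = sum(COST_PER_1K.values()) / 1000   # $0.0028 per request
monthly_cost = cost_per_request * 10_000 * 30          # ~$840/month at 10k requests/day
print(f"${cost_per_request:.4f}/request, ~${monthly_cost:.0f}/month")
```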
Performance Targets
Environment Configuration
All models are configured via environment variables:
```bash
# Provider API Keys
GROQ_API_KEY=gsk_...   # For LLaMA models
OPENAI_API_KEY=sk-...  # For GPT models

# Model Selection (optional overrides)
CONTEXT_MODEL=llama-4-scout-17b-16e-instruct
GENERATION_MODEL=gpt-5.1-chat-latest

# Performance Tuning
CONTEXT_TIMEOUT_MS=2000      # Max wait for context
GENERATION_MAX_TOKENS=2500   # Response length limit
```
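On the application side, these variables could be resolved with the documented defaults. The loader below is a minimal sketch under that assumption: the variable names match the list above, everything else is illustrative.

```python
import os

# Minimal settings-loader sketch; defaults mirror the documented values.
GROQ_API_KEY = os.environ["GROQ_API_KEY"]        # required (LLaMA models via Groq)
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # required (GPT models)

CONTEXT_MODEL = os.getenv("CONTEXT_MODEL", "llama-4-scout-17b-16e-instruct")
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "gpt-5.1-chat-latest")

CONTEXT_TIMEOUT_MS = int(os.getenv("CONTEXT_TIMEOUT_MS", "2000"))
GENERATION_MAX_TOKENS = int(os.getenv("GENERATION_MAX_TOKENS", "2500"))
```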
Phase Documentation
Detailed documentation for each phase:
- Phase 0: Context Detection - Intent extraction and parsing
- Phase 1-2: Query & Search - Query generation and execution
- Phase 3: Semantic Rerank - LLM-based relevance scoring
- Phase 4: Diversity Selection - Heuristic product selection
- Phase 5: Response Generation - AI-powered narration
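End to end, the phases run as a sequential pipeline. The sketch below shows that control flow only; the phase functions are trivial stubs with assumed names, not the real implementations.

```python
# Control-flow sketch of the five phases; all functions are illustrative stubs.
def extract_context(message):                 # Phase 0: LLM intent extraction
    return {"raw": message}

def query_and_search(context):                # Phase 1-2: rule-based query generation + search
    return []                                 # would return candidate products

def rerank(candidates, context):              # Phase 3: LLM relevance scoring
    return candidates

def select_diverse(ranked):                   # Phase 4: heuristic diversity selection
    return ranked[:5]

def generate_response(selection, context):    # Phase 5: LLM narration
    return f"Found {len(selection)} gift ideas."

def recommend(user_message: str) -> str:
    context = extract_context(user_message)
    candidates = query_and_search(context)
    return generate_response(select_diverse(rerank(candidates, context)), context)
```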
Quick Reference
When to Use Which Model
Related Documentation
- Pipeline Overview - How models fit into the pipeline
- Orchestration System - System architecture