AI Models Overview

This section documents the AI models used at each stage of the Kingisoovitaja gift recommendation pipeline, their selection rationale, and performance characteristics.

Model Strategy

The system uses a multi-model approach that optimizes for:

  • Speed: Sub-second response times
  • Cost: Efficient resource utilization
  • Quality: Accurate recommendations
  • Reliability: Consistent outputs

Pipeline Phases

Model Comparison

| Phase | Model | Provider | Latency | Cost | Type |
| --- | --- | --- | --- | --- | --- |
| Phase 0 | LLaMA 4 Scout 17B | Groq | fast | Low | LLM |
| Phase 1-2 | Rule-based | - | <50ms | Zero | Deterministic |
| Phase 3 | LLaMA 4 Scout 17B | Groq | ~150ms | Low | LLM |
| Phase 4 | Heuristic | - | <10ms | Zero | Deterministic |
| Phase 5 | GPT-5.1 | OpenAI | very fast | Medium | LLM |
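The phase-to-model mapping above can be sketched as a simple lookup table. This is an illustrative sketch only: the dictionary structure, function name, and model identifier strings are assumptions, not the project's actual code.

```python
# Hypothetical mapping of pipeline phases to (model, provider) pairs,
# mirroring the comparison table. Deterministic phases have no provider.
PHASE_MODELS = {
    0: ("llama-4-scout-17b", "groq"),      # LLM: context extraction
    1: ("rule-based", None),               # deterministic filtering
    2: ("rule-based", None),
    3: ("llama-4-scout-17b", "groq"),      # LLM: reranking
    4: ("heuristic", None),                # deterministic scoring
    5: ("gpt-5.1-chat-latest", "openai"),  # LLM: final generation
}

def model_for_phase(phase: int) -> tuple:
    """Return the (model_name, provider) pair for a pipeline phase."""
    if phase not in PHASE_MODELS:
        raise ValueError(f"unknown pipeline phase: {phase}")
    return PHASE_MODELS[phase]
```

A dispatcher built this way keeps the deterministic phases free of any API client, since their provider entry is `None`.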

Cost Analysis

Per 1000 Requests:

| Component | Model | Estimated Cost |
| --- | --- | --- |
| Context Extraction | LLaMA 4 Scout | $0.50 |
| Reranking | LLaMA 4 Scout | $0.30 |
| Generation | GPT-5.1 | $2.00 |
| Total | | $2.80 |
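The arithmetic behind the table is straightforward: the per-component estimates sum to $2.80 per 1000 requests, or $0.0028 per request. A minimal sketch (the dictionary keys and helper name are illustrative, not from the codebase):

```python
# Per-1000-request cost estimates from the table above, in USD.
COSTS_PER_1000 = {
    "context_extraction": 0.50,  # LLaMA 4 Scout
    "reranking": 0.30,           # LLaMA 4 Scout
    "generation": 2.00,          # GPT-5.1
}

def cost_per_request(costs_per_1000=COSTS_PER_1000) -> float:
    """Convert the per-1000 component estimates to a per-request cost."""
    return sum(costs_per_1000.values()) / 1000.0
```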

Performance Targets

Environment Configuration

All models are configured via environment variables:

```bash
# Provider API Keys
GROQ_API_KEY=gsk_...    # For LLaMA models
OPENAI_API_KEY=sk-...   # For GPT models

# Model Selection (optional overrides)
CONTEXT_MODEL=llama-4-scout-17b-16e-instruct
GENERATION_MODEL=gpt-5.1-chat-latest

# Performance Tuning
CONTEXT_TIMEOUT_MS=2000     # Max wait for context (ms)
GENERATION_MAX_TOKENS=2500  # Response length limit (tokens)
```
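One common way to consume this configuration is to read it once at startup, treating the API keys as required and the rest as optional overrides with the defaults shown above. The sketch below uses only the standard library; the function name and returned dictionary shape are assumptions for illustration.

```python
import os

def load_model_config() -> dict:
    """Read model configuration from environment variables.

    API keys are required (KeyError if missing); model names and
    tuning values fall back to the documented defaults.
    """
    return {
        "groq_api_key": os.environ["GROQ_API_KEY"],
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        "context_model": os.getenv(
            "CONTEXT_MODEL", "llama-4-scout-17b-16e-instruct"
        ),
        "generation_model": os.getenv(
            "GENERATION_MODEL", "gpt-5.1-chat-latest"
        ),
        "context_timeout_ms": int(os.getenv("CONTEXT_TIMEOUT_MS", "2000")),
        "generation_max_tokens": int(
            os.getenv("GENERATION_MAX_TOKENS", "2500")
        ),
    }
```

Converting the numeric values with `int()` at load time means a malformed override fails fast at startup rather than mid-request.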

Phase Documentation

Detailed documentation for each phase:

Quick Reference

When to Use Which Model