AI Models Overview
This section documents the AI models used at each stage of the Kingisoovitaja gift recommendation pipeline, their selection rationale, and performance characteristics.
Model Strategy
The system uses a multi-model approach that optimizes for:
- Speed: Sub-second response times
- Cost: Efficient resource utilization
- Quality: Accurate recommendations
- Reliability: Consistent outputs
Pipeline Phases
Model Comparison
| Phase | Model | Provider | Latency | Cost | Type |
|---|---|---|---|---|---|
| Phase 0 | LLaMA 4 Scout 17B | Groq | fast | Low | LLM |
| Phase 1-2 | Rule-based | - | <50ms | Zero | Deterministic |
| Phase 3 | LLaMA 4 Scout 17B | Groq | ~150ms | Low | LLM |
| Phase 4 | Heuristic | - | <10ms | Zero | Deterministic |
| Phase 5 | GPT-5.1 | OpenAI | very fast | Medium | LLM |
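Taken together, the table suggests a simple per-phase model registry. The snippet below is only an illustrative sketch of such a mapping in Python; the `PHASE_MODELS` name, keys, and structure are assumptions, not the actual Kingisoovitaja configuration.

```python
# Illustrative per-phase model registry (structure and names are assumptions).
PHASE_MODELS = {
    "phase_0_context":   {"type": "llm",        "provider": "groq",   "model": "llama-4-scout-17b-16e-instruct"},
    "phase_1_2_search":  {"type": "rule_based", "provider": None,     "model": None},
    "phase_3_rerank":    {"type": "llm",        "provider": "groq",   "model": "llama-4-scout-17b-16e-instruct"},
    "phase_4_diversity": {"type": "heuristic",  "provider": None,     "model": None},
    "phase_5_generate":  {"type": "llm",        "provider": "openai", "model": "gpt-5.1-chat-latest"},
}
```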
Cost Analysis
Per 1000 Requests:
| Component | Model | Estimated Cost |
|---|---|---|
| Context Extraction | LLaMA 4 Scout | $0.50 |
| Reranking | LLaMA 4 Scout | $0.30 |
| Generation | GPT-5.1 | $2.00 |
| Total | | $2.80 |
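For capacity planning, the per-1000-request figures convert directly into per-request and monthly estimates. The arithmetic sketch below assumes a placeholder volume of 10,000 requests per day.

```python
# Cost arithmetic from the table above; the daily volume is a placeholder.
COST_PER_1K = {"context_extraction": 0.50, "reranking": 0.30, "generation": 2.00}

cost_per_request = sum(COST_PER_1K.values()) / 1000   # $0.0028 per request
monthly_cost = cost_per_request * 10_000 * 30          # ~$840/month at 10k requests/day
print(f"${cost_per_request:.4f}/request, ~${monthly_cost:.0f}/month")
```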
Performance Targets
Environment Configuration
All models are configured via environment variables:
```bash
# Provider API Keys
GROQ_API_KEY=gsk_...   # For LLaMA models
OPENAI_API_KEY=sk-...  # For GPT models

# Model Selection (optional overrides)
CONTEXT_MODEL=llama-4-scout-17b-16e-instruct
GENERATION_MODEL=gpt-5.1-chat-latest

# Performance Tuning
CONTEXT_TIMEOUT_MS=2000      # Max wait for context
GENERATION_MAX_TOKENS=2500   # Response length limit
```
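On the application side, these variables could be resolved with the documented defaults. The loader below is a minimal sketch under that assumption: the variable names match the list above, everything else is illustrative.

```python
import os

# Minimal settings-loader sketch; defaults mirror the documented values.
GROQ_API_KEY = os.environ["GROQ_API_KEY"]        # required (LLaMA models via Groq)
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # required (GPT models)

CONTEXT_MODEL = os.getenv("CONTEXT_MODEL", "llama-4-scout-17b-16e-instruct")
GENERATION_MODEL = os.getenv("GENERATION_MODEL", "gpt-5.1-chat-latest")

CONTEXT_TIMEOUT_MS = int(os.getenv("CONTEXT_TIMEOUT_MS", "2000"))
GENERATION_MAX_TOKENS = int(os.getenv("GENERATION_MAX_TOKENS", "2500"))
```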
Phase Documentation
Detailed documentation for each phase:
- Phase 0: Context Detection - Intent extraction and parsing
- Phase 1-2: Query & Search - Query generation and execution
- Phase 3: Semantic Rerank - LLM-based relevance scoring
- Phase 4: Diversity Selection - Heuristic product selection
- Phase 5: Response Generation - AI-powered narration
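End to end, the phases run as a sequential pipeline. The sketch below shows that control flow only; the phase functions are trivial stubs with assumed names, not the real implementations.

```python
# Control-flow sketch of the five phases; all functions are illustrative stubs.
def extract_context(message):                 # Phase 0: LLM intent extraction
    return {"raw": message}

def query_and_search(context):                # Phase 1-2: rule-based query generation + search
    return []                                 # would return candidate products

def rerank(candidates, context):              # Phase 3: LLM relevance scoring
    return candidates

def select_diverse(ranked):                   # Phase 4: heuristic diversity selection
    return ranked[:5]

def generate_response(selection, context):    # Phase 5: LLM narration
    return f"Found {len(selection)} gift ideas."

def recommend(user_message: str) -> str:
    context = extract_context(user_message)
    candidates = query_and_search(context)
    return generate_response(select_diverse(rerank(candidates, context)), context)
```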
Quick Reference
When to Use Which Model
Related Documentation
- Pipeline Overview - How models fit into the pipeline
- Orchestration System - System architecture