kingisoovitaja - Agentic RAG System
Welcome to the kingisoovitaja documentation - a sophisticated Agentic RAG (Retrieval-Augmented Generation) system that demonstrates autonomous decision-making, intelligent orchestration, and adaptive strategy selection for conversational gift discovery.
What is kingisoovitaja?
kingisoovitaja (Gift Advisor) is an intelligent conversational shopping assistant that goes far beyond traditional search. It's an autonomous agent that:
- Decides its own strategy based on query complexity (fast-path vs deep extraction)
- Maintains conversation memory across turns (authors, preferences, context)
- Adapts retrieval tactics based on ambiguity and user signals
- Self-corrects through multi-layered fallback strategies
- Orchestrates multiple LLMs for specialized tasks
Unlike rule-based chatbots or simple RAG systems, kingisoovitaja exhibits true agency through autonomous decision-making at every layer.
What Makes This "Agentic"?
Autonomous Intelligence vs Traditional Systems
Traditional RAG: Linear, rule-based, no memory
Agentic RAG: Branching decisions, adaptive strategies, stateful intelligence
The Five Pillars of Agency
1. Autonomous Routing & Strategy Selection
The system decides its own execution path based on query analysis:
Key Decisions Made Autonomously:
- Fast-path vs enhanced extraction (~60-70% use fast-path)
- Skip classifier for author/pronoun queries (prevents the fast path from hijacking them into generic search)
- Route to specialized handlers based on intent + confidence
- Early return vs fall-through based on classification quality
No hardcoded rules - The agent analyzes each query and chooses the optimal path.
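A minimal sketch of what such a routing decision could look like; the patterns, word-count threshold, and function names are illustrative assumptions, not the production logic:

```typescript
// Illustrative routing sketch -- pattern lists and thresholds are
// hypothetical, not the actual production values.
type Route = "fast-path" | "enhanced-extraction";

const AUTHOR_PATTERNS = [/\b\w+ilt\b/i, /raamatu(id)?\s/i]; // e.g. "Tolkienilt"
const PRONOUN_PATTERNS = [/\btemalt\b/i, /\btema\b/i];      // "from him/her"

function chooseRoute(query: string, hasPriorState: boolean): Route {
  // Author/pronoun queries skip the fast classifier so it cannot
  // hijack them into a generic product search.
  const isAuthorQuery = AUTHOR_PATTERNS.some((p) => p.test(query));
  const isPronounQuery =
    hasPriorState && PRONOUN_PATTERNS.some((p) => p.test(query));
  if (isAuthorQuery || isPronounQuery) return "enhanced-extraction";

  // Short, unambiguous queries take the fast path (~60-70% of traffic).
  return query.split(/\s+/).length <= 6 ? "fast-path" : "enhanced-extraction";
}
```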
2. Stateful Memory & Context Building
The system actively builds and maintains conversation state:
Autonomous Behaviors:
- Proactive state fetching - Retrieves context BEFORE extraction (not after)
- Intelligent state building - Constructs "quick conversation state" with primaryAuthor
- Selective persistence - Saves authorName, productType, category, excludeIds
- State injection - Passes conversation state to LLM for pronoun resolution
- Memory-based fallbacks - Uses conversation memory when LLM fails
The agent manages its own memory - No manual state management required.
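A sketch of the shape such a conversation state could take; only authorName, productType, category, and excludeIds are documented above, so the remaining details are assumptions:

```typescript
// Hypothetical shape of the persisted conversation state; the four
// documented fields are authorName, productType, category, excludeIds.
interface ConversationState {
  primaryAuthor?: string;   // built into the "quick conversation state"
  authorName?: string;      // persisted for pronoun resolution
  productType?: string;     // e.g. "book"
  category?: string;        // e.g. "fantasy"
  excludeIds: string[];     // products already shown, for deduplication
}

// Selective persistence: only the fields worth remembering are saved.
function persistableFields(
  state: ConversationState,
): Partial<ConversationState> {
  const { authorName, productType, category, excludeIds } = state;
  return { authorName, productType, category, excludeIds };
}
```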
3. Multi-Stage Adaptive Retrieval
The system orchestrates complex search strategies autonomously:
Adaptive Decisions:
- Multi-query vs single-query based on ambiguity
- Constraint application based on context signals
- Reranking vs skip based on result diversity
- Pool expansion on poor quality results
- Gender affinity boost when recipient gender known
The agent adjusts retrieval strategy based on real-time quality assessment.
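The following sketch illustrates this kind of runtime strategy selection; the ambiguity score, the 0.5 threshold, and the diversity cutoff are hypothetical stand-ins:

```typescript
// Illustrative strategy selection; signals and thresholds are assumptions.
type SearchStrategy = "multi-query" | "single-query" | "author-filtered";

interface QuerySignals {
  ambiguity: number;        // 0..1, from query analysis
  authorName?: string;
  recipientGender?: "f" | "m";
}

function chooseSearchStrategy(s: QuerySignals): SearchStrategy {
  if (s.authorName) return "author-filtered"; // constrain to the author
  return s.ambiguity > 0.5 ? "multi-query" : "single-query";
}

function shouldRerank(resultCategories: string[]): boolean {
  // Skip reranking when results are already diverse (saves ~200ms).
  return new Set(resultCategories).size < 3;
}
```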
4. Self-Correcting & Fallback Intelligence
The system monitors its own performance and self-corrects:
Autonomous Monitoring:
- Tracks classifier confidence → Falls back if low
- Detects missing context → Asks for clarification
- Monitors search results → Activates zero-results handler
- Checks pronoun resolution → Uses memory if LLM fails
- Detects pool exhaustion → Transparently acknowledges it to the user
The agent never crashes - It always finds a graceful path forward.
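A minimal sketch of such a layered fallback, using author resolution as the example; the function names and the clarification signal are illustrative:

```typescript
// Sketch of a layered fallback chain: LLM first, conversation memory
// second, a clarification request last -- never an unhandled crash.
async function resolveAuthor(
  llmExtract: () => Promise<string | null>,
  memoryLookup: () => string | null,
): Promise<string | { clarify: true }> {
  try {
    const fromLlm = await llmExtract();
    if (fromLlm) return fromLlm;      // happy path
  } catch {
    // LLM failure is non-fatal; fall through to memory.
  }
  const fromMemory = memoryLookup();  // conversation memory fallback
  if (fromMemory) return fromMemory;
  return { clarify: true };           // ask the user instead of crashing
}
```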
5. Intelligent Orchestration Across Components
The system coordinates multiple subsystems with autonomous scheduling:
Orchestration Decisions:
- Parallel warmup vs sequential based on mode
- When to fetch stored context (before vs after extraction)
- Handler selection based on intent + confidence thresholds
- Search strategy (multi-query vs single) based on ambiguity
- Reranking necessity based on result diversity
- Smart suggestion generation based on occasion appropriateness
The agent orchestrates timing, dependencies, and execution flow autonomously.
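As a sketch, the parallel-warmup decision can be expressed as overlapping two independent awaits; both helper functions below are hypothetical placeholders:

```typescript
// Sketch of parallel warmup: context extraction and generation warmup
// are overlapped with Promise.all instead of running sequentially.
async function orchestrate(query: string) {
  const [context] = await Promise.all([
    extractContext(query),  // Phase 0: LLM context extraction
    warmupGeneration(),     // warm the generation model in parallel
  ]);
  return context;
}

async function extractContext(query: string) {
  return { query, intent: "gift-search" }; // stand-in for the real call
}

async function warmupGeneration() {
  // e.g. open a connection / prime a cache
}
```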
Sophisticated Agentic RAG Architecture
The Three-Layer Architecture
Layer 1: Augmentation (Query Understanding)
Not just extraction - Intelligent orchestration with autonomous routing
Autonomous Behaviors:
- Analyzes query for author/pronoun patterns before classification
- Chooses extraction strategy based on pattern detection
- Proactively fetches conversation state when needed
- Self-decides when to inject memory into LLM context
Components:
- Fast Classifier - Autonomous fast-path decision-making
- Enhanced Extraction - Deep semantic understanding
- Author Resolution - Multi-stage detection with fallbacks
- Memory Resolution - Stateful conversation management
Models: LLaMA 4 Scout 17B (Groq) - Ultra-fast context extraction
Layer 2: Retrieval (Adaptive Smart Search)
Not just vector search - Multi-stage intelligent filtering with quality monitoring
Autonomous Decisions:
- Query strategy selection (multi vs single vs author-filtered)
- Constraint application based on context signals
- Reranking necessity based on result diversity
- Category balancing for better exploration
- Gender affinity boost when recipient gender detected
Components:
- Vector Search - Semantic embedding similarity
- Multi-Stage Funnel - Progressive filtering (100 → 50 → 20 → 3-5)
- LLM Reranking - Cohere Rerank v3.5 for gift appropriateness
- Diversity Selection - Autonomous category and price distribution
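A compact sketch of the 100 → 50 → 20 → 3-5 funnel; the scoring field, the filter threshold, and the two-per-category rule are placeholders standing in for the real vector search, Cohere rerank, and diversity selection:

```typescript
// Illustrative funnel: 100 candidates -> 50 -> 20 -> final 3-5.
interface Product { id: string; category: string; price: number; score: number }

function funnel(candidates: Product[]): Product[] {
  const top100 = candidates.slice(0, 100);                         // vector search pool
  const top50 = top100.filter((p) => p.score > 0.3).slice(0, 50);  // constraint filtering
  const top20 = [...top50]                                         // reranking stand-in
    .sort((a, b) => b.score - a.score)
    .slice(0, 20);
  // Diversity selection: at most two products per category.
  const perCategory = new Map<string, number>();
  return top20
    .filter((p) => {
      const n = perCategory.get(p.category) ?? 0;
      perCategory.set(p.category, n + 1);
      return n < 2;
    })
    .slice(0, 5);
}
```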
Layer 3: Generation (Context-Aware Creation)
Not just text generation - Intelligent response orchestration with product injection
Autonomous Orchestration:
- Parallel warmup scheduling (overlap with search)
- Dynamic product injection timing (during AI narration)
- Smart suggestion generation (occasion-filtered)
- Gift wrap cross-sell insertion (when appropriate)
- Conversation state persistence (selective fields)
Components:
- GPT-5.1 Streaming - High-quality narration
- Product Card Injection - Real-time insertion
- Smart Suggestions - Context-aware recommendations
- State Persistence - Autonomous memory management
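A sketch of how product cards could be interleaved with a token stream; the [[product]] marker convention and the types are assumptions for illustration only:

```typescript
// Sketch of injecting product cards into a token stream during narration.
interface ProductCard { id: string; title: string }

async function* injectProducts(
  tokens: AsyncIterable<string>,
  products: ProductCard[],
): AsyncGenerator<string | ProductCard> {
  let next = 0;
  for await (const token of tokens) {
    yield token;
    // Inject the next card whenever the narration emits a marker.
    if (token.includes("[[product]]") && next < products.length) {
      yield products[next++];
    }
  }
}
```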
Key Differentiators from Traditional RAG
| Aspect | Traditional RAG | kingisoovitaja Agentic RAG |
|---|---|---|
| Decision Making | Hardcoded rules | Autonomous runtime decisions |
| Routing | Single path | Intelligent routing with 5+ decision points |
| Memory | Stateless | Stateful with proactive state fetching |
| Search | Simple retrieval | Multi-stage adaptive funnel with quality monitoring |
| Fallbacks | Error pages | Multi-layered self-correction |
| Context | Per-request | Cumulative across conversation |
| Optimization | Fixed | Adaptive (fast-path for 60-70% of queries) |
| Author Handling | Keyword match | Multi-stage detection + pronoun resolution |
| Suggestions | Random | Occasion-aware, context-filtered |
| Quality | Hope for best | 5-layer autonomous quality checks |
Real-World Example: The Agent in Action
Scenario: Multi-Turn Author Discovery
For example, a user asks for books by an author ("raamatuid Tolkienilt" - "books by Tolkien"), then follows up with a pronoun query such as "midagi muud temalt" ("something else from him"). Across the two turns, 8 autonomous decisions are made:
- Skip classifier for author pattern
- Use fresh extraction (no prior state)
- Apply author filter to search
- Generate smart suggestions based on context
- Persist authorName for future use
- Fetch conversation state for pronoun query
- Inject state into LLM for resolution
- Search with exclusions to avoid duplicates
No hardcoded flow - The agent adapts to each query dynamically.
System Architecture
Complete Flow with Agent Decisions
17 Autonomous Decisions per request - This is what makes it "Agentic"!
Performance Through Intelligence
How Agent Decisions Improve Performance
Agent-Driven Optimizations:
- Fast-path for simple queries (250ms vs 500ms)
- Parallel warmup (overlap context + warmup)
- Skip reranking when diversity good (save 200ms)
- Skeleton response (perceived under 100ms)
Total Impact: Sub-second responses through intelligent decision-making
Autonomous Quality Assurance
Self-Monitoring Quality Agent
5 Quality Agents making autonomous decisions at each layer.
Core Capabilities
1. Multi-Language Intelligence
- Estonian (Primary) - Morphological case handling, compound words, cultural context
- English - Full support with automatic detection
- Code-Switching - Handles mixed queries seamlessly
Estonian Agent Capabilities:
- Allative case: "sõbrale" → agent extracts "sõber"
- Genitive: "Kingi teosed" → agent extracts "King"
- Ablative: "Tolkienilt" → agent extracts "Tolkien"
- Compound words: "lauamäng" → agent detects "board game"
- Diacritics: "Kivirähkilt" → agent handles ä, ö, ü, õ
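As an illustration only, the case handling above could be approximated with suffix stripping; real Estonian morphology is far richer, so treat this as a sketch:

```typescript
// Illustrative suffix stripping for the cases listed above. Real
// Estonian morphology involves stem changes this sketch ignores.
function stripCaseSuffix(word: string): string {
  if (word.endsWith("ilt")) return word.slice(0, -3); // ablative: Tolkienilt -> Tolkien
  if (word.endsWith("le")) return word.slice(0, -2);  // allative: sõbrale -> sõbra (stem; nominative is "sõber")
  if (word.endsWith("i")) return word.slice(0, -1);   // genitive: Kingi -> King
  return word;
}
```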
2. Author Resolution Intelligence
100% success rate for direct author queries through multi-stage agent decisions:
- Stage 1: Pattern detection (routing agent)
- Stage 2: LLM extraction with 9 few-shot examples
- Stage 3: Validation agent (clean pronouns, validate names)
- Stage 4: Memory fallback (0ms lookup if LLM fails)
- Stage 5: Persistence agent (save for future pronoun resolution)
Learn more: Author Intent
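A sketch of the Stage 3 validation step: rejecting pronouns the LLM may return as "authors" and sanity-checking the name. Both the pronoun list and the name pattern are illustrative:

```typescript
// Illustrative validation agent: clean pronouns, validate names.
const PRONOUNS = new Set(["tema", "ta", "temalt", "him", "her", "them"]);

function validateAuthorName(candidate: string | null): string | null {
  if (!candidate) return null;
  const name = candidate.trim();
  if (PRONOUNS.has(name.toLowerCase())) return null;    // pronoun, not a name
  if (!/^\p{Lu}[\p{L}' .-]+$/u.test(name)) return null; // must look like a proper name
  return name;
}
```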
3. Smart Suggestions with Occasion Intelligence
The suggestion agent autonomously:
- Filters inappropriate suggestions (no birthday cards for housewarmings)
- Prevents cross-product-type leakage (no film suggestions for books)
- Enforces diversity (no duplicate categories)
- Injects gift wrap cross-sell (when appropriate)
- Provides zero-results safety net (explores all categories)
Learn more: Smart Suggestions System
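A sketch of the occasion filter, cross-type guard, and diversity rule combined; the blocklist entry mirrors the birthday-card example above, but the data shapes are assumptions:

```typescript
// Illustrative suggestion filtering: occasion blocklist, no
// cross-product-type leakage, no duplicate categories.
interface Suggestion { label: string; category: string; productType: string }

const BLOCKED: Record<string, string[]> = {
  housewarming: ["birthday-card"], // no birthday cards for housewarmings
};

function filterSuggestions(
  suggestions: Suggestion[],
  occasion: string,
  currentProductType: string,
): Suggestion[] {
  const blocked = new Set(BLOCKED[occasion] ?? []);
  const seenCategories = new Set<string>();
  return suggestions.filter((s) => {
    if (blocked.has(s.category)) return false;              // occasion filter
    if (s.productType !== currentProductType) return false; // no cross-type leakage
    if (seenCategories.has(s.category)) return false;       // enforce diversity
    seenCategories.add(s.category);
    return true;
  });
}
```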
4. Context Preservation Agent
Manages exclude lists and context inheritance autonomously:
- Perfect deduplication across "show more" requests
- Automatic context reset detection (topic changes)
- Merges client and server exclude lists intelligently
- Preserves taxonomy (productType, category, author, budget)
Learn more: Show More Behavior
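A minimal sketch of the exclude-list merge with a topic-change reset; how topicChanged is detected is out of scope here and assumed as an input:

```typescript
// Illustrative merge of client- and server-side exclude lists.
function mergeExcludeIds(
  clientExcludes: string[],
  serverExcludes: string[],
  topicChanged: boolean,
): string[] {
  if (topicChanged) return []; // context reset: old exclusions no longer apply
  return [...new Set([...clientExcludes, ...serverExcludes])]; // deduplicated union
}
```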
Documentation Structure
Start Here (Essential Reading)
- Fast Classifier - Autonomous fast-path routing
- Context Extraction - Intent detection agent
- Smart Suggestions System - Suggestion generation agent
- Author Intent - Multi-stage author resolution
Architecture
- High-Level Architecture - Complete system overview
- Parallel Orchestrator - Performance optimization agent
Context & Intelligence
- GiftContext & Followup System - Context building agent
- Intent Classification - Dual-path decision-making
Pipeline & Streaming
- Pipeline Overview - Frontend streaming architecture
- Lifecycle & Flow - Request-response orchestration
AI Models & Decisions
- Model Overview - Model selection strategy
- Performance Comparison - Benchmarks
- Phase 0: Context - Intent extraction
- Phase 5: Generation - Response creation
Smart Agents
- Smart Suggestions System - Complete suggestion intelligence
- Show More Behavior - Pagination agent
Conversational Intelligence
- Conversational Overview - 7-layer intelligence stack
- Author Intent - Author detection agent
- Refinement Detection - Query refinement agent
- Progressive Context - Multi-turn clarification
- Budget System - Budget detection intelligence
- Memory Resolution - Conversation memory agent
Prompts & Configuration
- Prompts Overview - Prompt architecture
- Estonian Prompt - Estonian language rules
- Response Validation - Anti-hallucination agent
Quality & Guardrails
- Quality Overview - Multi-agent quality system
- Estonian Challenges - Language-specific solutions
Quick Start
To run the development server:
```bash
cd documentation
npm start
```
The documentation will be available at http://localhost:3000.
Technology Stack
- Frontend: Next.js 15 with TypeScript
- Backend: Vercel AI SDK + Convex
- AI Models:
- LLaMA 4 Scout 17B (Groq) - Context extraction
- GPT-5.1 (OpenAI) - Response generation
- Cohere Rerank v3.5 - Semantic reranking
- Database: Convex (real-time backend with vector search)
- Animation: Motion.dev (60 FPS performance)
Performance Metrics
| Metric | Achievement |
|---|---|
| Time To First Content | Sub-second response through autonomous parallel orchestration |
| Context Extraction | Fast semantic understanding with intelligent routing decisions |
| Search Pipeline | Optimized multi-stage retrieval with adaptive strategy selection |
| Show More Preservation | Excellent context retention through autonomous state management |
| Agent Decisions Per Request | 17+ autonomous decisions for optimal user experience |
Observability
Enable debug logging to see agent decisions:
```bash
export CHAT_DEBUG_LOGS=true
```
Logs show:
- Agent routing decisions
- Classifier skip reasons
- State persistence choices
- Performance optimizations
- Fallback activations
Example log:
```
SKIPPING FAST CLASSIFIER FOR AUTHOR QUERY: {
  reason: 'explicit-author-pattern',
  hasAuthorPattern: true,
  willUseEnhancedLLM: true,
  query: 'raamatuid Tolkienilt'
}
```
Contributing
This is an internal documentation site. For updates or corrections, contact the engineering team.
Last Updated: November 2025
Version: 2.0 - Agentic RAG Emphasis
Maintained By: kingisoovitaja Engineering Team