kingisoovitaja - Agentic RAG System
Welcome to the kingisoovitaja documentation - a sophisticated Agentic RAG (Retrieval-Augmented Generation) system that demonstrates autonomous decision-making, intelligent orchestration, and adaptive strategy selection for conversational gift discovery.
What is kingisoovitaja?
kingisoovitaja (Gift Advisor) is an intelligent conversational shopping assistant that goes far beyond traditional search. It's an autonomous agent that:
- Decides its own strategy based on query complexity (fast-path vs deep extraction)
- Maintains conversation memory across turns (authors, preferences, context)
- Adapts retrieval tactics based on ambiguity and user signals
- Self-corrects through multi-layered fallback strategies
- Orchestrates multiple LLMs for specialized tasks
Unlike rule-based chatbots or simple RAG systems, kingisoovitaja exhibits true agency through autonomous decision-making at every layer.
What Makes This "Agentic"?
Autonomous Intelligence vs Traditional Systems
Traditional RAG: Linear, rule-based, no memory
Agentic RAG: Branching decisions, adaptive strategies, stateful intelligence
The Five Pillars of Agency
1. Autonomous Routing & Strategy Selection
The system decides its own execution path based on query analysis:
Key Decisions Made Autonomously:
- Fast-path vs enhanced extraction (~60-70% use fast-path)
- Skip classifier for author/pronoun queries (prevents the fast path from hijacking them into generic search)
- Route to specialized handlers based on intent + confidence
- Early return vs fall-through based on classification quality
No hardcoded rules - The agent analyzes each query and chooses the optimal path.
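A minimal sketch of what such a routing decision could look like; the patterns, word-count threshold, and function names are illustrative assumptions, not the production logic:

```typescript
// Illustrative routing sketch -- pattern lists and thresholds are
// hypothetical, not the actual production values.
type Route = "fast-path" | "enhanced-extraction";

const AUTHOR_PATTERNS = [/\b\w+ilt\b/i, /raamatu(id)?\s/i]; // e.g. "Tolkienilt"
const PRONOUN_PATTERNS = [/\btemalt\b/i, /\btema\b/i];      // "from him/her"

function chooseRoute(query: string, hasPriorState: boolean): Route {
  // Author/pronoun queries skip the fast classifier so it cannot
  // hijack them into a generic product search.
  const isAuthorQuery = AUTHOR_PATTERNS.some((p) => p.test(query));
  const isPronounQuery =
    hasPriorState && PRONOUN_PATTERNS.some((p) => p.test(query));
  if (isAuthorQuery || isPronounQuery) return "enhanced-extraction";

  // Short, unambiguous queries take the fast path (~60-70% of traffic).
  return query.split(/\s+/).length <= 6 ? "fast-path" : "enhanced-extraction";
}
```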
2. Stateful Memory & Context Building
The system actively builds and maintains conversation state:
Autonomous Behaviors:
- Proactive state fetching - Retrieves context BEFORE extraction (not after)
- Intelligent state building - Constructs "quick conversation state" with primaryAuthor
- Selective persistence - Saves authorName, productType, category, excludeIds
- State injection - Passes conversation state to LLM for pronoun resolution
- Memory-based fallbacks - Uses conversation memory when LLM fails
The agent manages its own memory - No manual state management required.
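A sketch of the shape such a conversation state could take; only authorName, productType, category, and excludeIds are documented above, so the remaining details are assumptions:

```typescript
// Hypothetical shape of the persisted conversation state; the four
// documented fields are authorName, productType, category, excludeIds.
interface ConversationState {
  primaryAuthor?: string;   // built into the "quick conversation state"
  authorName?: string;      // persisted for pronoun resolution
  productType?: string;     // e.g. "book"
  category?: string;        // e.g. "fantasy"
  excludeIds: string[];     // products already shown, for deduplication
}

// Selective persistence: only the fields worth remembering are saved.
function persistableFields(
  state: ConversationState,
): Partial<ConversationState> {
  const { authorName, productType, category, excludeIds } = state;
  return { authorName, productType, category, excludeIds };
}
```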
3. Multi-Stage Adaptive Retrieval
The system orchestrates complex search strategies autonomously:
Adaptive Decisions:
- Multi-query vs single-query based on ambiguity
- Constraint application based on context signals
- Reranking vs skip based on result diversity
- Pool expansion on poor quality results
- Gender affinity boost when recipient gender known
The agent adjusts retrieval strategy based on real-time quality assessment.
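The following sketch illustrates this kind of runtime strategy selection; the ambiguity score, the 0.5 threshold, and the diversity cutoff are hypothetical stand-ins:

```typescript
// Illustrative strategy selection; signals and thresholds are assumptions.
type SearchStrategy = "multi-query" | "single-query" | "author-filtered";

interface QuerySignals {
  ambiguity: number;        // 0..1, from query analysis
  authorName?: string;
  recipientGender?: "f" | "m";
}

function chooseSearchStrategy(s: QuerySignals): SearchStrategy {
  if (s.authorName) return "author-filtered"; // constrain to the author
  return s.ambiguity > 0.5 ? "multi-query" : "single-query";
}

function shouldRerank(resultCategories: string[]): boolean {
  // Skip reranking when results are already diverse (saves ~200ms).
  return new Set(resultCategories).size < 3;
}
```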
4. Self-Correcting & Fallback Intelligence
The system monitors its own performance and self-corrects:
Autonomous Monitoring:
- Tracks classifier confidence → Falls back if low
- Detects missing context → Asks for clarification
- Monitors search results → Activates zero-results handler
- Checks pronoun resolution → Uses memory if LLM fails
- Detects pool exhaustion → Transparently acknowledges it to the user
The agent never crashes - It always finds a graceful path forward.
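A minimal sketch of such a layered fallback, using author resolution as the example; the function names and the clarification signal are illustrative:

```typescript
// Sketch of a layered fallback chain: LLM first, conversation memory
// second, a clarification request last -- never an unhandled crash.
async function resolveAuthor(
  llmExtract: () => Promise<string | null>,
  memoryLookup: () => string | null,
): Promise<string | { clarify: true }> {
  try {
    const fromLlm = await llmExtract();
    if (fromLlm) return fromLlm;      // happy path
  } catch {
    // LLM failure is non-fatal; fall through to memory.
  }
  const fromMemory = memoryLookup();  // conversation memory fallback
  if (fromMemory) return fromMemory;
  return { clarify: true };           // ask the user instead of crashing
}
```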
5. Intelligent Orchestration Across Components
The system coordinates multiple subsystems with autonomous scheduling:
Orchestration Decisions:
- Parallel warmup vs sequential based on mode
- When to fetch stored context (before vs after extraction)
- Handler selection based on intent + confidence thresholds
- Search strategy (multi-query vs single) based on ambiguity
- Reranking necessity based on result diversity
- Smart suggestion generation based on occasion appropriateness
The agent orchestrates timing, dependencies, and execution flow autonomously.
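As a sketch, the parallel-warmup decision can be expressed as overlapping two independent awaits; both helper functions below are hypothetical placeholders:

```typescript
// Sketch of parallel warmup: context extraction and generation warmup
// are overlapped with Promise.all instead of running sequentially.
async function orchestrate(query: string) {
  const [context] = await Promise.all([
    extractContext(query),  // Phase 0: LLM context extraction
    warmupGeneration(),     // warm the generation model in parallel
  ]);
  return context;
}

async function extractContext(query: string) {
  return { query, intent: "gift-search" }; // stand-in for the real call
}

async function warmupGeneration() {
  // e.g. open a connection / prime a cache
}
```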
Sophisticated Agentic RAG Architecture
The Three-Layer Architecture
Layer 1: Augmentation (Query Understanding)
Not just extraction - Intelligent orchestration with autonomous routing
Autonomous Behaviors:
- Analyzes query for author/pronoun patterns before classification
- Chooses extraction strategy based on pattern detection
- Proactively fetches conversation state when needed
- Self-decides when to inject memory into LLM context
Components:
- Fast Classifier - Autonomous fast-path decision-making
- Enhanced Extraction - Deep semantic understanding
- Author Resolution - Multi-stage detection with fallbacks
- Memory Resolution - Stateful conversation management
Models: LLaMA 4 Scout 17B (Groq) - Ultra-fast context extraction
Layer 2: Retrieval (Adaptive Smart Search)
Not just vector search - Multi-stage intelligent filtering with quality monitoring
Autonomous Decisions:
- Query strategy selection (multi vs single vs author-filtered)
- Constraint application based on context signals
- Reranking necessity based on result diversity
- Category balancing for better exploration
- Gender affinity boost when recipient gender detected
Components:
- Vector Search - Semantic embedding similarity
- Multi-Stage Funnel - Progressive filtering (100 → 50 → 20 → 3-5)
- LLM Reranking - Cohere Rerank v3.5 for gift appropriateness
- Diversity Selection - Autonomous category and price distribution
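A compact sketch of the 100 → 50 → 20 → 3-5 funnel; the scoring field, the filter threshold, and the two-per-category rule are placeholders standing in for the real vector search, Cohere rerank, and diversity selection:

```typescript
// Illustrative funnel: 100 candidates -> 50 -> 20 -> final 3-5.
interface Product { id: string; category: string; price: number; score: number }

function funnel(candidates: Product[]): Product[] {
  const top100 = candidates.slice(0, 100);                         // vector search pool
  const top50 = top100.filter((p) => p.score > 0.3).slice(0, 50);  // constraint filtering
  const top20 = [...top50]                                         // reranking stand-in
    .sort((a, b) => b.score - a.score)
    .slice(0, 20);
  // Diversity selection: at most two products per category.
  const perCategory = new Map<string, number>();
  return top20
    .filter((p) => {
      const n = perCategory.get(p.category) ?? 0;
      perCategory.set(p.category, n + 1);
      return n < 2;
    })
    .slice(0, 5);
}
```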
Layer 3: Generation (Context-Aware Creation)
Not just text generation - Intelligent response orchestration with product injection
Autonomous Orchestration:
- Parallel warmup scheduling (overlap with search)
- Dynamic product injection timing (during AI narration)
- Smart suggestion generation (occasion-filtered)
- Gift wrap cross-sell insertion (when appropriate)
- Conversation state persistence (selective fields)
Components:
- GPT-5.1 Streaming - High-quality narration
- Product Card Injection - Real-time insertion
- Smart Suggestions - Context-aware recommendations
- State Persistence - Autonomous memory management
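A sketch of how product cards could be interleaved with a token stream; the [[product]] marker convention and the types are assumptions for illustration only:

```typescript
// Sketch of injecting product cards into a token stream during narration.
interface ProductCard { id: string; title: string }

async function* injectProducts(
  tokens: AsyncIterable<string>,
  products: ProductCard[],
): AsyncGenerator<string | ProductCard> {
  let next = 0;
  for await (const token of tokens) {
    yield token;
    // Inject the next card whenever the narration emits a marker.
    if (token.includes("[[product]]") && next < products.length) {
      yield products[next++];
    }
  }
}
```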
Key Differentiators from Traditional RAG
| Aspect | Traditional RAG | kingisoovitaja Agentic RAG |
|---|---|---|
| Decision Making | Hardcoded rules | Autonomous runtime decisions |
| Routing | Single path | Intelligent routing with 5+ decision points |
| Memory | Stateless | Stateful with proactive state fetching |
| Search | Simple retrieval | Multi-stage adaptive funnel with quality monitoring |
| Fallbacks | Error pages | Multi-layered self-correction |
| Context | Per-request | Cumulative across conversation |
| Optimization | Fixed | Adaptive (fast-path for 60-70% of queries) |
| Author Handling | Keyword match | Multi-stage detection + pronoun resolution |
| Suggestions | Random | Occasion-aware, context-filtered |
| Quality | Hope for best | 5-layer autonomous quality checks |
Real-World Example: The Agent in Action
Scenario: Multi-Turn Author Discovery
For example, a user asks for books by an author ("raamatuid Tolkienilt" - "books by Tolkien"), then follows up with a pronoun query such as "midagi muud temalt" ("something else from him"). Across the two turns, 8 autonomous decisions are made:
- Skip classifier for author pattern
- Use fresh extraction (no prior state)
- Apply author filter to search
- Generate smart suggestions based on context
- Persist authorName for future use
- Fetch conversation state for pronoun query
- Inject state into LLM for resolution
- Search with exclusions to avoid duplicates
No hardcoded flow - The agent adapts to each query dynamically.
System Architecture
Complete Flow with Agent Decisions
17 Autonomous Decisions per request - This is what makes it "Agentic"!
Performance Through Intelligence
How Agent Decisions Improve Performance
Agent-Driven Optimizations:
- Fast-path for simple queries (250ms vs 500ms)
- Parallel warmup (overlap context + warmup)
- Skip reranking when diversity good (save 200ms)
- Skeleton response (perceived under 100ms)
Total Impact: Sub-second responses through intelligent decision-making
Autonomous Quality Assurance
Self-Monitoring Quality Agent
5 Quality Agents making autonomous decisions at each layer.
Core Capabilities
1. Multi-Language Intelligence
- Estonian (Primary) - Morphological case handling, compound words, cultural context
- English - Full support with automatic detection
- Code-Switching - Handles mixed queries seamlessly
Estonian Agent Capabilities:
- Allative case: "sõbrale" → agent extracts "sõber"
- Genitive: "Kingi teosed" → agent extracts "King"
- Ablative: "Tolkienilt" → agent extracts "Tolkien"
- Compound words: "lauamäng" → agent detects "board game"
- Diacritics: "Kivirähkilt" → agent handles ä, ö, ü, õ
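As an illustration only, the case handling above could be approximated with suffix stripping; real Estonian morphology is far richer, so treat this as a sketch:

```typescript
// Illustrative suffix stripping for the cases listed above. Real
// Estonian morphology involves stem changes this sketch ignores.
function stripCaseSuffix(word: string): string {
  if (word.endsWith("ilt")) return word.slice(0, -3); // ablative: Tolkienilt -> Tolkien
  if (word.endsWith("le")) return word.slice(0, -2);  // allative: sõbrale -> sõbra (stem; nominative is "sõber")
  if (word.endsWith("i")) return word.slice(0, -1);   // genitive: Kingi -> King
  return word;
}
```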
2. Author Resolution Intelligence
100% success rate for direct author queries through multi-stage agent decisions:
- Stage 1: Pattern detection (routing agent)
- Stage 2: LLM extraction with 9 few-shot examples
- Stage 3: Validation agent (clean pronouns, validate names)
- Stage 4: Memory fallback (0ms lookup if LLM fails)
- Stage 5: Persistence agent (save for future pronoun resolution)
Learn more: Author Intent
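A sketch of the Stage 3 validation step: rejecting pronouns the LLM may return as "authors" and sanity-checking the name. Both the pronoun list and the name pattern are illustrative:

```typescript
// Illustrative validation agent: clean pronouns, validate names.
const PRONOUNS = new Set(["tema", "ta", "temalt", "him", "her", "them"]);

function validateAuthorName(candidate: string | null): string | null {
  if (!candidate) return null;
  const name = candidate.trim();
  if (PRONOUNS.has(name.toLowerCase())) return null;    // pronoun, not a name
  if (!/^\p{Lu}[\p{L}' .-]+$/u.test(name)) return null; // must look like a proper name
  return name;
}
```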
3. Smart Suggestions with Occasion Intelligence
The suggestion agent autonomously:
- Filters inappropriate suggestions (no birthday cards for housewarmings)
- Prevents cross-product-type leakage (no film suggestions for books)
- Enforces diversity (no duplicate categories)
- Injects gift wrap cross-sell (when appropriate)
- Provides zero-results safety net (explores all categories)
Learn more: Smart Suggestions System
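A sketch of the occasion filter, cross-type guard, and diversity rule combined; the blocklist entry mirrors the birthday-card example above, but the data shapes are assumptions:

```typescript
// Illustrative suggestion filtering: occasion blocklist, no
// cross-product-type leakage, no duplicate categories.
interface Suggestion { label: string; category: string; productType: string }

const BLOCKED: Record<string, string[]> = {
  housewarming: ["birthday-card"], // no birthday cards for housewarmings
};

function filterSuggestions(
  suggestions: Suggestion[],
  occasion: string,
  currentProductType: string,
): Suggestion[] {
  const blocked = new Set(BLOCKED[occasion] ?? []);
  const seenCategories = new Set<string>();
  return suggestions.filter((s) => {
    if (blocked.has(s.category)) return false;              // occasion filter
    if (s.productType !== currentProductType) return false; // no cross-type leakage
    if (seenCategories.has(s.category)) return false;       // enforce diversity
    seenCategories.add(s.category);
    return true;
  });
}
```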
4. Context Preservation Agent
Manages exclude lists and context inheritance autonomously:
- Perfect deduplication across "show more" requests
- Automatic context reset detection (topic changes)
- Merges client and server exclude lists intelligently
- Preserves taxonomy (productType, category, author, budget)
Learn more: Show More Behavior
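A minimal sketch of the exclude-list merge with a topic-change reset; how topicChanged is detected is out of scope here and assumed as an input:

```typescript
// Illustrative merge of client- and server-side exclude lists.
function mergeExcludeIds(
  clientExcludes: string[],
  serverExcludes: string[],
  topicChanged: boolean,
): string[] {
  if (topicChanged) return []; // context reset: old exclusions no longer apply
  return [...new Set([...clientExcludes, ...serverExcludes])]; // deduplicated union
}
```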
Documentation Structure
Start Here (Essential Reading)
- Fast Classifier - Autonomous fast-path routing
- Context Extraction - Intent detection agent
- Smart Suggestions System - Suggestion generation agent
- Author Intent - Multi-stage author resolution
Architecture
- High-Level Architecture - Complete system overview
- Parallel Orchestrator - Performance optimization agent
Context & Intelligence
- GiftContext & Followup System - Context building agent
- Intent Classification - Dual-path decision-making
Pipeline & Streaming
- Pipeline Overview - Frontend streaming architecture
- Lifecycle & Flow - Request-response orchestration
AI Models & Decisions
- Model Overview - Model selection strategy
- Performance Comparison - Benchmarks
- Phase 0: Context - Intent extraction
- Phase 5: Generation - Response creation
Smart Agents
- Smart Suggestions System - Complete suggestion intelligence
- Show More Behavior - Pagination agent
Conversational Intelligence
- Conversational Overview - 7-layer intelligence stack
- Author Intent - Author detection agent
- Refinement Detection - Query refinement agent
- Progressive Context - Multi-turn clarification
- Budget System - Budget detection intelligence
- Memory Resolution - Conversation memory agent
Prompts & Configuration
- Prompts Overview - Prompt architecture
- Estonian Prompt - Estonian language rules
- Response Validation - Anti-hallucination agent
Quality & Guardrails
- Quality Overview - Multi-agent quality system
- Estonian Challenges - Language-specific solutions
Quick Start
To run the development server:
```bash
cd documentation
npm start
```
The documentation will be available at http://localhost:3000.
Technology Stack
- Frontend: Next.js 15 with TypeScript
- Backend: Vercel AI SDK + Convex
- AI Models:
- LLaMA 4 Scout 17B (Groq) - Context extraction
- GPT-5.1 (OpenAI) - Response generation
- Cohere Rerank v3.5 - Semantic reranking
- Database: Convex (real-time backend with vector search)
- Animation: Motion.dev (60 FPS performance)
Performance Metrics
| Metric | Achievement |
|---|---|
| Time To First Content | Sub-second response through autonomous parallel orchestration |
| Context Extraction | Fast semantic understanding with intelligent routing decisions |
| Search Pipeline | Optimized multi-stage retrieval with adaptive strategy selection |
| Show More Preservation | Excellent context retention through autonomous state management |
| Agent Decisions Per Request | 17+ autonomous decisions for optimal user experience |
Observability
Enable debug logging to see agent decisions:
```bash
export CHAT_DEBUG_LOGS=true
```
Logs show:
- Agent routing decisions
- Classifier skip reasons
- State persistence choices
- Performance optimizations
- Fallback activations
Example log:
```
SKIPPING FAST CLASSIFIER FOR AUTHOR QUERY: {
  reason: 'explicit-author-pattern',
  hasAuthorPattern: true,
  willUseEnhancedLLM: true,
  query: 'raamatuid Tolkienilt'
}
```
Contributing
This is an internal documentation site. For updates or corrections, contact the engineering team.
Last Updated: November 2025
Version: 2.0 - Agentic RAG Emphasis
Maintained By: kingisoovitaja Engineering Team