
kingisoovitaja - Agentic RAG System

Welcome to the kingisoovitaja documentation - a sophisticated Agentic RAG (Retrieval-Augmented Generation) system that demonstrates autonomous decision-making, intelligent orchestration, and adaptive strategy selection for conversational gift discovery.


What is kingisoovitaja?

kingisoovitaja (Gift Advisor) is an intelligent conversational shopping assistant that goes far beyond traditional search. It's an autonomous agent that:

  • Decides its own strategy based on query complexity (fast-path vs deep extraction)
  • Maintains conversation memory across turns (authors, preferences, context)
  • Adapts retrieval tactics based on ambiguity and user signals
  • Self-corrects through multi-layered fallback strategies
  • Orchestrates multiple LLMs for specialized tasks

Unlike rule-based chatbots or simple RAG systems, kingisoovitaja exhibits true agency through autonomous decision-making at every layer.


What Makes This "Agentic"?

Autonomous Intelligence vs Traditional Systems

Traditional RAG: Linear, rule-based, no memory
Agentic RAG: Branching decisions, adaptive strategies, stateful intelligence


The Five Pillars of Agency

1. Autonomous Routing & Strategy Selection

The system decides its own execution path based on query analysis:

Key Decisions Made Autonomously:

  • Fast-path vs enhanced extraction (~60-70% use fast-path)
  • Skip classifier for author/pronoun queries (prevents hijacking)
  • Route to specialized handlers based on intent + confidence
  • Early return vs fall-through based on classification quality

No hardcoded rules - The agent analyzes each query and chooses the optimal path.
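
A minimal sketch of this routing decision. The patterns and the word-count threshold below are illustrative assumptions, not the production rules:

```typescript
type Route = "fast-path" | "enhanced" | "author-enhanced";

// Simplified stand-ins for the real detectors: an explicit-author pattern
// ("by <Name>" or an Estonian ablative like "Tolkienilt") and a pronoun list.
const AUTHOR_PATTERN = /\b(by\s+[A-ZÄÖÜÕ][a-zäöüõ]+|[A-ZÄÖÜÕ][a-zäöüõ]+lt)\b/;
const PRONOUN_PATTERN = /\b(temalt|tema|his|her|their)\b/i;

function chooseRoute(query: string): Route {
  // Author/pronoun queries skip the fast classifier so it cannot hijack them.
  if (AUTHOR_PATTERN.test(query)) return "author-enhanced";
  if (PRONOUN_PATTERN.test(query)) return "enhanced";
  // Short, unambiguous queries take the fast path (the ~60-70% majority).
  return query.trim().split(/\s+/).length <= 6 ? "fast-path" : "enhanced";
}
```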


2. Stateful Memory & Context Building

The system actively builds and maintains conversation state:

Autonomous Behaviors:

  • Proactive state fetching - Retrieves context BEFORE extraction (not after)
  • Intelligent state building - Constructs "quick conversation state" with primaryAuthor
  • Selective persistence - Saves authorName, productType, category, excludeIds
  • State injection - Passes conversation state to LLM for pronoun resolution
  • Memory-based fallbacks - Uses conversation memory when LLM fails

The agent manages its own memory - No manual state management required.
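
The selective-persistence idea can be sketched as follows. The field names (`authorName`, `productType`, `category`, `excludeIds`) come from this page; the merge logic itself is an assumption:

```typescript
interface ConversationState {
  authorName?: string;
  productType?: string;
  category?: string;
  excludeIds: string[];
}

// Merge only the fields worth remembering; newly extracted values win,
// otherwise the prior turn's values carry forward.
function persistTurn(
  prev: ConversationState,
  extracted: Partial<ConversationState>,
  shownIds: string[],
): ConversationState {
  return {
    authorName: extracted.authorName ?? prev.authorName,
    productType: extracted.productType ?? prev.productType,
    category: extracted.category ?? prev.category,
    // Remember everything already shown so "show more" never repeats results.
    excludeIds: [...new Set([...prev.excludeIds, ...shownIds])],
  };
}
```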


3. Multi-Stage Adaptive Retrieval

The system orchestrates complex search strategies autonomously:

Adaptive Decisions:

  • Multi-query vs single-query based on ambiguity
  • Constraint application based on context signals
  • Reranking vs skip based on result diversity
  • Pool expansion on poor quality results
  • Gender affinity boost when recipient gender known

The agent adjusts retrieval strategy based on real-time quality assessment.
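
These adaptive decisions can be condensed into a small planning step. The thresholds here are illustrative assumptions, not the production values:

```typescript
interface SearchPlan {
  multiQuery: boolean;
  rerank: boolean;
  expandPool: boolean;
}

function planSearch(signals: {
  ambiguityScore: number;    // 0..1, from query analysis
  categoryDiversity: number; // distinct categories in the candidate pool
  topScore: number;          // best similarity score so far
}): SearchPlan {
  return {
    multiQuery: signals.ambiguityScore > 0.5,  // fan out when the query is vague
    rerank: signals.categoryDiversity < 4,     // skip reranking when already diverse
    expandPool: signals.topScore < 0.3,        // widen the pool on poor matches
  };
}
```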


4. Self-Correcting & Fallback Intelligence

The system monitors its own performance and self-corrects:

Autonomous Monitoring:

  • Tracks classifier confidence → Falls back if low
  • Detects missing context → Asks for clarification
  • Monitors search results → Activates zero-results handler
  • Checks pronoun resolution → Uses memory if LLM fails
  • Pool exhaustion detection → Transparent acknowledgment to user

Designed never to crash - When one layer fails, the agent falls back to the next and always finds a graceful path forward.
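
The layered-fallback pattern can be sketched generically: try each strategy in order, swallow failures, and always return a final answer. The strategy names in the test are illustrative:

```typescript
type Strategy<T> = () => T | undefined;

// Run strategies in priority order; the first usable result wins.
// The final argument is the guaranteed graceful answer (e.g. a
// clarification prompt), so the caller can never end up empty-handed.
function withFallbacks<T>(strategies: Strategy<T>[], last: T): T {
  for (const strategy of strategies) {
    try {
      const result = strategy();
      if (result !== undefined) return result;
    } catch {
      // A failing layer would be logged in the real system; here we continue.
    }
  }
  return last;
}
```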


5. Intelligent Orchestration Across Components

The system coordinates multiple subsystems with autonomous scheduling:

Orchestration Decisions:

  • Parallel warmup vs sequential based on mode
  • When to fetch stored context (before vs after extraction)
  • Handler selection based on intent + confidence thresholds
  • Search strategy (multi-query vs single) based on ambiguity
  • Reranking necessity based on result diversity
  • Smart suggestion generation based on occasion appropriateness

The agent orchestrates timing, dependencies, and execution flow autonomously.


Sophisticated Agentic RAG Architecture


The Three-Layer Architecture

Layer 1: Augmentation (Query Understanding)

Not just extraction - Intelligent orchestration with autonomous routing

Autonomous Behaviors:

  • Analyzes query for author/pronoun patterns before classification
  • Chooses extraction strategy based on pattern detection
  • Proactively fetches conversation state when needed
  • Self-decides when to inject memory into LLM context

Components:

Models: LLaMA 4 Scout 17B (Groq) - Ultra-fast context extraction


Layer 2: Retrieval (Multi-Stage Intelligent Filtering)

Not just vector search - Multi-stage intelligent filtering with quality monitoring

Autonomous Decisions:

  • Query strategy selection (multi vs single vs author-filtered)
  • Constraint application based on context signals
  • Reranking necessity based on result diversity
  • Category balancing for better exploration
  • Gender affinity boost when recipient gender detected

Components:

  • Vector Search - Semantic embedding similarity
  • Multi-Stage Funnel - Progressive filtering (100 → 50 → 20 → 3-5)
  • LLM Reranking - Cohere Rerank v3.5 for gift appropriateness
  • Diversity Selection - Autonomous category and price distribution
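
The 100 → 50 → 20 → 3-5 funnel above can be sketched as a pipeline. The stage sizes come from this page; the scoring fields and the constraint stage are simplified assumptions:

```typescript
interface Candidate {
  id: string;
  similarity: number;    // vector-search score
  rerankScore?: number;  // set after LLM reranking
  category: string;
}

function funnel(pool: Candidate[]): Candidate[] {
  // Stage 1: top 100 by embedding similarity.
  const bySim = [...pool].sort((a, b) => b.similarity - a.similarity).slice(0, 100);
  // Stage 2: constraint filtering down to ~50 (real constraints omitted here).
  const filtered = bySim.slice(0, 50);
  // Stage 3: rerank the top 20 for gift appropriateness.
  const reranked = filtered
    .slice(0, 20)
    .sort((a, b) => (b.rerankScore ?? b.similarity) - (a.rerankScore ?? a.similarity));
  // Stage 4: diversity selection - at most one item per category, 3-5 results.
  const seen = new Set<string>();
  const final: Candidate[] = [];
  for (const c of reranked) {
    if (seen.has(c.category)) continue;
    seen.add(c.category);
    final.push(c);
    if (final.length === 5) break;
  }
  return final;
}
```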

Layer 3: Generation (Context-Aware Creation)

Not just text generation - Intelligent response orchestration with product injection

Autonomous Orchestration:

  • Parallel warmup scheduling (overlap with search)
  • Dynamic product injection timing (during AI narration)
  • Smart suggestion generation (occasion-filtered)
  • Gift wrap cross-sell insertion (when appropriate)
  • Conversation state persistence (selective fields)

Components:

  • GPT-5.1 Streaming - High-quality narration
  • Product Card Injection - Real-time insertion
  • Smart Suggestions - Context-aware recommendations
  • State Persistence - Autonomous memory management

Key Differentiators from Traditional RAG

| Aspect | Traditional RAG | kingisoovitaja Agentic RAG |
| --- | --- | --- |
| Decision Making | Hardcoded rules | Autonomous runtime decisions |
| Routing | Single path | Intelligent routing with 5+ decision points |
| Memory | Stateless | Stateful with proactive state fetching |
| Search | Simple retrieval | Multi-stage adaptive funnel with quality monitoring |
| Fallbacks | Error pages | Multi-layered self-correction |
| Context | Per-request | Cumulative across conversation |
| Optimization | Fixed | Adaptive (fast-path for 60-70% of queries) |
| Author Handling | Keyword match | Multi-stage detection + pronoun resolution |
| Suggestions | Random | Occasion-aware, context-filtered |
| Quality | Hope for best | 5-layer autonomous quality checks |

Real-World Example: The Agent in Action

Scenario: Multi-Turn Author Discovery

8 Autonomous Decisions Made:

  1. Skip classifier for author pattern
  2. Use fresh extraction (no prior state)
  3. Apply author filter to search
  4. Generate smart suggestions based on context
  5. Persist authorName for future use
  6. Fetch conversation state for pronoun query
  7. Inject state into LLM for resolution
  8. Search with exclusions to avoid duplicates

No hardcoded flow - The agent adapts to each query dynamically.


System Architecture

Complete Flow with Agent Decisions

17 Autonomous Decisions per request - This is what makes it "Agentic"!


Performance Through Intelligence

How Agent Decisions Improve Performance

Agent-Driven Optimizations:

  • Fast-path for simple queries (250ms vs 500ms)
  • Parallel warmup (overlap context + warmup)
  • Skip reranking when diversity good (save 200ms)
  • Skeleton response (perceived under 100ms)

Total Impact: Sub-second responses through intelligent decision-making
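
The parallel-warmup optimization can be sketched as overlapping two awaits instead of chaining them. The helper names here are hypothetical stand-ins for the real warmup and search calls:

```typescript
// Hypothetical helpers standing in for the real model warmup and search.
async function warmUpGenerator(): Promise<void> {
  /* e.g. ping the generation model so its connection is warm */
}
async function runSearch(query: string): Promise<string[]> {
  return [`result for ${query}`];
}

// Start warmup immediately, let search run while it is in flight, and only
// then begin generation - instead of paying for the two steps sequentially.
async function handleQuery(query: string): Promise<string[]> {
  const warmup = warmUpGenerator();       // fired, awaited later
  const results = await runSearch(query); // overlaps with warmup
  await warmup;                           // both finished before generation
  return results;
}
```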


Autonomous Quality Assurance

Self-Monitoring Quality Agent

5 Quality Agents making autonomous decisions at each layer.


Core Capabilities

1. Multi-Language Intelligence

  • Estonian (Primary) - Morphological case handling, compound words, cultural context
  • English - Full support with automatic detection
  • Code-Switching - Handles mixed queries seamlessly

Estonian Agent Capabilities:

  • Allative case: "sõbrale" → agent extracts "sõber"
  • Genitive: "Kingi teosed" → agent extracts "King"
  • Ablative: "Tolkienilt" → agent extracts "Tolkien"
  • Compound words: "lauamäng" → agent detects "board game"
  • Diacritics: "Kivirähkilt" → agent handles ä, ö, ü, õ
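
As a toy illustration of the case handling above, a suffix-stripping pass might look like this. Real Estonian morphology needs stem restoration as well (e.g. "sõbrale" → "sõber", not "sõbra"), so this is only a sketch under simplified assumptions:

```typescript
// A few Estonian case endings, longest first so "ilt" wins over "lt".
// This list is illustrative, not a complete morphology.
const CASE_SUFFIXES = ["ilt", "ile", "lt", "le", "i"];

function stripCase(word: string): string {
  for (const suffix of CASE_SUFFIXES) {
    // Require a reasonably long remaining stem before stripping.
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}
```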

2. Author Resolution Intelligence

100% success rate for direct author queries through multi-stage agent decisions:

  • Stage 1: Pattern detection (routing agent)
  • Stage 2: LLM extraction with 9 few-shot examples
  • Stage 3: Validation agent (clean pronouns, validate names)
  • Stage 4: Memory fallback (0ms lookup if LLM fails)
  • Stage 5: Persistence agent (save for future pronoun resolution)
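
Stage 3's validation step can be sketched as follows. The pronoun list and the name heuristic are illustrative assumptions:

```typescript
// Pronouns that must never survive as an "author name" (illustrative list).
const PRONOUNS = new Set(["tema", "temalt", "he", "she", "him", "her"]);

// Strip pronoun tokens, then require at least one capitalized letter so the
// result still looks like a real name; otherwise reject it entirely.
function validateAuthor(raw: string | undefined): string | undefined {
  if (!raw) return undefined;
  const cleaned = raw
    .split(/\s+/)
    .filter((w) => !PRONOUNS.has(w.toLowerCase()))
    .join(" ")
    .trim();
  return /[A-ZÄÖÜÕ]/.test(cleaned) ? cleaned : undefined;
}
```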

Learn more: Author Intent


3. Smart Suggestions with Occasion Intelligence

The suggestion agent autonomously:

  • Filters inappropriate suggestions (no birthday cards for housewarmings)
  • Prevents cross-product-type leakage (no film suggestions for books)
  • Enforces diversity (no duplicate categories)
  • Injects gift wrap cross-sell (when appropriate)
  • Provides zero-results safety net (explores all categories)
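
The filtering rules above can be sketched as a single pass. The occasion and product-type values below are illustrative, not the production mappings:

```typescript
interface Suggestion {
  text: string;
  productType: string;
  category: string;
}

function filterSuggestions(
  suggestions: Suggestion[],
  ctx: { occasion?: string; productType?: string },
): Suggestion[] {
  const seenCategories = new Set<string>();
  return suggestions.filter((s) => {
    // No cross-product-type leakage (e.g. no film suggestions for books).
    if (ctx.productType && s.productType !== ctx.productType) return false;
    // Occasion filter (e.g. no birthday cards for housewarmings).
    if (ctx.occasion === "housewarming" && s.category === "birthday-cards") return false;
    // Diversity: at most one suggestion per category.
    if (seenCategories.has(s.category)) return false;
    seenCategories.add(s.category);
    return true;
  });
}
```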

Learn more: Smart Suggestions System


4. Context Preservation Agent

Manages exclude lists and context inheritance autonomously:

  • Perfect deduplication across "show more" requests
  • Automatic context reset detection (topic changes)
  • Merges client and server exclude lists intelligently
  • Preserves taxonomy (productType, category, author, budget)

Learn more: Show More Behavior


Documentation Structure

Start Here (Essential Reading)

  1. Fast Classifier - Autonomous fast-path routing
  2. Context Extraction - Intent detection agent
  3. Smart Suggestions System - Suggestion generation agent
  4. Author Intent - Multi-stage author resolution

Architecture

Context & Intelligence

Pipeline & Streaming

AI Models & Decisions

Smart Agents

Conversational Intelligence

Prompts & Configuration

Quality & Guardrails


Quick Start

To run the development server:

cd documentation
npm start

The documentation will be available at http://localhost:3000.


Technology Stack

  • Frontend: Next.js 15 with TypeScript
  • Backend: Vercel AI SDK + Convex
  • AI Models:
    • LLaMA 4 Scout 17B (Groq) - Context extraction
    • GPT-5.1 (OpenAI) - Response generation
    • Cohere Rerank v3.5 - Semantic reranking
  • Database: Convex (real-time backend with vector search)
  • Animation: Motion.dev (60 FPS performance)

Performance Metrics

| Metric | Achievement |
| --- | --- |
| Time To First Content | Sub-second response through autonomous parallel orchestration |
| Context Extraction | Fast semantic understanding with intelligent routing decisions |
| Search Pipeline | Optimized multi-stage retrieval with adaptive strategy selection |
| Show More Preservation | Excellent context retention through autonomous state management |
| Agent Decisions Per Request | 17+ autonomous decisions for optimal user experience |

Observability

Enable debug logging to see agent decisions:

export CHAT_DEBUG_LOGS=true

Logs show:

  • Agent routing decisions
  • Classifier skip reasons
  • State persistence choices
  • Performance optimizations
  • Fallback activations

Example log:

SKIPPING FAST CLASSIFIER FOR AUTHOR QUERY: {
  reason: 'explicit-author-pattern',
  hasAuthorPattern: true,
  willUseEnhancedLLM: true,
  query: 'raamatuid Tolkienilt'
}

Contributing

This is an internal documentation site. For updates or corrections, contact the engineering team.


Last Updated: November 2025
Version: 2.0 - Agentic RAG Emphasis
Maintained By: kingisoovitaja Engineering Team