Agentic RAG Architecture
Autonomous decision-making with intelligent orchestration across retrieval, augmentation, and generation layers
Augmentation
Autonomous context orchestration with intelligent routing (fast-path vs deep extraction), conversation memory resolution, and adaptive strategy selection based on query complexity
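A minimal sketch of what this routing could look like; the helper names and heuristics (`classifyComplexity`, the pronoun regex) are illustrative assumptions, not the production code:

```typescript
// Hypothetical sketch: route a query to the fast path or deep extraction
// based on a cheap complexity heuristic before any LLM call is made.
type QueryContext = { query: string; history: string[] };
type Strategy = "fast-path" | "deep-extraction";

function classifyComplexity(ctx: QueryContext): Strategy {
  // Short, self-contained queries with no references can skip the LLM pass.
  const hasReference = /\b(he|she|it|that one|more)\b/i.test(ctx.query);
  const isShort = ctx.query.split(/\s+/).length <= 6;
  return isShort && !hasReference && ctx.history.length === 0
    ? "fast-path"
    : "deep-extraction";
}

async function augment(ctx: QueryContext): Promise<string> {
  if (classifyComplexity(ctx) === "fast-path") {
    return ctx.query; // use the raw query directly as the retrieval string
  }
  // Deep extraction: resolve references against conversation memory
  // (stand-in for an LLM context-extraction call).
  return `${ctx.history.join(" ")} ${ctx.query}`.trim();
}
```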
Retrieval
Intelligent multi-stage funnel (100 → 50 → 20 → 3) with adaptive search strategies, semantic vector matching, LLM-powered gift appropriateness scoring, and diversity-aware selection
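A sketch of the funnel shape under stated assumptions — the stage cutoffs mirror the 100 → 50 → 20 → 3 numbers above, while `vectorSearch`, `isInStockAndGiftable`, and `scoreGiftAppropriateness` are hypothetical stand-ins for the real retrieval, filtering, and LLM-scoring steps:

```typescript
// Illustrative funnel: 100 candidates from vector search, 50 after a
// metadata filter, 20 after LLM appropriateness scoring, 3 after
// diversity-aware selection.
type Candidate = { id: string; score: number; category: string };

async function retrievalFunnel(queryEmbedding: number[]): Promise<Candidate[]> {
  const stage1 = await vectorSearch(queryEmbedding, 100);           // semantic recall
  const stage2 = stage1.filter(isInStockAndGiftable).slice(0, 50);  // cheap filter
  const stage3 = (await scoreGiftAppropriateness(stage2))           // LLM scoring
    .sort((a, b) => b.score - a.score)
    .slice(0, 20);
  return pickDiverse(stage3, 3);                                    // final picks
}

function pickDiverse(candidates: Candidate[], k: number): Candidate[] {
  // Prefer one product per category, then backfill by score.
  const picked: Candidate[] = [];
  const seenCategories = new Set<string>();
  for (const c of candidates) {
    if (picked.length === k) break;
    if (!seenCategories.has(c.category)) {
      picked.push(c);
      seenCategories.add(c.category);
    }
  }
  for (const c of candidates) {
    if (picked.length === k) break;
    if (!picked.includes(c)) picked.push(c);
  }
  return picked;
}

// Assumed external helpers, declared only so the sketch type-checks.
declare function vectorSearch(e: number[], k: number): Promise<Candidate[]>;
declare function isInStockAndGiftable(c: Candidate): boolean;
declare function scoreGiftAppropriateness(cs: Candidate[]): Promise<Candidate[]>;
```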
Generation
Context-aware streaming generation with parallel warmup, dynamic product injection during narration, smart category suggestions, and conversation state persistence for seamless multi-turn intelligence
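The streaming-with-injection idea, sketched with the Vercel AI SDK's `streamText`; the model id is a stand-in and `buildPrompt`, `emitText`, and `injectProductCard` are assumed helpers, not the actual implementation:

```typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";

// Sketch only: stream the narration token by token and inject a product
// card into the UI as soon as that product's name appears in the text.
async function generateAnswer(
  query: string,
  products: { id: string; name: string }[]
) {
  const result = streamText({
    model: openai("gpt-4o"), // stand-in model id for the generation model
    prompt: buildPrompt(query, products),
  });

  let narration = "";
  const injected = new Set<string>();
  for await (const chunk of result.textStream) {
    narration += chunk;
    emitText(chunk); // push tokens to the UI immediately
    for (const p of products) {
      if (!injected.has(p.id) && narration.includes(p.name)) {
        injected.add(p.id);
        injectProductCard(p.id); // dynamic product injection mid-narration
      }
    }
  }
}

declare function buildPrompt(q: string, ps: unknown[]): string;
declare function emitText(t: string): void;
declare function injectProductCard(id: string): void;
```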
Key Features
Built for performance, intelligence, and user experience
Parallel Execution
Non-blocking architecture that returns an immediate skeleton response while independent stages run in parallel, cutting time to first content.
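A rough sketch of the non-blocking pattern, assuming hypothetical `sendSkeleton`/`patchUI` helpers and a split between stages that can run concurrently and those that cannot:

```typescript
// Send a skeleton payload right away, run independent stages concurrently,
// then patch the UI once the dependent retrieval step completes.
async function handleQuery(query: string) {
  sendSkeleton(); // immediate placeholder UI, nothing blocks on the LLM

  const [context, suggestions] = await Promise.all([
    extractContext(query),                // LLM context extraction
    precomputeCategorySuggestions(query), // cheap, runs alongside
  ]);

  const products = await retrieve(context); // depends on extracted context
  patchUI({ products, suggestions });
}

declare function sendSkeleton(): void;
declare function extractContext(q: string): Promise<unknown>;
declare function precomputeCategorySuggestions(q: string): Promise<string[]>;
declare function retrieve(ctx: unknown): Promise<unknown[]>;
declare function patchUI(p: { products: unknown[]; suggestions: string[] }): void;
```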
Context Preservation
Intelligent follow-up handling with "show more" support, smart exclusion of already-shown products, and conversation memory
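A sketch of how a "show more" follow-up could reuse stored context while excluding prior hits; the `ConversationState` shape and `retrieveProducts` signature are assumptions for illustration:

```typescript
// "Show more" reuses the last query context and excludes products the user
// has already seen, so each turn surfaces fresh results.
type ConversationState = {
  lastQueryContext: string;
  shownProductIds: string[];
};

async function handleFollowup(message: string, state: ConversationState) {
  const isShowMore = /\b(show|see)\s+more\b/i.test(message);
  const queryContext = isShowMore ? state.lastQueryContext : message;

  const results = await retrieveProducts(queryContext, {
    exclude: state.shownProductIds, // smart exclusion of prior hits
  });

  state.shownProductIds.push(...results.map((r) => r.id));
  return results;
}

declare function retrieveProducts(
  q: string,
  opts: { exclude: string[] }
): Promise<{ id: string }[]>;
```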
Semantic Search
Multi-stage funnel with vector search, LLM reranking, diversity selection, and gender affinity boost
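The funnel itself is sketched in the Retrieval section above; here is a small, purely illustrative example of the affinity-boost step, with the weight and field names made up for the sketch:

```typescript
// After reranking, nudge scores with a gender affinity boost derived from
// the extracted recipient profile, keeping semantic relevance dominant.
type Scored = {
  id: string;
  rerankScore: number;
  genderAffinity?: "f" | "m" | "neutral";
};

function applyAffinityBoost(items: Scored[], recipientGender?: "f" | "m"): Scored[] {
  const BOOST = 0.1; // small additive boost so relevance still leads
  return items
    .map((item) => ({
      ...item,
      rerankScore:
        recipientGender && item.genderAffinity === recipientGender
          ? item.rerankScore + BOOST
          : item.rerankScore,
    }))
    .sort((a, b) => b.rerankScore - a.rerankScore);
}
```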
Multi-Language
Native support for Estonian and English with language-aware search and cultural relevance scoring
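One way language-aware search could be wired, shown as a sketch: a cheap character heuristic picks Estonian or English, and the detected language is passed to retrieval. The heuristic and the `vectorSearch` options are assumptions, not the real detection logic:

```typescript
// Detect Estonian vs. English with a lightweight (imperfect) heuristic and
// keep searching in the user's language.
type Lang = "et" | "en";

function detectLanguage(text: string): Lang {
  // Estonian-specific letters are a cheap signal; a real system would use a
  // proper language-identification step.
  return /[õäöüšž]/i.test(text) ? "et" : "en";
}

async function languageAwareSearch(query: string) {
  const lang = detectLanguage(query);
  return vectorSearch(query, { language: lang }); // filter/boost by language
}

declare function vectorSearch(q: string, opts: { language: Lang }): Promise<unknown[]>;
```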
Author Resolution
Multi-stage author detection, pronoun resolution with LLM fallback, and disambiguation handling
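A sketch of the pronoun-resolution flow with an LLM fallback, assuming a hypothetical `resolveWithLLM` helper and a simple list of recent author mentions:

```typescript
// Resolve pronouns like "her" or Estonian "tema"/"ta" against recently
// mentioned authors; only fall back to the LLM when it is ambiguous.
type AuthorMention = { name: string; turn: number };

async function resolveAuthor(
  query: string,
  mentions: AuthorMention[]
): Promise<string | null> {
  const hasPronoun = /\b(he|she|him|her|tema|ta)\b/i.test(query);
  if (!hasPronoun || mentions.length === 0) return null; // nothing to resolve

  if (mentions.length === 1) return mentions[0].name;    // unambiguous

  // Several candidate authors in recent turns: disambiguate with the LLM.
  return resolveWithLLM(query, mentions.map((m) => m.name));
}

declare function resolveWithLLM(q: string, candidates: string[]): Promise<string>;
```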
Graceful Fallback
Pool exhaustion detection, automatic retry strategies, and transparent acknowledgments to users
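A sketch of what that fallback path could look like; the thresholds, `relaxConstraints` helper, and user-facing notes are illustrative assumptions:

```typescript
// When exclusions leave too few fresh results, retry with a broadened query
// before acknowledging to the user that the pool is exhausted.
async function retrieveWithFallback(query: string, excludeIds: string[]) {
  const strict = await retrieve(query, excludeIds);
  if (strict.length >= 3) return { products: strict, note: null };

  const broadened = await retrieve(relaxConstraints(query), excludeIds);
  if (broadened.length > 0) {
    return {
      products: broadened,
      note: "Broadened the search to find more options.",
    };
  }
  return {
    products: [],
    note: "I've shown everything that fits this request — try changing the criteria.",
  };
}

declare function retrieve(q: string, exclude: string[]): Promise<{ id: string }[]>;
declare function relaxConstraints(q: string): string;
```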
Technology Stack
Powered by cutting-edge AI and web technologies
Frontend
- Next.js 14+ with TypeScript
- Motion.dev for animations
- Real-time streaming UI
Backend
- Vercel AI SDK
- Convex real-time database
- Vector embeddings
AI Models
- LLaMA 8B (context extraction)
- GPT-5.1 (generation)
- Cohere Rerank v3.5