Orchestration System Architecture

AI-Powered Gift Recommendation System
Multi-layered orchestration architecture for intelligent product search and recommendation


Table of Contents

  1. System Overview
  2. Architecture Diagram
  3. Core Components
  4. Execution Flows
  5. Performance Optimizations
  6. Configuration & Toggles
  7. Data Flow Examples

System Overview

The orchestration system is a sophisticated multi-layered architecture that coordinates the entire gift recommendation pipeline, from user query to final AI response. It consists of four primary orchestrators, multiple handlers, and specialized services that work together to deliver sub-second response times while maintaining high-quality recommendations.

Key Features

  • Parallel Execution: Optimized flow with <800ms TTFC (Time To First Chunk)
  • Context-Aware Search: Multi-stage filtering with semantic reranking
  • Intelligent Routing: Intent-based handler selection
  • Streaming Responses: Real-time product card injection
  • Graceful Degradation: Fallbacks at every critical junction

Performance Metrics

Metric             | Parallel Mode                    | Sequential Mode
TTFC               | <800ms                           | >1.9s
Context Extraction | non-blocking (parallel)          | ~1.9s (blocking)
Search Pipeline    | optimized                        | optimized
AI First Chunk     | <100ms skeleton; AI text at ~1s  | after blocking context extraction

Architecture Diagram

High-Level System Architecture

Detailed Orchestration Flow

Handler Routing Decision Tree


Core Components

1. ParallelOrchestrator

Purpose: Optimized execution flow that runs context extraction in parallel with immediate user feedback

Key Features:

  • Non-blocking context extraction
  • Immediate skeleton response (<100ms)
  • Dynamic product injection during streaming
  • Query validation before processing
  • Intent-based routing with fallbacks

Performance Impact:

  • TTFC cut from >1.9s to <800ms (sequential mode blocked ~1.9s on context extraction alone)
  • User sees response in <100ms (skeleton)
  • AI starts streaming at ~1s
  • Products appear at ~1.4s

Main Responsibilities:

1. Query Validation (nonsense detection)
2. Context Extraction (parallel, non-blocking)
3. Vague Intent Detection (multi-factor)
4. Search Orchestration (when needed)
5. Response Streaming (with delayed cards)
6. Context Persistence

Critical Logic:

  • Vague Query Detection: Combines confidence, signals, and explicit mentions (see the sketch after this list)
  • Book Fallback: Automatic category broadening for gift queries
  • Memory Resolution: Check stored products before search
  • Repetition Detection: Stop streaming on AI loops
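
A minimal sketch of the multi-factor vague-query check referenced above. Field names and the 0.5 confidence threshold are illustrative assumptions; only the combination rule (confidence + signals + explicit mentions) comes from this document:

interface ExtractedContext {
  confidence: number           // LLM confidence in the extracted intent (0-1)
  refinementSignals: string[]  // user feedback signals, e.g. "cheaper"
  explicitMentions: string[]   // concrete products, authors, or categories
}

// A query is treated as vague only when low confidence coincides with
// an absence of refinement signals and explicit mentions
function isVagueQuery(ctx: ExtractedContext): boolean {
  return (
    ctx.confidence < 0.5 &&
    ctx.refinementSignals.length === 0 &&
    ctx.explicitMentions.length === 0
  )
}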

2. ContextOrchestrator

Purpose: Extract, enrich, and manage conversation context

Sub-Components:

  • extract-context.ts - LLM-based intent extraction
  • fetch-stored-context.ts - Retrieve conversation history
  • refinement-signals.ts - Apply user feedback signals
  • author-workflow.ts - Author detection and clarification
  • exclude-reset.ts - Smart exclude list management
  • product-inquiry.ts - Follow-up question routing
  • persist-context.ts - Save context to database

Context Preservation: Taxonomy is carried over for these follow-up intents:

  • show_more_products
  • cheaper_alternatives
  • budget_alternatives

Key Features:

  • Multi-source context merging (LLM + DB + Client; sketched after this list)
  • Exclude list pruning (max 30 items)
  • Category hints prioritization (frontend → DB)
  • Budget constraint preservation
  • Author clarification workflow
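
A minimal sketch of multi-source merging, following the precedence rules above (frontend category hints before stored DB hints, 30-item exclude cap). The GiftContext type and field names are hypothetical:

interface GiftContext {
  categoryHints: string[]
  excludeIds: string[]
  budget?: { min?: number; max?: number }
}

function mergeContext(
  llm: Partial<GiftContext>,
  db: Partial<GiftContext>,
  client: Partial<GiftContext>,
): GiftContext {
  return {
    // Category hints: frontend (client) first, then stored DB context
    categoryHints: client.categoryHints?.length
      ? client.categoryHints
      : db.categoryHints ?? llm.categoryHints ?? [],
    // Merge excludes and keep only the most recent 30 (FIFO pruning)
    excludeIds: [...(db.excludeIds ?? []), ...(llm.excludeIds ?? [])].slice(-30),
    // Budget constraints are preserved across turns
    budget: llm.budget ?? db.budget,
  }
}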

3. SearchOrchestrator

Purpose: Coordinate the complete search pipeline from query to final products

Search Pipeline (6 Phases; a condensed sketch follows the phase list):

Phase 1: Query Rewriting

  • Generate query variations (primary + fallbacks)
  • Apply focus strategies (semantic, category, type)
  • Handle show_more special case

Phase 2: Multi-Stage Funnel

  • Stage A: Initial filtering (max 100 candidates)
  • Stage B: Budget & constraint filtering (max 50)
  • Stage C: Category distribution (max 20 finalists)

Phase 3: LLM Semantic Reranking

  • Cohere rerank-v3.5 scoring
  • User intent alignment
  • Quality-based filtering (0.5 threshold, fallback 0.3)

Phase 4: Diversity Selection

  • Category diversity
  • Price range distribution
  • Product type balancing
  • Final 3 selection

Phase 4.5: Gender Affinity Boost

  • Category-gender affinity scoring
  • Boost multiplier: 0.5x - 1.8x
  • Re-sort after boosting

Phase 6: Estonian Product Prioritization

  • Language-based boosting
  • Cultural relevance scoring
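
A condensed sketch of the phase sequencing above, using the toggles from SearchOrchestratorConfig (documented under Configuration & Toggles) and the GiftContext type from the earlier sketch. The declared service functions are hypothetical stand-ins for the services listed in the appendix:

interface Product { id: string; title: string; category: string }

declare function rewriteQuery(q: string, ctx: GiftContext): Promise<string[]>
declare function multiSearch(queries: string[], ctx: GiftContext): Promise<Product[]>
declare function runFunnel(c: Product[], ctx: GiftContext): Promise<Product[]>
declare function rerank(q: string, c: Product[]): Promise<Product[]>
declare function selectDiverse(c: Product[]): Product[]
declare function applyGenderBoost(c: Product[], ctx: GiftContext): Product[]
declare function boostEstonian(c: Product[]): Product[]

async function runSearchPipeline(query: string, ctx: GiftContext): Promise<Product[]> {
  const variations = await rewriteQuery(query, ctx)          // Phase 1
  let candidates = await multiSearch(variations, ctx)

  if (SearchOrchestratorConfig.PHASE2_ENABLED) {
    candidates = await runFunnel(candidates, ctx)            // Phase 2: Stage A → B → C
  }
  if (
    SearchOrchestratorConfig.PHASE3_ENABLED &&
    candidates.length >= SearchOrchestratorConfig.RERANK_MIN_FINALISTS
  ) {
    candidates = await rerank(query, candidates)             // Phase 3: Cohere rerank-v3.5
  }
  if (SearchOrchestratorConfig.PHASE4_ENABLED) {
    candidates = selectDiverse(candidates)                   // Phase 4: diversity
    candidates = applyGenderBoost(candidates, ctx)           // Phase 4.5: 0.5x-1.8x multiplier
  }
  if (SearchOrchestratorConfig.PHASE6_ENABLED) {
    candidates = boostEstonian(candidates)                   // Phase 6: language boost
  }
  return candidates.slice(0, 3)                              // final 3 selection
}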

Fallback Mechanisms:

  • Book-only results: Auto-retry with gift categories
  • Language fallback: Retry without language filter
  • Gift card exclusion: EXCLUDE_GIFT_CARDS constraint
  • Quality safety net: Minimum threshold 0.3
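
The book-only fallback above might look like this sketch (the category value and helper names are assumptions; Product is reused from the pipeline sketch):

declare const GIFT_CATEGORIES: string[]
declare function searchWithCategories(categories: string[]): Promise<Product[]>

// Auto-retry with broadened gift categories when every hit is a book
async function applyBookFallback(results: Product[]): Promise<Product[]> {
  if (results.length > 0 && results.every(p => p.category === 'books')) {
    return searchWithCategories(GIFT_CATEGORIES)
  }
  return results
}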

4. ResponseOrchestrator

Purpose: Generate and stream AI responses with dynamic product injection

Key Features:

  • GPT-5.1 chat model (gpt-5.1-chat-latest)
  • Delayed card injection (@180 chars)
  • Token usage monitoring (2500 token limit)
  • Repetition detection (consecutive & frequent)
  • Fallback responses on failures

Response Modes:

  1. Product Response (generateWithDelayedCards):

    • Stream AI text first
    • Inject product cards after 180 chars (sketched after this list)
    • Include safety prefaces
    • Add smart suggestions
    • Track performance metrics
  2. Conversational Response (generateConversationalResponse):

    • No products, no skeleton
    • Greeting/clarification handling
    • Smart suggestion buttons
    • Prompt compliance validation
  3. Product Inquiry Response (generateProductInquiryResponse):

    • Answer follow-up questions
    • Use stored product data
    • No new search
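
A hedged sketch of the delayed-card mechanism in generateWithDelayedCards (mode 1 above). Only the ~180-character trigger is documented; the stream event shape and names are illustrative:

interface ProductCard { id: string; title: string }
type StreamEvent =
  | { type: 'text'; chunk: string }
  | { type: 'cards'; cards: ProductCard[] }

async function* withDelayedCards(
  textStream: AsyncIterable<string>,
  cards: ProductCard[],
): AsyncGenerator<StreamEvent> {
  let emitted = 0
  let cardsSent = false
  for await (const chunk of textStream) {
    yield { type: 'text', chunk }
    emitted += chunk.length
    // Inject the product cards once ~180 characters of AI text have streamed
    if (!cardsSent && emitted >= 180) {
      yield { type: 'cards', cards }
      cardsSent = true
    }
  }
  // Very short responses still get their cards at the end
  if (!cardsSent) yield { type: 'cards', cards }
}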

Quality Controls:

  • Product mention detection (validation)
  • Repetition detection (3+ consecutive words)
  • Token limit warnings (>90% utilization)
  • Cut-off handling (graceful ellipsis)

Execution Flows

Parallel Flow (Optimized)

Sequential Flow (Legacy)

Context Orchestration Detail

Search Orchestration Pipeline


Performance Optimizations

1. Parallel Execution Mode

Problem: Sequential context extraction blocked user feedback for ~1.9s

Solution: Parallel orchestration with immediate skeleton response

Benefits:

  • TTFC: >1.9s → <800ms
  • User perception: Instant feedback
  • Context extraction: Non-blocking

Implementation:

// Old (Sequential)
const context = await ContextOrchestrator.orchestrate()  // ~1.9s, BLOCKING
const search = await SearchOrchestrator.orchestrate()
const response = await ResponseOrchestrator.generate()

// New (Parallel)
sendSkeleton()                 // ~50ms
void Promise.all([
  contextPromise,              // ~900ms, non-blocking
  searchPrepPromise,           // ~100ms
])
streamResponseImmediately()    // <800ms TTFC
injectProductsDynamically()

2. Context Warmup

  • OpenAI connection pre-warming
  • LLM model caching
  • Database connection pooling
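
A sketch of what the warmup step can look like. openai.models.list() is a real OpenAI Node SDK call; the Convex ping via fetch is a hypothetical stand-in:

import OpenAI from 'openai'

const openai = new OpenAI()

// Fire cheap requests at cold start so the first real query skips
// connection and TLS setup; failures here are deliberately ignored
export async function warmupConnections(): Promise<void> {
  await Promise.allSettled([
    openai.models.list(),                        // pre-warm the OpenAI connection pool
    fetch(process.env.NEXT_PUBLIC_CONVEX_URL!),  // touch the Convex backend (hypothetical ping)
  ])
}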

3. Search Pipeline Optimizations

Stage Limits (Configured via SearchOrchestratorConfig):

MAX_CANDIDATES_STAGE_A = 100  // Down from 200
MAX_CANDIDATES_STAGE_B = 50   // Down from 100
MAX_FINALISTS = 20            // Down from 30
RERANK_MIN_FINALISTS = 3      // Skip rerank if < 3

Savings: ~200-300ms per request

4. Exclude List Pruning

Problem: Long conversations exhaust product pool

Solution: Keep only last 30 excludes (FIFO)

if (excludeIds.length > 30) {
  excludeIds = excludeIds.slice(-30)
}

5. Smart Quality Fallbacks

Preferred Threshold: 0.5 (high quality)
Minimum Threshold: 0.3 (fallback)

if (highQualityProducts.length < 3) {
  return mediumQualityProducts // Fallback
}

6. Repetition Detection

Stops streaming if AI loops:

  • Consecutive: 3+ same words in a row
  • Frequent: 3+ occurrences in 20-word window
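
A sketch implementing both rules (tokenization and case handling are assumptions; a production version would likely ignore stopwords):

function isRepeating(text: string): boolean {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)

  // Consecutive: 3+ identical words in a row
  for (let i = 2; i < words.length; i++) {
    if (words[i] === words[i - 1] && words[i - 1] === words[i - 2]) return true
  }

  // Frequent: any word occurring 3+ times within the trailing 20-word window
  const counts = new Map<string, number>()
  for (const w of words.slice(-20)) {
    const n = (counts.get(w) ?? 0) + 1
    if (n >= 3) return true
    counts.set(w, n)
  }
  return false
}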

7. Token Limit Monitoring

  • Max tokens: 2500
  • Warning at 90% utilization
  • Graceful cut-off handling
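
A minimal sketch of the budget check; only the 2500-token cap and the 90% warning level come from this document:

const MAX_COMPLETION_TOKENS = 2500

function checkTokenBudget(tokensUsed: number): void {
  const utilization = tokensUsed / MAX_COMPLETION_TOKENS
  // Past ~90% the response risks being cut off, so warn early
  if (utilization > 0.9) {
    console.warn(`Token utilization at ${Math.round(utilization * 100)}%`)
  }
}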

Configuration & Toggles

Environment Variables

# Execution Mode
PARALLEL_EXECUTION_ENABLE=true # false = sequential (legacy)

# Context Management
PHASE5_CONTEXT_ENABLE=true # Enable context persistence

# Debug & Logging
CHAT_DEBUG_LOGS=true # Verbose logging
NODE_ENV=production # Production/development

# AI Models
OPENAI_API_KEY=sk-... # GPT-5.1 API key

# Database
NEXT_PUBLIC_CONVEX_URL=https://... # Convex backend

Search Orchestrator Config

File: orchestrators/search-orchestrator.config.ts

export class SearchOrchestratorConfig {
  // Phase Toggles
  static PHASE2_ENABLED = true;  // Multi-stage funnel
  static PHASE3_ENABLED = true;  // LLM reranking
  static PHASE4_ENABLED = true;  // Diversity selection
  static PHASE6_ENABLED = true;  // Estonian boost

  // Stage Limits (Performance Tuning)
  static MAX_CANDIDATES_STAGE_A = 100;
  static MAX_CANDIDATES_STAGE_B = 50;
  static MAX_FINALISTS = 20;
  static MAX_PER_CATEGORY = 5;

  // Quality Thresholds
  static PREFERRED_QUALITY_THRESHOLD = 0.5;
  static MINIMUM_QUALITY_THRESHOLD = 0.3;
  static RERANK_MIN_FINALISTS = 3;

  // Diagnostics
  static DIAGNOSTICS_ENABLED = false;
  static AUTHOR_SPLIT_REGEX = /[,;]/;
  static SHOW_MORE_REGEX = /\b(näita\s+rohkem|show\s+more|veel|more)\b/i;
}

Response Configuration

File: app/chat/config.ts

export const chatConfig = {
  productDescriptions: {
    maxWords: 250,           // Max words per response
    sentencesPerProduct: 3,  // Sentences per product description
  },
}

Data Flow Examples

Example 1: Show More Products
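
A hedged walkthrough of a "show more" turn in code, combining behaviors documented above (SHOW_MORE_REGEX, taxonomy preservation for show_more_products, exclude pruning) and reusing GiftContext and runSearchPipeline from earlier sketches; the helpers are hypothetical:

declare function fetchStoredContext(conversationId: string): Promise<GiftContext>
declare const lastShownProductIds: string[]

async function handleShowMore(conversationId: string, userMessage: string) {
  // "näita rohkem" / "show more" matches SHOW_MORE_REGEX
  if (!SearchOrchestratorConfig.SHOW_MORE_REGEX.test(userMessage)) return null

  const stored = await fetchStoredContext(conversationId)
  const ctx: GiftContext = {
    ...stored, // taxonomy is preserved for the show_more_products intent
    excludeIds: [...stored.excludeIds, ...lastShownProductIds].slice(-30),
  }
  // Re-run the pipeline; previously shown products are excluded
  return runSearchPipeline(userMessage, ctx)
}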

Example 2: Vague Query with Clarification

Example 3: Author Clarification Workflow


Appendix: Key Files

Orchestrators

  • app/api/chat/orchestrators/parallel-orchestrator.ts - Optimized flow
  • app/api/chat/orchestrators/context-orchestrator/orchestrate.ts - Context extraction
  • app/api/chat/orchestrators/search-orchestrator.ts - Search pipeline
  • app/api/chat/orchestrators/response-orchestrator.ts - AI response generation

Handlers

  • app/api/chat/handlers/handler-router.ts - Intent-based routing
  • app/api/chat/handlers/product-search-handler.ts - Product search flow
  • app/api/chat/handlers/clarifying-question-handler.ts - Clarification flow
  • app/api/chat/handlers/conversational-handler.ts - Conversational flow

Services

  • app/api/chat/services/query-rewriting/ - Query generation
  • app/api/chat/services/product-search.ts - Multi-search execution
  • app/api/chat/services/funnel.ts - Multi-stage filtering
  • app/api/chat/services/rerank.ts - Semantic reranking
  • app/api/chat/services/diversity.ts - Final selection
  • app/api/chat/services/language.ts - Estonian boost

Configuration

  • app/api/chat/orchestrators/search-orchestrator.config.ts - Search config
  • app/chat/config.ts - Response config

Glossary

Term                   | Definition
TTFC                   | Time To First Chunk - time until the user sees the first AI response
Context Orchestration  | Extract and manage conversation state
Search Orchestration   | Multi-phase product search pipeline
Response Orchestration | AI response generation and streaming
Parallel Execution     | Non-blocking context extraction with immediate feedback
Sequential Execution   | Blocking context extraction before streaming
Funnel                 | Multi-stage candidate filtering (Stage A → B → C)
Reranking              | LLM-based semantic scoring for relevance
Diversity Selection    | Category and price distribution balancing
Skeleton               | Empty product card placeholders for instant feedback
Delayed Cards          | Product injection after AI text starts streaming
Context Preservation   | Taxonomy inheritance for follow-up queries
Exclude List           | Previously shown product IDs to avoid duplicates
Smart Suggestions      | Category buttons for quick navigation

Last Updated: 2025-11-16
Version: 1.0
Maintainer: AI Orchestration Team