Parallel Orchestrator
Overview
The Parallel Orchestrator is the performance-optimized request handler that achieves sub-100ms Time to First Chunk (TTFC) through skeleton responses, parallel execution, and intelligent routing. It represents a dramatic improvement over the baseline sequential flow.
Location: app/api/chat/orchestrators/parallel-orchestrator.ts (900 lines)
Performance Goals
Baseline (Sequential Flow)
Request → Context (1.9s) → Search (1.2s) → Rerank (762ms) → AI (slow TTFC)
Total: 5.85 seconds before user sees anything
Optimized (Parallel Flow)
Request → Skeleton (50ms) → TTFC
  ├→ Context (900ms) → AI starts streaming
  └→ Search prep (100ms) → Full search when context ready
Products injected dynamically into stream at ~1.4s
Results:
- TTFC: <100ms (was 5.85s)
- First AI Text: ~1s (context complete)
- Products Appear: ~1.4s (search complete)
- Perceived Improvement: 98% faster initial response
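Conceptually, the kickoff can be sketched as below. This is an illustrative simplification, not the orchestrator's actual code; extractGiftContext and prepareSearch are assumed helper names:

// Illustrative sketch of the parallel kickoff, with assumed helpers:
declare function extractGiftContext(msg: string): Promise<{ intent: string }>;
declare function prepareSearch(msg: string): Promise<unknown>;

async function parallelKickoff(userMessage: string) {
  // Context extraction and search preparation run concurrently, instead of
  // the baseline's sequential context → search → rerank chain.
  const [giftContext, searchPrep] = await Promise.all([
    extractGiftContext(userMessage), // ~900ms
    prepareSearch(userMessage),      // ~100ms, finishes long before context
  ]);

  // AI streaming starts as soon as context resolves; the full search uses
  // the prepared state and injects products into the stream (~1.4s total).
  return { giftContext, searchPrep };
}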
Architecture Overview
📋 Execution Flow (Step-by-Step)
Complete Request Lifecycle
Key Components Deep Dive
1. Query Validation (Step 0)
Purpose: Fast-fail nonsense queries before expensive processing
Two-Stage Validation:
Quick Nonsense Patterns:
- Random characters: "asdfghjkl", "12345678"
- Keyboard mashing: "qwerty", "zxcvbn"
- Empty or whitespace-only
- Single character queries
Suspicious Patterns (need LLM validation):
- Long uppercase strings: "ASDFGHJKL12345"
- Random numbers: "9384756291"
- Mixed gibberish: "abc123XYZ"
Code Reference: Lines 74-132
// Quick synchronous check
const quickNonsenseCheck = isObviousNonsense(userMessage);
if (quickNonsenseCheck) {
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: 'nonsense_query',
    // ... trigger refinement
  });
  return; // Fast-fail
}

// LLM validation for ambiguous cases
if (needsLLMValidation) {
  const validation = await validateQuery(userMessage, 'et');
  if (!validation.isSensible && validation.confidence > 0.7) {
    // Nonsense confirmed by LLM
    await QueryRefinementHandler.handle({ /* ... */ });
    return;
  }
}
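For reference, a minimal sketch of what the quick synchronous check could look like. The helper name isObviousNonsense comes from the code above, but the patterns below are illustrative assumptions, not the actual implementation:

// Hypothetical sketch; real patterns live in the orchestrator's validation step.
function isObviousNonsense(message: string): boolean {
  const trimmed = message.trim();

  // Empty, whitespace-only, or single-character queries
  if (trimmed.length <= 1) return true;

  // Keyboard-mashing sequences ("asdfghjkl", "qwerty", "zxcvbn", "12345678", ...)
  const keyboardRuns = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm', '12345678'];
  const lower = trimmed.toLowerCase();
  return keyboardRuns.some(run => lower.length >= 4 && run.includes(lower));
}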
Benefits:
- Fast-fail: No wasted LLM calls for obvious nonsense
- Smart validation: LLM validates ambiguous cases
- Better UX: Immediate refinement suggestions
2. Context Extraction (Step 2)
Purpose: Extract structured GiftContext from user message
Timeout Safety:
// 35s outer timeout (inner: 25s main + 4s classifier)
contextResult = await Promise.race([
  contextPromise,
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout after 35s')), 35000)
  )
]);
Error Handling:
- Context extraction failures → Graceful error message
- Production: Generic error
- Development: Detailed error message
What Gets Extracted: See Context Understanding Documentation
Code Reference: Lines 140-182
3. Signal Detection & Routing (Step 3)
Purpose: Determine if query has enough context to search or needs clarification
Multi-Factor Detection:
Signal Types:
// Product Signals
const hasProductType = !!giftContext.productType;
const hasCategory = !!giftContext.category;
const hasProductTypeHints = (giftContext.productTypeHints?.length ?? 0) > 0;
const hasCategoryHints = (giftContext.categoryHints?.length ?? 0) > 0;

// Gift Context
const hasRecipient = !!giftContext.recipient;
const hasOccasion = !!giftContext.occasion;

// Combined
const hasProductSignals = hasProductType || hasCategory ||
  hasProductTypeHints || hasCategoryHints;
const hasGiftContext = hasRecipient || hasOccasion;
const hasAnySearchableContext = hasProductSignals || hasGiftContext;
Vague Query Detection:
// Check using context-signals.ts utility
const hasMeaningfulSignals = hasMeaningfulProductSignals(giftContext);
const giftContextMissing = isGiftContextMissing(giftContext);

// Vague if: product search intent + low confidence + no signals + no gift context
let isVagueGiftQuery = isProductSearch &&
  hasLowConfidence &&
  !hasMeaningfulSignals &&
  giftContextMissing;

// CRITICAL OVERRIDE: Explicit type/category overrides vague detection
if (hasExplicitProductType || hasExplicitCategory) {
  isVagueGiftQuery = false; // User explicitly requested something
}
Routing Decision:
const canSearch = (isProductSearch && !isVagueGiftQuery) ||
  (isUnknownIntent && !isVagueIntent);

if (!canSearch) {
  // Route to HandlerRouter for clarifying questions or conversational
  await HandlerRouter.route({ /* ... */ });
  return; // No skeleton, no search
}

// Send skeleton NOW (only for product search)
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
Code Reference: Lines 217-419
4. Skeleton Response (Step 3.1)
Purpose: Immediate visual feedback to user
When Sent:
- Product search with good signals
- NOT sent for vague queries (go to clarifying questions)
- NOT sent for conversational intents (greeting, question)
- NOT sent for product inquiries (no search needed)
What It Contains:
{
  type: "skeleton",
  count: 1 // Number of loading placeholders
}
Frontend Behavior:
- Displays loading skeleton UI
- Shows 1-3 product card placeholders
- Indicates processing in progress
Timing: Sent immediately after signal detection confirms searchable intent (~50-100ms)
Code Reference: Line 419
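As an illustration, a frontend consumer might branch on the event type like this. This is a hypothetical sketch; only the skeleton event shape is documented above, and the other event names and handler functions are assumptions:

// Hypothetical frontend sketch; handlers and non-skeleton event names are assumed.
declare function showLoadingPlaceholders(count: number): void;
declare function renderProductCards(event: unknown): void;
declare function appendToResponse(text: string): void;

function handleStreamEvent(event: { type: string; count?: number; text?: string }) {
  switch (event.type) {
    case 'skeleton':
      showLoadingPlaceholders(event.count ?? 1); // 1-3 product card placeholders
      break;
    case 'product_metadata': // assumed event name
      renderProductCards(event);                 // replaces the skeleton
      break;
    case 'text_delta':       // assumed event name
      appendToResponse(event.text ?? '');        // streaming AI text
      break;
  }
}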
5. Search Execution (Step 4)
Purpose: Execute product search with context
Memory Resolution First:
// Check if we can serve from memory (previous results)
let searchResult = resolveFromMemory({
  intent: giftContext.intent,
  userMessage,
  giftContext,
  storedProducts,
  excludeIds,
  ignorePreviouslyShown: isCheapestOnlyQuery,
  debug
});

if (!searchResult) {
  // Memory miss, execute full search
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}
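A rough sketch of the kind of decision resolveFromMemory makes, shown with minimal assumed Product and SearchResultLite types. The real resolver covers more intents and fields:

// Illustrative sketch; the actual resolver handles more cases.
interface Product { id: string; price: number; title: string; }
interface SearchResultLite { products: Product[]; }

function resolveFromMemorySketch(args: {
  intent: string;
  storedProducts: Product[];
  excludeIds: string[];
  ignorePreviouslyShown: boolean;
}): SearchResultLite | null {
  // Only intents that can reuse previous results qualify (e.g. "show more")
  if (args.intent !== 'show_more_products' || args.storedProducts.length === 0) {
    return null; // memory miss: caller falls back to full search
  }

  const exclude = args.ignorePreviouslyShown ? [] : args.excludeIds;
  const remaining = args.storedProducts.filter(p => !exclude.includes(p.id));

  return remaining.length > 0 ? { products: remaining } : null;
}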
Budget Calculation for "Cheaper" Queries:
if (giftContext.intent === 'cheaper_alternatives' &&
    !giftContext.budget?.max &&
    storedProducts.length > 0) {
  const avgPrice = calculateAverage(storedProducts.map(p => p.price));
  // Set implicit budget to 70% of average
  giftContext.budget = {
    max: Math.floor(avgPrice * 0.7)
  };
}
Book Category Clearing for Gift Searches:
// Clear book categories for gift product types
// ('Kingitused' is Estonian for "Gifts")
if (giftContext.productType === 'Kingitused' && giftContext.category) {
  const isLikelyBookCategory = /raamat|romantika|kirjandus|novel/i
    .test(giftContext.category);
  if (isLikelyBookCategory) {
    giftContext.category = undefined; // Clear it!
  }
}
No Results Handling:
if (searchResult.products.length === 0) {
  // Trigger intelligent refinement flow
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: giftContext.intent,
    giftContext,
    // ... suggestions for refinement
  });
  return;
}
Code Reference: Lines 440-531
6. Product Injection (Step 5)
Purpose: Send product metadata to frontend for card rendering
Data Structure:
// Display products (top 3)
const displayProducts = searchResult.products.slice(0, 3).map(p => ({
  ...p,
  in_popular_list: p.in_popular_list // Preserve popularity flag
}));

// Safety preface
const safetyPreface = buildSafetyPreface({
  language: giftContext.language || 'et',
  budget: giftContext.budget,
  warnings: searchResult.funnelWarnings, // "No results in budget" etc.
  products: displayProducts,
  meta: giftContext.meta
});

// Smart suggestions
const smartSuggestions = generateSmartSuggestions({
  originalQuery: userMessage,
  detectedIntent: giftContext.intent,
  currentProductType: giftContext.productType,
  currentCategory: giftContext.category,
  returnedProducts: searchResult.allCandidates,
  context: giftContext
});

// Metadata
const metadata = {
  queryForSearch: userMessage,
  csv_category: giftContext.category,
  product_type: giftContext.productType,
  categoryHints: giftContext.categoryHints,
  search: {
    timeMs: searchResult.searchTime,
    metrics: searchResult.metrics,
    diversityMeta: searchResult.diversityMeta
  },
  smartSuggestions,
  contextData: {
    occasion: giftContext.occasion,
    recipient: giftContext.recipient,
    budget: giftContext.budget,
    confidence: giftContext.confidence,
    // ... all context fields
  }
};

// Send to frontend
StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createProductMetadataEvent(displayProducts, metadata)
);
Code Reference: Lines 535-680
7. Response Streaming (Step 6)
Purpose: Stream AI-generated response with delayed product cards
Flow:
Delayed Card Injection:
- AI starts streaming immediately
- Products injected after initial explanation (200-300 words)
- AI continues explaining products
- Smart suggestions appended at end
Code Reference: Lines 694-712
await ResponseOrchestrator.generateWithDelayedCards({
  controller,
  products: displayProducts,
  userMessage,
  systemPrompt,
  startTime,
  searchStart: searchResult.searchStart,
  searchEnd: searchResult.searchEnd,
  pipelineMetrics: {
    contextExtractionMs: contextTime,
    ...searchResult.pipelineMetrics
  },
  language: giftContext.language || 'et',
  giftContext,
  prefaceText: safetyPreface // Budget warnings, etc.
});
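The delayed-injection idea can be sketched as follows. This is a hypothetical simplification of what generateWithDelayedCards might do internally; createProductCardsEvent is an assumed helper, not a documented API:

// Hypothetical simplification of delayed card injection; internals are assumed.
declare const StreamingUtils: {
  safeEnqueue(c: ReadableStreamDefaultController, event: unknown): void;
  createTextDeltaEvent(text: string): unknown;
};
declare function createProductCardsEvent(products: unknown[]): unknown; // assumed

async function streamWithDelayedCards(
  controller: ReadableStreamDefaultController,
  aiTextStream: AsyncIterable<string>,
  products: unknown[]
) {
  let wordsStreamed = 0;
  let cardsSent = false;

  for await (const delta of aiTextStream) {
    StreamingUtils.safeEnqueue(controller, StreamingUtils.createTextDeltaEvent(delta));
    wordsStreamed += delta.split(/\s+/).filter(Boolean).length;

    // Inject product cards once the initial explanation (~200-300 words) is out
    if (!cardsSent && wordsStreamed >= 200) {
      StreamingUtils.safeEnqueue(controller, createProductCardsEvent(products));
      cardsSent = true;
    }
  }
}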
8. Context Persistence (Step 7)
Purpose: Store conversation state for follow-ups and pronoun resolution
What Gets Stored:
await ContextOrchestrator.persistContext({
  conversationId: request.conversationId,
  convexClient,
  giftContext,
  selectedProducts: displayProducts,
  enabled: PHASE5_ENABLED,
  debug
});
Stored Data:
- Authors: For pronoun resolution ("tema", Estonian for "he/she" → last author)
- Taxonomy: Last productType/category for context continuity
- Exclusions: Product IDs shown (for "show more")
- Budget: Budget preferences across turns
Use Cases:
- "näita veel" ("show more") → Excludes previously shown products
- "tema teosed" ("his/her works") → Resolves to last mentioned author
- "odavamaid" ("cheaper ones") → Uses last shown products for budget calculation
Code Reference: Lines 683-691
🚦 Routing Decision Matrix
| Scenario | Signals | Confidence | Intent | Skeleton? | Action |
|---|---|---|---|---|---|
| Specific product query | Product type | High | product_search | Yes | Search → Stream |
| Gift with context | ⚠️ Fallback | Medium | birthday_gift | Yes | Search → Stream |
| Vague gift query | None | Low | product_search | No | Clarifying question |
| Greeting | - | - | greeting | No | Conversational |
| Product inquiry | - | - | product_inquiry | No | Resolve from memory → Conversational |
| Show more | Memory | - | show_more_products | Yes | Search (exclude previous) |
| Nonsense | - | - | - | No | Query refinement |
Special Intent Handling
Product Inquiry
When: User asks about a previously shown product
Example: "Kas see raamat sobib 10-aastasele?"
Flow:
Resolution Logic:
// 1. Check if productId in giftContext
if (giftContext.productInquiry?.productId) {
  return findByProductId(productId, storedProducts);
}

// 2. Check if productName matches
if (giftContext.productInquiry?.productName) {
  // Try exact match
  const exact = storedProducts.find(p =>
    sanitize(p.title) === sanitize(productName)
  );
  if (exact) return exact;

  // Try partial match
  const partial = storedProducts.find(p =>
    sanitize(p.title).includes(sanitize(productName))
  );
  if (partial) return partial;
}

// 3. Hydrate from DB if needed (description missing)
if (product && needsHydration(product)) {
  return await convexClient.query(
    api.queries.getProduct.getProductById,
    { productId }
  );
}
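The sanitize helper used above is not shown in this excerpt. A minimal sketch of typical title normalization, offered as an assumption rather than the actual helper:

// Assumed normalization for title matching; illustrative only.
function sanitize(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^\p{L}\p{N}\s]/gu, '') // strip punctuation, keep letters/digits
    .replace(/\s+/g, ' ');            // collapse whitespace
}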
Code Reference: Lines 185-214, 749-898
Cheaper Alternatives
When: User wants cheaper options
Example: "midagi odavamat"
Budget Calculation:
// Calculate average price of previously shown products
const avgPrice = storedProducts.reduce((sum, p) => sum + p.price, 0)
  / storedProducts.length;

// Set implicit budget to 70% of average
giftContext.budget = {
  max: Math.floor(avgPrice * 0.7)
};
Exclusion Override:
// For superlative queries ("cheapest"), ignore previous exclusions
const isCheapestOnlyQuery = /\b(kõige\s+odavam|odavaim|cheapest)\b/i
  .test(userMessage);

if (isCheapestOnlyQuery) {
  excludeIds = []; // Show all results, even previously shown
}
Code Reference: Lines 442-466, 469-478
Configuration & Tuning
Environment Variables
# Enable parallel orchestrator
PARALLEL_EXECUTION_ENABLE=true
# Context extraction timeout (inner: 25s main + 4s classifier)
CONTEXT_EXTRACTION_TIMEOUT_MS=25000
# Enable Phase 5 context persistence
PHASE5_CONTEXT_ENABLE=true
# Enable debug logging
CHAT_DEBUG_LOGS=true
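These can be read with the documented defaults as fallbacks. A minimal sketch, assuming the defaults above:

// Sketch of reading the tuning knobs with the documented defaults.
const config = {
  parallelEnabled: process.env.PARALLEL_EXECUTION_ENABLE === 'true',
  contextTimeoutMs: Number(process.env.CONTEXT_EXTRACTION_TIMEOUT_MS ?? 25000),
  phase5Enabled: process.env.PHASE5_CONTEXT_ENABLE === 'true',
  debug: process.env.CHAT_DEBUG_LOGS === 'true',
};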
Performance Tuning
Context Extraction Timeout:
// Inner timeouts
CONTEXT_EXTRACTION_TIMEOUT_MS=25000 // Main extractor
FAST_CLASSIFIER_TIMEOUT_MS=4000 // Fast classifier
// Outer safety timeout
PARALLEL_ORCHESTRATOR_TIMEOUT=35000 // 25s + 4s + 6s buffer
Search Timeout:
// SearchOrchestrator has its own timeouts
SEARCH_TIMEOUT_MS=15000
Skeleton Delay:
// How long before sending skeleton (for signal detection)
// Currently: After context extraction and signal analysis (fast)
// Could optimize: Send skeleton earlier, adjust route later (risky)
🐛 Debugging & Observability
Debug Logging
Enable:
export CHAT_DEBUG_LOGS=true
Key Log Points:
- Validation (Lines 75-132):
  - PARALLEL: Validating query...
  - ⚠️ PARALLEL: OBVIOUS NONSENSE DETECTED
  - PARALLEL: LLM VALIDATION RESULT
- Context (Lines 175-181):
  - CONTEXT COMPLETE (Parallel): { time: 900ms, intent: ... }
- Routing (Lines 269-332):
  - VAGUE GIFT QUERY DEBUG: { ... all signals ... }
  - ROUTING DECISION DEBUG: { canSearch, computed signals, decision }
- Search (Lines 493-508):
  - SEARCH COMPLETE (Parallel): { time: 400ms, products: 5 }
  - PARALLEL: Served products from memory cache
- Products (Lines 536-540):
  - RAW PRODUCTS FROM SEARCH (before normalization)
- Metadata (Lines 654-664):
  - 📤 PARALLEL ORCHESTRATOR: Sending metadata
  - 🏷️ CONTEXT DATA BEING SENT
- Complete (Lines 714-722):
  - PARALLEL FLOW COMPLETE: { totalTime, improvement }
Performance Metrics
Logged Automatically:
const totalTime = Date.now() - startTime;
const improvement = Math.round((5850 - totalTime) / 5850 * 100);

console.log('PARALLEL FLOW COMPLETE:', {
  totalTime: totalTime + 'ms',
  contextTime: contextTime + 'ms',
  searchTime: (Date.now() - searchStart) + 'ms',
  improvement: `${improvement}% faster than baseline`
});
Typical Values:
- contextTime: 900ms (LLaMA 8B context extraction)
- searchTime: 400ms (multi-query search + rerank)
- totalTime: 1300-1500ms
- TTFC: <100ms (skeleton)
- Improvement: 97-98% faster perceived TTFC
Intent Metadata Event
Purpose: Debug intent detection and routing
Sent: After signal detection, before search/conversational
StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createIntentMetadataEvent({
    intent: giftContext.intent,
    confidence: giftContext.confidence,
    signals: {
      hasProductType,
      hasCategory,
      hasRecipient,
      hasOccasion
    },
    decision // 'search' | 'conversational' | 'clarifying_questions'
  })
);
Use Cases:
- Testing intent detection accuracy
- Monitoring routing decisions
- Debugging signal detection issues
Code Reference: Lines 335-356
Best Practices
1. Always Send Skeleton for Product Search
// CORRECT: Send skeleton after confirming product search
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
  // ... then execute search
}

// WRONG: Send skeleton before knowing if search is needed
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
if (!canSearch) {
  // Now user sees loading but gets conversational response - bad UX!
}
2. Override Vague Detection for Explicit Requests
// User explicitly said "show me gifts"
const hasExplicitProductType = (giftContext.productType?.trim().length ?? 0) > 0;

// Even if productType is "Kingitused" (fallback), user requested it explicitly
if (isVagueGiftQuery && hasExplicitProductType) {
  isVagueGiftQuery = false; // Override, execute search
}
3. Memory Resolution Before Search
// CORRECT: Try memory first, then search
let searchResult = resolveFromMemory({ /* ... */ });
if (!searchResult) {
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}

// WRONG: Always search (wastes time and resources)
const searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
4. Graceful Error Handling
try {
  contextResult = await Promise.race([
    contextPromise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Timeout')), 35000)
    )
  ]);
} catch (error) {
  // CORRECT: Graceful error message
  const errorText = process.env.NODE_ENV === 'production'
    ? 'Vabandust, tekkis tehniline viga.' // "Sorry, a technical error occurred."
    : `Vabandust: ${error.message}`;
  StreamingUtils.safeEnqueue(controller,
    StreamingUtils.createTextDeltaEvent(errorText));
  controller.close();
  return;
}
Integration Points
Upstream Dependencies
- ContextOrchestrator (context-orchestrator/)
  - orchestrate() - Extract GiftContext
  - persistContext() - Save conversation state
- SearchOrchestrator (search-orchestrator.ts)
  - orchestrate() - Execute product search
- ResponseOrchestrator (response-orchestrator.ts)
  - generateWithDelayedCards() - Stream response with products
- HandlerRouter (handlers/handler-router.ts)
  - route() - Route to appropriate handler (clarifying, conversational, etc.)
Downstream Consumers
-
route.ts - HTTP entry point
- Calls
ParallelOrchestrator.execute()
- Calls
-
Frontend - Receives streaming events
- Skeleton event → Show loading UI
- Product metadata → Render product cards
- Text deltas → Stream AI response
- Intent metadata → Debug display
Performance Comparison
Sequential vs Parallel
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| TTFC | 5.85s | <100ms | 58x faster |
| Context Extraction | 1.9s | 900ms | 2.1x faster |
| Search | 1.2s (after context) | 400ms (parallel) | 3x faster |
| User Sees Something | 5.85s | 0.05s | 117x faster |
| Products Appear | 5.85s | 1.4s | 4.2x faster |
| Total Time | 5.85s | 1.5s | 3.9x faster |
Key Insight: Parallel mode doesn't just speed up individual steps—it changes the perception of speed by providing immediate feedback.
Skeleton Response Impact
Without Skeleton (Sequential):
[5.8s delay with no feedback] → Products appear
User experience: "Is this working? 😕"
With Skeleton (Parallel):
[0.05s] → Skeleton appears
[0.9s] → AI starts streaming
[1.4s] → Products replace skeleton
User experience: "Fast and responsive! 😊"
Perceived Improvement: 98% faster to first visual feedback
🔧 Troubleshooting
Issue 1: Skeleton Sent for Conversational Intent
Symptom: User sees loading skeleton but gets conversational response
Cause: Skeleton sent before signal detection
Fix:
// WRONG: Send skeleton immediately
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
const canSearch = detectIfCanSearch(giftContext);

// CORRECT: Send skeleton after confirming product search
const canSearch = detectIfCanSearch(giftContext);
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
}
Issue 2: Vague Queries Triggering Search
Symptom: "I need help" triggers product search instead of conversational
Cause: Explicit override not working
Debug:
console.log('VAGUE GIFT QUERY DEBUG:', {
  hasMeaningfulSignals,
  giftContextMissing,
  hasExplicitProductType,
  isVagueGiftQuery
});
Fix: Check signal detection logic in context-signals.ts
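For orientation, the check in context-signals.ts presumably resembles the following, based on the signal fields from Step 3. This is an assumption, not the actual utility:

// Assumed shape of the context-signals.ts check; illustrative only.
interface GiftContextSignals {
  productType?: string;
  category?: string;
  productTypeHints?: string[];
  categoryHints?: string[];
}

function hasMeaningfulProductSignals(ctx: GiftContextSignals): boolean {
  return Boolean(
    ctx.productType ||
    ctx.category ||
    (ctx.productTypeHints?.length ?? 0) > 0 ||
    (ctx.categoryHints?.length ?? 0) > 0
  );
}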
Issue 3: Context Extraction Timeout
Symptom: Requests failing with "Context orchestration timeout after 35s"
Cause: LLM taking too long or hanging
Debug:
export CHAT_DEBUG_LOGS=true
# Check context extraction logs
Fix:
- Increase timeout: CONTEXT_EXTRACTION_TIMEOUT_MS=30000
- Check LLM provider status (Groq)
- Verify network connectivity
Issue 4: Products Not Injecting
Symptom: Skeleton shows but products never appear
Cause: Search returning 0 results or metadata event not sent
Debug:
if (searchResult.products.length === 0) {
  console.log('NO PRODUCTS FOUND');
}
console.log('📤 PARALLEL ORCHESTRATOR: Sending metadata:', metadata);
Fix: Check search logs, verify product pool
Related Documentation
Core Systems
- High-Level Architecture - System overview
- Context Understanding - Context extraction
- Context Signals - Signal detection
Handlers
- Handler Router - Handler selection
- Clarifying Question Handler - Vague query handling
- Query Refinement Handler - No results handling
Search & Response
- Search Orchestrator - Product search
- Response Orchestrator - Response streaming
Future Optimizations
Potential Improvements
- Even Faster TTFC
  - Send skeleton before context extraction (risky)
  - Adjust route dynamically if intent changes
  - Target: <50ms TTFC
- Smarter Memory Resolution
  - Cache more query patterns
  - Fuzzy matching for similar queries
  - Target: 50% of queries served from memory
- Progressive Product Loading
  - Inject products one-by-one as they're found
  - Don't wait for full search completion
  - Target: First product at <1s
- Predictive Search (see the sketch after this list)
  - Start search speculatively during context extraction
  - Cancel if routing goes conversational
  - Target: Products ready by context completion
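One possible shape for the predictive-search idea, using AbortController. This is purely speculative since the optimization is not yet implemented, and the signal parameter on orchestrate() is an assumption:

// Speculative sketch of an unimplemented optimization. The `signal` option
// on orchestrate() is assumed; today's SearchOrchestrator may not accept one.
declare const SearchOrchestrator: {
  orchestrate(args: { userMessage: string; signal?: AbortSignal }): Promise<unknown>;
};
declare function extractGiftContext(msg: string): Promise<{ intent: string }>; // assumed

async function predictiveSearch(userMessage: string) {
  const abort = new AbortController();

  // Start the search speculatively, in parallel with context extraction
  const speculative = SearchOrchestrator.orchestrate({
    userMessage,
    signal: abort.signal,
  });

  const giftContext = await extractGiftContext(userMessage);

  // Cancel if routing turns out to be conversational
  if (giftContext.intent === 'greeting' || giftContext.intent === 'question') {
    abort.abort();
    speculative.catch(() => {}); // swallow the cancelled search's rejection
    return null;
  }

  return speculative; // products may already be ready by context completion
}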
Last Updated: 2025-01-17
Version: 2.0
Status: Production Ready