Parallel Orchestrator

Overview

The Parallel Orchestrator is the performance-optimized request handler that achieves sub-100ms Time to First Chunk (TTFC) through skeleton responses, parallel execution, and intelligent routing. It represents a dramatic improvement over the baseline sequential flow.

Location: app/api/chat/orchestrators/parallel-orchestrator.ts (900 lines)


Performance Goals

Baseline (Sequential Flow)

Request → Context (1.9s) → Search (1.2s) → Rerank (762ms) → AI (~2s to first token)
Total: 5.85 seconds before the user sees anything

Optimized (Parallel Flow)

Request → Skeleton (50ms) → TTFC
  ├→ Context (900ms) → AI starts streaming
  └→ Search prep (100ms) → Full search when context ready

Products injected dynamically into the stream at ~1.4s

Results:

  • TTFC: <100ms (was 5.85s)
  • First AI Text: ~1s (context complete)
  • Products Appear: ~1.4s (search complete)
  • Perceived Improvement: 98% faster initial response (5.85s → <0.1s to first visual feedback)

Architecture Overview
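
At a high level, the orchestrator runs context extraction and search preparation as concurrent promises, sends the skeleton as soon as routing confirms a product search, and streams from there. A minimal, self-contained sketch of that fan-out pattern (helper names and bodies here are illustrative stand-ins, not the actual 900-line implementation):

type GiftContext = { intent: string; confidence: number };

async function extractContext(msg: string): Promise<GiftContext> {
  // Stands in for the ~900ms LLM call in the real orchestrator
  return { intent: 'product_search', confidence: 0.9 };
}

async function prepareSearch(msg: string): Promise<string[]> {
  // Stands in for ~100ms of cheap prep (the "Search prep" branch above)
  return msg.toLowerCase().split(/\s+/);
}

async function handleRequest(msg: string, send: (event: object) => void) {
  // Start both branches immediately; neither waits for the other.
  const contextPromise = extractContext(msg);
  const prepPromise = prepareSearch(msg);

  const ctx = await contextPromise;        // routing needs the extracted context
  if (ctx.intent === 'product_search') {
    send({ type: 'skeleton', count: 1 });  // immediate visual feedback
    const preparedQueries = await prepPromise; // usually already resolved by now
    console.log('ready to search with', preparedQueries.length, 'terms');
    // ...full search and delayed-card streaming follow (Steps 4-6 below)
  }
}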


📋 Execution Flow (Step-by-Step)

Complete Request Lifecycle

The lifecycle below mirrors the deep-dive sections that follow:

  • Step 0: Query validation (fast-fail nonsense)
  • Step 2: Context extraction (GiftContext)
  • Step 3: Signal detection & routing (Step 3.1: skeleton response)
  • Step 4: Search execution
  • Step 5: Product injection
  • Step 6: Response streaming
  • Step 7: Context persistence

Key Components Deep Dive

1. Query Validation (Step 0)

Purpose: Fast-fail nonsense queries before expensive processing

Two-Stage Validation:

Quick Nonsense Patterns:

  • Random characters: "asdfghjkl", "12345678"
  • Keyboard mashing: "qwerty", "zxcvbn"
  • Empty or whitespace-only
  • Single character queries

Suspicious Patterns (need LLM validation):

  • Long uppercase strings: "ASDFGHJKL12345"
  • Random numbers: "9384756291"
  • Mixed gibberish: "abc123XYZ"

Code Reference: Lines 74-132

// Quick synchronous check
const quickNonsenseCheck = isObviousNonsense(userMessage);

if (quickNonsenseCheck) {
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: 'nonsense_query',
    // ... trigger refinement
  });
  return; // Fast-fail
}

// LLM validation for ambiguous cases
if (needsLLMValidation) {
  const validation = await validateQuery(userMessage, 'et');

  if (!validation.isSensible && validation.confidence > 0.7) {
    // Nonsense confirmed by LLM
    await QueryRefinementHandler.handle({ /* ... */ });
    return;
  }
}
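
isObviousNonsense itself is not reproduced in this excerpt; based on the pattern list above, a plausible sketch (the exact heuristics and regexes are assumptions):

function isObviousNonsense(query: string): boolean {
  const q = query.trim().toLowerCase();
  if (q.length <= 1) return true; // empty, whitespace-only, or single character

  // Keyboard rows and digit runs from the pattern list above
  const mashes = ['qwerty', 'asdfghjkl', 'zxcvbn', '12345678'];
  if (mashes.some(row => q.includes(row))) return true;

  // Long strings with no vowels tend to be random characters
  return /^[b-df-hj-np-tv-z]{6,}$/.test(q);
}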

Benefits:

  • Fast-fail: No wasted LLM calls for obvious nonsense
  • Smart validation: LLM validates ambiguous cases
  • Better UX: Immediate refinement suggestions

2. Context Extraction (Step 2)

Purpose: Extract structured GiftContext from user message

Timeout Safety:

// 35s outer timeout (inner: 25s main + 4s classifier)
contextResult = await Promise.race([
  contextPromise,
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout after 35s')), 35000)
  )
]);

Error Handling:

  • Context extraction failures → Graceful error message
  • Production: Generic error
  • Development: Detailed error message

What Gets Extracted: See Context Understanding Documentation

Code Reference: Lines 140-182
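
For orientation, here is the GiftContext shape as this document uses it, collected from the code excerpts in the sections below. The canonical definition lives with the Context Understanding documentation, so treat field names and optionality here as inferred:

// Inferred from the excerpts in this document, not the canonical definition.
interface GiftContext {
  intent: string;                    // e.g. 'product_search', 'cheaper_alternatives'
  confidence: number;                // drives vague-query detection
  language?: string;                 // defaults to 'et'
  productType?: string;              // e.g. 'Kingitused' ("Gifts")
  category?: string;
  productTypeHints?: string[];
  categoryHints?: string[];
  recipient?: string;
  occasion?: string;
  budget?: { max?: number };
  productInquiry?: { productId?: string; productName?: string };
  meta?: Record<string, unknown>;
}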


3. Signal Detection & Routing (Step 3)

Purpose: Determine if query has enough context to search or needs clarification

Multi-Factor Detection:

Signal Types:

// Product Signals
const hasProductType = !!giftContext.productType;
const hasCategory = !!giftContext.category;
const hasProductTypeHints = (giftContext.productTypeHints?.length ?? 0) > 0;
const hasCategoryHints = (giftContext.categoryHints?.length ?? 0) > 0;

// Gift Context
const hasRecipient = !!giftContext.recipient;
const hasOccasion = !!giftContext.occasion;

// Combined
const hasProductSignals = hasProductType || hasCategory ||
  hasProductTypeHints || hasCategoryHints;
const hasGiftContext = hasRecipient || hasOccasion;
const hasAnySearchableContext = hasProductSignals || hasGiftContext;

Vague Query Detection:

// Check using context-signals.ts utility
const hasMeaningfulSignals = hasMeaningfulProductSignals(giftContext);
const giftContextMissing = isGiftContextMissing(giftContext);

// Vague if: product search intent + low confidence + no signals + no gift context
let isVagueGiftQuery = isProductSearch &&
  hasLowConfidence &&
  !hasMeaningfulSignals &&
  giftContextMissing;

// CRITICAL OVERRIDE: Explicit type/category overrides vague detection
if (hasExplicitProductType || hasExplicitCategory) {
  isVagueGiftQuery = false; // User explicitly requested something
}
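
The two helpers come from context-signals.ts and are not excerpted here; given the signals defined above, they plausibly reduce to something like this sketch:

// Plausible shape only; see context-signals.ts for the real logic.
function hasMeaningfulProductSignals(ctx: GiftContext): boolean {
  return !!ctx.productType || !!ctx.category ||
    (ctx.productTypeHints?.length ?? 0) > 0 ||
    (ctx.categoryHints?.length ?? 0) > 0;
}

function isGiftContextMissing(ctx: GiftContext): boolean {
  return !ctx.recipient && !ctx.occasion;
}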

Routing Decision:

const canSearch = (isProductSearch && !isVagueGiftQuery) ||
  (isUnknownIntent && !isVagueIntent);

if (!canSearch) {
  // Route to HandlerRouter for clarifying questions or conversational replies
  await HandlerRouter.route({ /* ... */ });
  return; // No skeleton, no search
}

// Send skeleton NOW (only for product search)
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));

Code Reference: Lines 217-419


4. Skeleton Response (Step 3.1)

Purpose: Immediate visual feedback to user

When Sent:

  • Product search with good signals
  • NOT sent for vague queries (go to clarifying questions)
  • NOT sent for conversational intents (greeting, question)
  • NOT sent for product inquiries (no search needed)

What It Contains:

{
  type: "skeleton",
  count: 1 // Number of loading placeholders
}

Frontend Behavior (see the sketch below):

  • Displays loading skeleton UI
  • Shows 1-3 product card placeholders
  • Indicates processing in progress

Timing: Sent immediately after signal detection confirms searchable intent (~50-100ms)

Code Reference: Line 419
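
On the consuming side, a frontend handler for these stream events might look like the sketch below. The event names are inferred from the payloads and StreamingUtils creators mentioned in this document; the actual frontend code is not shown here:

// Illustrative consumer; event names inferred, not confirmed.
type StreamEvent =
  | { type: 'skeleton'; count: number }
  | { type: 'product_metadata'; products: unknown[]; metadata: unknown }
  | { type: 'text_delta'; text: string };

interface ChatUI {
  showSkeletons(count: number): void;
  renderProductCards(products: unknown[]): void;
  appendText(text: string): void;
}

function handleStreamEvent(event: StreamEvent, ui: ChatUI): void {
  switch (event.type) {
    case 'skeleton':
      ui.showSkeletons(event.count);         // 1-3 loading placeholders
      break;
    case 'product_metadata':
      ui.renderProductCards(event.products); // cards replace the skeleton
      break;
    case 'text_delta':
      ui.appendText(event.text);             // streamed AI response
      break;
  }
}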


5. Search Execution (Step 4)

Purpose: Execute product search with context

Memory Resolution First:

// Check if we can serve from memory (previous results)
let searchResult = resolveFromMemory({
  intent: giftContext.intent,
  userMessage,
  giftContext,
  storedProducts,
  excludeIds,
  ignorePreviouslyShown: isCheapestOnlyQuery,
  debug
});

if (!searchResult) {
  // Memory miss, execute full search
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}

Budget Calculation for "Cheaper" Queries:

if (giftContext.intent === 'cheaper_alternatives' &&
    !giftContext.budget?.max &&
    storedProducts.length > 0) {

  const avgPrice = calculateAverage(storedProducts.map(p => p.price));

  // Set implicit budget to 70% of the average (e.g. avg €30 → max €21)
  giftContext.budget = {
    max: Math.floor(avgPrice * 0.7)
  };
}

Book Category Clearing for Gift Searches:

// Clear book categories for gift product types ('Kingitused' = "Gifts")
if (giftContext.productType === 'Kingitused' && giftContext.category) {
  const isLikelyBookCategory = /raamat|romantika|kirjandus|novel/i
    .test(giftContext.category);

  if (isLikelyBookCategory) {
    giftContext.category = undefined; // Clear it!
  }
}

No Results Handling:

if (searchResult.products.length === 0) {
  // Trigger intelligent refinement flow
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: giftContext.intent,
    giftContext,
    // ... suggestions for refinement
  });
  return;
}

Code Reference: Lines 440-531


6. Product Injection (Step 5)

Purpose: Send product metadata to frontend for card rendering

Data Structure:

// Display products (top 3)
const displayProducts = searchResult.products.slice(0, 3).map(p => ({
  ...p,
  in_popular_list: p.in_popular_list // Preserve popularity flag
}));

// Safety preface
const safetyPreface = buildSafetyPreface({
  language: giftContext.language || 'et',
  budget: giftContext.budget,
  warnings: searchResult.funnelWarnings, // "No results in budget" etc.
  products: displayProducts,
  meta: giftContext.meta
});

// Smart suggestions
const smartSuggestions = generateSmartSuggestions({
  originalQuery: userMessage,
  detectedIntent: giftContext.intent,
  currentProductType: giftContext.productType,
  currentCategory: giftContext.category,
  returnedProducts: searchResult.allCandidates,
  context: giftContext
});

// Metadata
const metadata = {
  queryForSearch: userMessage,
  csv_category: giftContext.category,
  product_type: giftContext.productType,
  categoryHints: giftContext.categoryHints,
  search: {
    timeMs: searchResult.searchTime,
    metrics: searchResult.metrics,
    diversityMeta: searchResult.diversityMeta
  },
  smartSuggestions,
  contextData: {
    occasion: giftContext.occasion,
    recipient: giftContext.recipient,
    budget: giftContext.budget,
    confidence: giftContext.confidence,
    // ... all context fields
  }
};

// Send to frontend
StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createProductMetadataEvent(displayProducts, metadata)
);

Code Reference: Lines 535-680


7. Response Streaming (Step 6)

Purpose: Stream AI-generated response with delayed product cards

Flow (delayed card injection):

  • AI starts streaming immediately
  • Products are injected after the initial explanation (200-300 words)
  • AI continues explaining the products
  • Smart suggestions are appended at the end

Code Reference: Lines 694-712

await ResponseOrchestrator.generateWithDelayedCards({
  controller,
  products: displayProducts,
  userMessage,
  systemPrompt,
  startTime,
  searchStart: searchResult.searchStart,
  searchEnd: searchResult.searchEnd,
  pipelineMetrics: {
    contextExtractionMs: contextTime,
    ...searchResult.pipelineMetrics
  },
  language: giftContext.language || 'et',
  giftContext,
  prefaceText: safetyPreface // Budget warnings, etc.
});
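
generateWithDelayedCards itself is documented elsewhere; the delayed-injection idea it implements can be sketched as follows. The 200-word threshold and the product_cards event shape are assumptions based on the bullets above:

// Sketch of delayed card injection, not ResponseOrchestrator's actual code.
async function streamWithDelayedCards(
  tokens: AsyncIterable<string>,
  products: unknown[],
  send: (event: object) => void
): Promise<void> {
  let words = 0;
  let cardsSent = false;

  for await (const token of tokens) {
    send({ type: 'text_delta', text: token });
    words += token.split(/\s+/).filter(Boolean).length;

    // Inject cards once the initial explanation (~200-300 words) has streamed.
    if (!cardsSent && words >= 200) {
      send({ type: 'product_cards', products });
      cardsSent = true;
    }
  }

  if (!cardsSent) send({ type: 'product_cards', products }); // short answers still get cards
}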

8. Context Persistence (Step 7)

Purpose: Store conversation state for follow-ups and pronoun resolution

What Gets Stored:

await ContextOrchestrator.persistContext({
  conversationId: request.conversationId,
  convexClient,
  giftContext,
  selectedProducts: displayProducts,
  enabled: PHASE5_ENABLED,
  debug
});

Stored Data (shape sketched below):

  • Authors: For pronoun resolution ("tema", Estonian for "he/she" → last author)
  • Taxonomy: Last productType/category for context continuity
  • Exclusions: Product IDs shown (for "show more")
  • Budget: Budget preferences across turns

Use Cases:

  • "näita veel" ("show more") → Excludes previously shown products
  • "tema teosed" ("his/her works") → Resolves to the last mentioned author
  • "odavamaid" ("cheaper ones") → Uses last shown products for budget calculation

Code Reference: Lines 683-691
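
The persisted record itself is not excerpted; from the bullets above, its shape is roughly the following (field names are assumptions):

// Assumed shape of the persisted conversation state.
interface PersistedContext {
  conversationId: string;
  lastAuthors: string[];      // pronoun resolution: "tema" → last author
  lastProductType?: string;   // taxonomy continuity across turns
  lastCategory?: string;
  shownProductIds: string[];  // exclusions for "näita veel" (show more)
  budget?: { max?: number };  // budget preference carried across turns
}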


🚦 Routing Decision Matrix

| Scenario | Signals | Confidence | Intent | Skeleton? | Action |
|---|---|---|---|---|---|
| Specific product query | Product type | High | product_search | Yes | Search → Stream |
| Gift with context | ⚠️ Fallback | Medium | birthday_gift | Yes | Search → Stream |
| Vague gift query | None | Low | product_search | No | Clarifying question |
| Greeting | - | - | greeting | No | Conversational |
| Product inquiry | - | - | product_inquiry | No | Resolve from memory → Conversational |
| Show more | Memory | - | show_more_products | Yes | Search (exclude previous) |
| Nonsense | - | - | - | No | Query refinement |

Special Intent Handling

Product Inquiry

When: User asks about a previously shown product

Example: "Kas see raamat sobib 10-aastasele?"

Resolution Logic:

// 1. Check if productId in giftContext
if (giftContext.productInquiry?.productId) {
  return findByProductId(productId, storedProducts);
}

// 2. Check if productName matches
if (giftContext.productInquiry?.productName) {
  // Try exact match
  const exact = storedProducts.find(p =>
    sanitize(p.title) === sanitize(productName)
  );
  if (exact) return exact;

  // Try partial match
  const partial = storedProducts.find(p =>
    sanitize(p.title).includes(sanitize(productName))
  );
  if (partial) return partial;
}

// 3. Hydrate from DB if needed (description missing);
//    `product` here is the match found in steps 1-2
if (product && needsHydration(product)) {
  return await convexClient.query(
    api.queries.getProduct.getProductById,
    { productId }
  );
}

Code Reference: Lines 185-214, 749-898
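
needsHydration is not reproduced in the excerpt; given the "description missing" comment above, it is presumably close to:

// Assumed: a stored product needs DB hydration when display fields are missing.
function needsHydration(product: { description?: string }): boolean {
  return !product.description || product.description.trim().length === 0;
}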


Cheaper Alternatives

When: User wants cheaper options

Example: "midagi odavamat"

Budget Calculation:

// Calculate average price of previously shown products
const avgPrice = storedProducts.reduce((sum, p) => sum + p.price, 0)
  / storedProducts.length;

// Set implicit budget to 70% of average
giftContext.budget = {
  max: Math.floor(avgPrice * 0.7)
};

Exclusion Override:

// For superlative queries ("cheapest"), ignore previous exclusions
const isCheapestOnlyQuery = /\b(kõige\s+odavam|odavaim|cheapest)\b/i
  .test(userMessage);

if (isCheapestOnlyQuery) {
  excludeIds = []; // Show all results, even previously shown ones
}

Code Reference: Lines 442-466, 469-478


Configuration & Tuning

Environment Variables

# Enable parallel orchestrator
PARALLEL_EXECUTION_ENABLE=true

# Context extraction timeout (inner: 25s main + 4s classifier)
CONTEXT_EXTRACTION_TIMEOUT_MS=25000

# Enable Phase 5 context persistence
PHASE5_CONTEXT_ENABLE=true

# Enable debug logging
CHAT_DEBUG_LOGS=true

Performance Tuning

Context Extraction Timeout:

# Inner timeouts
CONTEXT_EXTRACTION_TIMEOUT_MS=25000  # Main extractor
FAST_CLASSIFIER_TIMEOUT_MS=4000      # Fast classifier

# Outer safety timeout
PARALLEL_ORCHESTRATOR_TIMEOUT=35000  # 25s + 4s + 6s buffer

Search Timeout:

# SearchOrchestrator has its own timeouts
SEARCH_TIMEOUT_MS=15000

Skeleton Delay:

# How long before sending the skeleton (gated on signal detection)
# Currently: after context extraction and signal analysis (fast)
# Could optimize: send the skeleton earlier and adjust the route later (risky)

🐛 Debugging & Observability

Debug Logging

Enable:

export CHAT_DEBUG_LOGS=true

Key Log Points:

  1. Validation (Lines 75-132):
     PARALLEL: Validating query...
     ⚠️ PARALLEL: OBVIOUS NONSENSE DETECTED
     PARALLEL: LLM VALIDATION RESULT
  2. Context (Lines 175-181):
     CONTEXT COMPLETE (Parallel): { time: 900ms, intent: ... }
  3. Routing (Lines 269-332):
     VAGUE GIFT QUERY DEBUG: { ... all signals ... }
     ROUTING DECISION DEBUG: { canSearch, decision, computed signals, ... }
  4. Search (Lines 493-508):
     SEARCH COMPLETE (Parallel): { time: 400ms, products: 5 }
     PARALLEL: Served products from memory cache
  5. Products (Lines 536-540):
     RAW PRODUCTS FROM SEARCH (before normalization)
  6. Metadata (Lines 654-664):
     📤 PARALLEL ORCHESTRATOR: Sending metadata
     🏷️ CONTEXT DATA BEING SENT
  7. Complete (Lines 714-722):
     PARALLEL FLOW COMPLETE: { totalTime, improvement }

Performance Metrics

Logged Automatically:

const totalTime = Date.now() - startTime;
const improvement = Math.round((5850 - totalTime) / 5850 * 100);

console.log('PARALLEL FLOW COMPLETE:', {
  totalTime: totalTime + 'ms',
  contextTime: contextTime + 'ms',
  searchTime: (Date.now() - searchStart) + 'ms',
  improvement: `${improvement}% faster than baseline`
});

Typical Values:

  • contextTime: 900ms (LLaMA 8B context extraction)
  • searchTime: 400ms (multi-query search + rerank)
  • totalTime: 1300-1500ms
  • TTFC: <100ms (skeleton)
  • Improvement: 97-98% faster perceived TTFC

Intent Metadata Event

Purpose: Debug intent detection and routing

Sent: After signal detection, before search/conversational

StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createIntentMetadataEvent({
    intent: giftContext.intent,
    confidence: giftContext.confidence,
    signals: {
      hasProductType,
      hasCategory,
      hasRecipient,
      hasOccasion
    },
    decision // one of 'search', 'conversational', or 'clarifying_questions'
  })
);

Use Cases:

  • Testing intent detection accuracy
  • Monitoring routing decisions
  • Debugging signal detection issues

Code Reference: Lines 335-356


Best Practices

1. Send Skeleton Only After Confirming Search

// ✅ CORRECT: Send skeleton after confirming product search
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
  // ... then execute search
}

// ❌ WRONG: Send skeleton before knowing if search is needed
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
if (!canSearch) {
  // Now the user sees a loading state but gets a conversational response - bad UX!
}

2. Override Vague Detection for Explicit Requests

// User explicitly said "show me gifts"
const hasExplicitProductType = (giftContext.productType?.trim().length ?? 0) > 0;

// Even if productType is "Kingitused" (the generic fallback), the user requested it explicitly
if (isVagueGiftQuery && hasExplicitProductType) {
  isVagueGiftQuery = false; // Override, execute search
}

3. Try Memory Before Searching

// ✅ CORRECT: Try memory first, then search
let searchResult = resolveFromMemory({ /* ... */ });

if (!searchResult) {
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}

// ❌ WRONG: Always search (wastes time and resources)
const searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });

4. Graceful Error Handling

try {
  contextResult = await Promise.race([
    contextPromise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Timeout')), 35000)
    )
  ]);
} catch (error) {
  // ✅ CORRECT: Graceful, localized error message
  const errorText = process.env.NODE_ENV === 'production'
    ? 'Vabandust, tekkis tehniline viga.' // "Sorry, a technical error occurred."
    : `Vabandust: ${error.message}`;

  StreamingUtils.safeEnqueue(controller,
    StreamingUtils.createTextDeltaEvent(errorText));
  controller.close();
  return;
}

Integration Points

Upstream Dependencies

  1. ContextOrchestrator (context-orchestrator/)

    • orchestrate() - Extract GiftContext
    • persistContext() - Save conversation state
  2. SearchOrchestrator (search-orchestrator.ts)

    • orchestrate() - Execute product search
  3. ResponseOrchestrator (response-orchestrator.ts)

    • generateWithDelayedCards() - Stream response with products
  4. HandlerRouter (handlers/handler-router.ts)

    • route() - Route to appropriate handler (clarifying, conversational, etc.)

Downstream Consumers

  1. route.ts - HTTP entry point

    • Calls ParallelOrchestrator.execute() (see the sketch after this list)
  2. Frontend - Receives streaming events

    • Skeleton event → Show loading UI
    • Product metadata → Render product cards
    • Text deltas → Stream AI response
    • Intent metadata → Debug display
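
A minimal sketch of the route.ts wiring described above. The flag name matches the Configuration section and ParallelOrchestrator.execute() is named above; the request shape, the fallback, and the SSE transport are assumptions:

// Sketch of the HTTP entry point; the real route.ts is not shown in this document.
export async function POST(req: Request): Promise<Response> {
  const { message, conversationId } = await req.json();

  const stream = new ReadableStream({
    async start(controller) {
      if (process.env.PARALLEL_EXECUTION_ENABLE === 'true') {
        await ParallelOrchestrator.execute({ controller, message, conversationId });
      } else {
        // Hypothetical fallback to the baseline sequential flow
        await runSequentialFlow({ controller, message, conversationId });
      }
      // Orchestrators close the controller themselves (see the error handling above).
    }
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' } // assumed SSE transport
  });
}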

Performance Comparison

Sequential vs Parallel

| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| TTFC | ~5.85s | <100ms | 58x faster |
| Context Extraction | 1.9s | 900ms | 2.1x faster |
| Search | 1.2s (after context) | 400ms (parallel) | 3x faster |
| User Sees Something | 5.85s | 0.05s | 117x faster |
| Products Appear | ~5.85s | 1.4s | 4.2x faster |
| Total Time | 5.85s | 1.5s | 3.9x faster |

Key Insight: Parallel mode doesn't just speed up individual steps; it changes the perception of speed by providing immediate feedback.


Skeleton Response Impact

Without Skeleton (Sequential):

[5.8s delay with no feedback] → Products appear
User experience: "Is this working? 😕"

With Skeleton (Parallel):

[0.05s] → Skeleton appears
[0.9s] → AI starts streaming
[1.4s] → Products replace skeleton
User experience: "Fast and responsive! 😊"

Perceived Improvement: 98% faster to first visual feedback


🔧 Troubleshooting

Issue 1: Skeleton Sent for Conversational Intent

Symptom: User sees loading skeleton but gets conversational response

Cause: Skeleton sent before signal detection

Fix:

// ❌ WRONG: Send skeleton immediately
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
const canSearch = detectIfCanSearch(giftContext);

// ✅ CORRECT: Send skeleton after confirming product search
const canSearch = detectIfCanSearch(giftContext);
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
}

Symptom: "I need help" triggers product search instead of conversational

Cause: Explicit override not working

Debug:

console.log('VAGUE GIFT QUERY DEBUG:', {
  hasMeaningfulSignals,
  giftContextMissing,
  hasExplicitProductType,
  isVagueGiftQuery
});

Fix: Check signal detection logic in context-signals.ts


Issue 3: Context Extraction Timeout

Symptom: Requests failing with "Context orchestration timeout after 35s"

Cause: LLM taking too long or hanging

Debug:

export CHAT_DEBUG_LOGS=true
# Check context extraction logs

Fix:

  • Increase timeout: CONTEXT_EXTRACTION_TIMEOUT_MS=30000
  • Check LLM provider status (Groq)
  • Verify network connectivity

Issue 4: Products Not Injecting

Symptom: Skeleton shows but products never appear

Cause: Search returning 0 results or metadata event not sent

Debug:

if (searchResult.products.length === 0) {
  console.log('NO PRODUCTS FOUND');
}

console.log('📤 PARALLEL ORCHESTRATOR: Sending metadata:', metadata);

Fix: Check search logs, verify product pool


Related Documentation

  • Core Systems
  • Handlers
  • Search & Response



Future Optimizations

Potential Improvements

  1. Even Faster TTFC

    • Send skeleton before context extraction (risky)
    • Adjust route dynamically if intent changes
    • Target: <50ms TTFC
  2. Smarter Memory Resolution

    • Cache more query patterns
    • Fuzzy matching for similar queries
    • Target: 50% of queries served from memory
  3. Progressive Product Loading

    • Inject products one-by-one as they're found
    • Don't wait for full search completion
    • Target: First product at <1s
  4. Predictive Search (see the sketch after this list)

    • Start search speculatively during context extraction
    • Cancel if routing goes conversational
    • Target: Products ready by context completion
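
A sketch of the fourth idea: speculative search with cancellation via AbortController. All names here are assumptions, including the AbortSignal-aware search API; none come from the codebase:

// Hypothetical: speculate during context extraction, cancel on conversational routing.
declare function runSearch(query: string, signal: AbortSignal): Promise<unknown[]>;
declare function extractContext(query: string): Promise<{ intent: string }>;

async function predictiveSearch(userMessage: string): Promise<unknown[] | null> {
  const abort = new AbortController();

  // Fire the speculative search immediately; swallow its cancellation error.
  const speculative = runSearch(userMessage, abort.signal).catch(() => null);
  const ctx = await extractContext(userMessage); // runs concurrently with the search

  if (ctx.intent !== 'product_search') {
    abort.abort();     // routing went conversational: discard the speculation
    return null;
  }
  return speculative;  // often already resolved by the time context completes
}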

Last Updated: 2025-01-17
Version: 2.0
Status: Production Ready