Parallel Orchestrator
Overview
The Parallel Orchestrator is the performance-optimized request handler that achieves sub-100ms Time to First Chunk (TTFC) through skeleton responses, parallel execution, and intelligent routing. It represents a dramatic improvement over the baseline sequential flow.
Location: app/api/chat/orchestrators/parallel-orchestrator.ts (900 lines)
Performance Goals
Baseline (Sequential Flow)
Request → Context (1.9s) → Search (1.2s) → Rerank (762ms) → AI (slow TTFC)
Total: 5.85 seconds before user sees anything
Optimized (Parallel Flow)
Request → Skeleton (50ms) → TTFC
  ├→ Context (900ms) → AI starts streaming
  └→ Search prep (100ms) → Full search when context ready
Products injected dynamically into stream at ~1.4s
Results:
- TTFC: <100ms (was 5.85s)
- First AI Text: ~1s (context complete)
- Products Appear: ~1.4s (search complete)
- Perceived Improvement: 98% faster initial response
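Conceptually, the kickoff can be sketched as below. This is an illustrative simplification, not the orchestrator's actual code; extractGiftContext and prepareSearch are assumed helper names:

// Illustrative sketch of the parallel kickoff, with assumed helpers:
declare function extractGiftContext(msg: string): Promise<{ intent: string }>;
declare function prepareSearch(msg: string): Promise<unknown>;

async function parallelKickoff(userMessage: string) {
  // Context extraction and search preparation run concurrently, instead of
  // the baseline's sequential context → search → rerank chain.
  const [giftContext, searchPrep] = await Promise.all([
    extractGiftContext(userMessage), // ~900ms
    prepareSearch(userMessage),      // ~100ms, finishes long before context
  ]);

  // AI streaming starts as soon as context resolves; the full search uses
  // the prepared state and injects products into the stream (~1.4s total).
  return { giftContext, searchPrep };
}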
Architecture Overview
📋 Execution Flow (Step-by-Step)
Complete Request Lifecycle
Key Components Deep Dive
1. Query Validation (Step 0)
Purpose: Fast-fail nonsense queries before expensive processing
Two-Stage Validation:
Quick Nonsense Patterns:
- Random characters: "asdfghjkl", "12345678"
- Keyboard mashing: "qwerty", "zxcvbn"
- Empty or whitespace-only
- Single character queries
Suspicious Patterns (need LLM validation):
- Long uppercase strings: "ASDFGHJKL12345"
- Random numbers: "9384756291"
- Mixed gibberish: "abc123XYZ"
Code Reference: Lines 74-132
// Quick synchronous check
const quickNonsenseCheck = isObviousNonsense(userMessage);
if (quickNonsenseCheck) {
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: 'nonsense_query',
    // ... trigger refinement
  });
  return; // Fast-fail
}

// LLM validation for ambiguous cases
if (needsLLMValidation) {
  const validation = await validateQuery(userMessage, 'et');
  if (!validation.isSensible && validation.confidence > 0.7) {
    // Nonsense confirmed by LLM
    await QueryRefinementHandler.handle({ /* ... */ });
    return;
  }
}
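For reference, a minimal sketch of what the quick synchronous check could look like. The helper name isObviousNonsense comes from the code above, but the patterns below are illustrative assumptions, not the actual implementation:

// Hypothetical sketch; real patterns live in the orchestrator's validation step.
function isObviousNonsense(message: string): boolean {
  const trimmed = message.trim();

  // Empty, whitespace-only, or single-character queries
  if (trimmed.length <= 1) return true;

  // Keyboard-mashing sequences ("asdfghjkl", "qwerty", "zxcvbn", "12345678", ...)
  const keyboardRuns = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm', '12345678'];
  const lower = trimmed.toLowerCase();
  return keyboardRuns.some(run => lower.length >= 4 && run.includes(lower));
}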
Benefits:
- Fast-fail: No wasted LLM calls for obvious nonsense
- Smart validation: LLM validates ambiguous cases
- Better UX: Immediate refinement suggestions
2. Context Extraction (Step 2)
Purpose: Extract structured GiftContext from user message
Timeout Safety:
// 35s outer timeout (inner: 25s main + 4s classifier)
contextResult = await Promise.race([
  contextPromise,
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Timeout after 35s')), 35000)
  )
]);
Error Handling:
- Context extraction failures → Graceful error message
- Production: Generic error
- Development: Detailed error message
What Gets Extracted: See Context Understanding Documentation
Code Reference: Lines 140-182
3. Signal Detection & Routing (Step 3)
Purpose: Determine if query has enough context to search or needs clarification
Multi-Factor Detection:
Signal Types:
// Product Signals
const hasProductType = !!giftContext.productType;
const hasCategory = !!giftContext.category;
const hasProductTypeHints = (giftContext.productTypeHints?.length ?? 0) > 0;
const hasCategoryHints = (giftContext.categoryHints?.length ?? 0) > 0;

// Gift Context
const hasRecipient = !!giftContext.recipient;
const hasOccasion = !!giftContext.occasion;

// Combined
const hasProductSignals = hasProductType || hasCategory ||
  hasProductTypeHints || hasCategoryHints;
const hasGiftContext = hasRecipient || hasOccasion;
const hasAnySearchableContext = hasProductSignals || hasGiftContext;
Vague Query Detection:
// Check using context-signals.ts utility
const hasMeaningfulSignals = hasMeaningfulProductSignals(giftContext);
const giftContextMissing = isGiftContextMissing(giftContext);

// Vague if: product search intent + low confidence + no signals + no gift context
let isVagueGiftQuery = isProductSearch &&
  hasLowConfidence &&
  !hasMeaningfulSignals &&
  giftContextMissing;

// CRITICAL OVERRIDE: Explicit type/category overrides vague detection
if (hasExplicitProductType || hasExplicitCategory) {
  isVagueGiftQuery = false; // User explicitly requested something
}
Routing Decision:
const canSearch = (isProductSearch && !isVagueGiftQuery) ||
  (isUnknownIntent && !isVagueIntent);

if (!canSearch) {
  // Route to HandlerRouter for clarifying questions or conversational
  await HandlerRouter.route({ /* ... */ });
  return; // No skeleton, no search
}

// Send skeleton NOW (only for product search)
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
Code Reference: Lines 217-419
4. Skeleton Response (Step 3.1)
Purpose: Immediate visual feedback to user
When Sent:
- Product search with good signals
- NOT sent for vague queries (go to clarifying questions)
- NOT sent for conversational intents (greeting, question)
- NOT sent for product inquiries (no search needed)
What It Contains:
{
  type: "skeleton",
  count: 1 // Number of loading placeholders
}
Frontend Behavior:
- Displays loading skeleton UI
- Shows 1-3 product card placeholders
- Indicates processing in progress
Timing: Sent immediately after signal detection confirms searchable intent (~50-100ms)
Code Reference: Line 419
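As an illustration, a frontend consumer might branch on the event type like this. This is a hypothetical sketch; only the skeleton event shape is documented above, and the other event names and handler functions are assumptions:

// Hypothetical frontend sketch; handlers and non-skeleton event names are assumed.
declare function showLoadingPlaceholders(count: number): void;
declare function renderProductCards(event: unknown): void;
declare function appendToResponse(text: string): void;

function handleStreamEvent(event: { type: string; count?: number; text?: string }) {
  switch (event.type) {
    case 'skeleton':
      showLoadingPlaceholders(event.count ?? 1); // 1-3 product card placeholders
      break;
    case 'product_metadata': // assumed event name
      renderProductCards(event);                 // replaces the skeleton
      break;
    case 'text_delta':       // assumed event name
      appendToResponse(event.text ?? '');        // streaming AI text
      break;
  }
}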
5. Search Execution (Step 4)
Purpose: Execute product search with context
Memory Resolution First:
// Check if we can serve from memory (previous results)
let searchResult = resolveFromMemory({
  intent: giftContext.intent,
  userMessage,
  giftContext,
  storedProducts,
  excludeIds,
  ignorePreviouslyShown: isCheapestOnlyQuery,
  debug
});

if (!searchResult) {
  // Memory miss, execute full search
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}
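A rough sketch of the kind of decision resolveFromMemory makes, shown with minimal assumed Product and SearchResultLite types. The real resolver covers more intents and fields:

// Illustrative sketch; the actual resolver handles more cases.
interface Product { id: string; price: number; title: string; }
interface SearchResultLite { products: Product[]; }

function resolveFromMemorySketch(args: {
  intent: string;
  storedProducts: Product[];
  excludeIds: string[];
  ignorePreviouslyShown: boolean;
}): SearchResultLite | null {
  // Only intents that can reuse previous results qualify (e.g. "show more")
  if (args.intent !== 'show_more_products' || args.storedProducts.length === 0) {
    return null; // memory miss: caller falls back to full search
  }

  const exclude = args.ignorePreviouslyShown ? [] : args.excludeIds;
  const remaining = args.storedProducts.filter(p => !exclude.includes(p.id));

  return remaining.length > 0 ? { products: remaining } : null;
}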
Budget Calculation for "Cheaper" Queries:
if (giftContext.intent === 'cheaper_alternatives' &&
    !giftContext.budget?.max &&
    storedProducts.length > 0) {
  const avgPrice = calculateAverage(storedProducts.map(p => p.price));
  // Set implicit budget to 70% of average
  giftContext.budget = {
    max: Math.floor(avgPrice * 0.7)
  };
}
Book Category Clearing for Gift Searches:
// Clear book categories for gift product types
// ('Kingitused' is Estonian for "Gifts")
if (giftContext.productType === 'Kingitused' && giftContext.category) {
  const isLikelyBookCategory = /raamat|romantika|kirjandus|novel/i
    .test(giftContext.category);
  if (isLikelyBookCategory) {
    giftContext.category = undefined; // Clear it!
  }
}
No Results Handling:
if (searchResult.products.length === 0) {
  // Trigger intelligent refinement flow
  await QueryRefinementHandler.handle({
    controller,
    userMessage,
    intent: giftContext.intent,
    giftContext,
    // ... suggestions for refinement
  });
  return;
}
Code Reference: Lines 440-531
6. Product Injection (Step 5)
Purpose: Send product metadata to frontend for card rendering
Data Structure:
// Display products (top 3)
const displayProducts = searchResult.products.slice(0, 3).map(p => ({
  ...p,
  in_popular_list: p.in_popular_list // Preserve popularity flag
}));

// Safety preface
const safetyPreface = buildSafetyPreface({
  language: giftContext.language || 'et',
  budget: giftContext.budget,
  warnings: searchResult.funnelWarnings, // "No results in budget" etc.
  products: displayProducts,
  meta: giftContext.meta
});

// Smart suggestions
const smartSuggestions = generateSmartSuggestions({
  originalQuery: userMessage,
  detectedIntent: giftContext.intent,
  currentProductType: giftContext.productType,
  currentCategory: giftContext.category,
  returnedProducts: searchResult.allCandidates,
  context: giftContext
});

// Metadata
const metadata = {
  queryForSearch: userMessage,
  csv_category: giftContext.category,
  product_type: giftContext.productType,
  categoryHints: giftContext.categoryHints,
  search: {
    timeMs: searchResult.searchTime,
    metrics: searchResult.metrics,
    diversityMeta: searchResult.diversityMeta
  },
  smartSuggestions,
  contextData: {
    occasion: giftContext.occasion,
    recipient: giftContext.recipient,
    budget: giftContext.budget,
    confidence: giftContext.confidence,
    // ... all context fields
  }
};

// Send to frontend
StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createProductMetadataEvent(displayProducts, metadata)
);
Code Reference: Lines 535-680
7. Response Streaming (Step 6)
Purpose: Stream AI-generated response with delayed product cards
Flow:
Delayed Card Injection:
- AI starts streaming immediately
- Products injected after initial explanation (200-300 words)
- AI continues explaining products
- Smart suggestions appended at end
Code Reference: Lines 694-712
await ResponseOrchestrator.generateWithDelayedCards({
  controller,
  products: displayProducts,
  userMessage,
  systemPrompt,
  startTime,
  searchStart: searchResult.searchStart,
  searchEnd: searchResult.searchEnd,
  pipelineMetrics: {
    contextExtractionMs: contextTime,
    ...searchResult.pipelineMetrics
  },
  language: giftContext.language || 'et',
  giftContext,
  prefaceText: safetyPreface // Budget warnings, etc.
});
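The delayed-injection idea can be sketched as follows. This is a hypothetical simplification of what generateWithDelayedCards might do internally; createProductCardsEvent is an assumed helper, not a documented API:

// Hypothetical simplification of delayed card injection; internals are assumed.
declare const StreamingUtils: {
  safeEnqueue(c: ReadableStreamDefaultController, event: unknown): void;
  createTextDeltaEvent(text: string): unknown;
};
declare function createProductCardsEvent(products: unknown[]): unknown; // assumed

async function streamWithDelayedCards(
  controller: ReadableStreamDefaultController,
  aiTextStream: AsyncIterable<string>,
  products: unknown[]
) {
  let wordsStreamed = 0;
  let cardsSent = false;

  for await (const delta of aiTextStream) {
    StreamingUtils.safeEnqueue(controller, StreamingUtils.createTextDeltaEvent(delta));
    wordsStreamed += delta.split(/\s+/).filter(Boolean).length;

    // Inject product cards once the initial explanation (~200-300 words) is out
    if (!cardsSent && wordsStreamed >= 200) {
      StreamingUtils.safeEnqueue(controller, createProductCardsEvent(products));
      cardsSent = true;
    }
  }
}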
8. Context Persistence (Step 7)
Purpose: Store conversation state for follow-ups and pronoun resolution
What Gets Stored:
await ContextOrchestrator.persistContext({
  conversationId: request.conversationId,
  convexClient,
  giftContext,
  selectedProducts: displayProducts,
  enabled: PHASE5_ENABLED,
  debug
});
Stored Data:
- Authors: For pronoun resolution ("tema", Estonian for "he/she" → last author)
- Taxonomy: Last productType/category for context continuity
- Exclusions: Product IDs shown (for "show more")
- Budget: Budget preferences across turns
Use Cases:
- "näita veel" ("show more") → Excludes previously shown products
- "tema teosed" ("his/her works") → Resolves to last mentioned author
- "odavamaid" ("cheaper ones") → Uses last shown products for budget calculation
Code Reference: Lines 683-691
🚦 Routing Decision Matrix
| Scenario | Signals | Confidence | Intent | Skeleton? | Action |
|---|---|---|---|---|---|
| Specific product query | Product type | High | product_search | Yes | Search → Stream |
| Gift with context | ⚠️ Fallback | Medium | birthday_gift | Yes | Search → Stream |
| Vague gift query | None | Low | product_search | No | Clarifying question |
| Greeting | - | - | greeting | No | Conversational |
| Product inquiry | - | - | product_inquiry | No | Resolve from memory → Conversational |
| Show more | Memory | - | show_more_products | Yes | Search (exclude previous) |
| Nonsense | - | - | - | No | Query refinement |
Special Intent Handling
Product Inquiry
When: User asks about a previously shown product
Example: "Kas see raamat sobib 10-aastasele?"
Flow:
Resolution Logic:
// 1. Check if productId in giftContext
if (giftContext.productInquiry?.productId) {
  return findByProductId(productId, storedProducts);
}

// 2. Check if productName matches
if (giftContext.productInquiry?.productName) {
  // Try exact match
  const exact = storedProducts.find(p =>
    sanitize(p.title) === sanitize(productName)
  );
  if (exact) return exact;

  // Try partial match
  const partial = storedProducts.find(p =>
    sanitize(p.title).includes(sanitize(productName))
  );
  if (partial) return partial;
}

// 3. Hydrate from DB if needed (description missing)
if (product && needsHydration(product)) {
  return await convexClient.query(
    api.queries.getProduct.getProductById,
    { productId }
  );
}
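The sanitize helper used above is not shown in this excerpt. A minimal sketch of typical title normalization, offered as an assumption rather than the actual helper:

// Assumed normalization for title matching; illustrative only.
function sanitize(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^\p{L}\p{N}\s]/gu, '') // strip punctuation, keep letters/digits
    .replace(/\s+/g, ' ');            // collapse whitespace
}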
Code Reference: Lines 185-214, 749-898
Cheaper Alternatives
When: User wants cheaper options
Example: "midagi odavamat"
Budget Calculation:
// Calculate average price of previously shown products
const avgPrice = storedProducts.reduce((sum, p) => sum + p.price, 0)
  / storedProducts.length;

// Set implicit budget to 70% of average
giftContext.budget = {
  max: Math.floor(avgPrice * 0.7)
};
Exclusion Override:
// For superlative queries ("cheapest"), ignore previous exclusions
const isCheapestOnlyQuery = /\b(kõige\s+odavam|odavaim|cheapest)\b/i
  .test(userMessage);

if (isCheapestOnlyQuery) {
  excludeIds = []; // Show all results, even previously shown
}
Code Reference: Lines 442-466, 469-478
Configuration & Tuning
Environment Variables
# Enable parallel orchestrator
PARALLEL_EXECUTION_ENABLE=true
# Context extraction timeout (inner: 25s main + 4s classifier)
CONTEXT_EXTRACTION_TIMEOUT_MS=25000
# Enable Phase 5 context persistence
PHASE5_CONTEXT_ENABLE=true
# Enable debug logging
CHAT_DEBUG_LOGS=true
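These can be read with the documented defaults as fallbacks. A minimal sketch, assuming the defaults above:

// Sketch of reading the tuning knobs with the documented defaults.
const config = {
  parallelEnabled: process.env.PARALLEL_EXECUTION_ENABLE === 'true',
  contextTimeoutMs: Number(process.env.CONTEXT_EXTRACTION_TIMEOUT_MS ?? 25000),
  phase5Enabled: process.env.PHASE5_CONTEXT_ENABLE === 'true',
  debug: process.env.CHAT_DEBUG_LOGS === 'true',
};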
Performance Tuning
Context Extraction Timeout:
// Inner timeouts
CONTEXT_EXTRACTION_TIMEOUT_MS=25000 // Main extractor
FAST_CLASSIFIER_TIMEOUT_MS=4000 // Fast classifier
// Outer safety timeout
PARALLEL_ORCHESTRATOR_TIMEOUT=35000 // 25s + 4s + 6s buffer
Search Timeout:
// SearchOrchestrator has its own timeouts
SEARCH_TIMEOUT_MS=15000
Skeleton Delay:
// How long before sending skeleton (for signal detection)
// Currently: After context extraction and signal analysis (fast)
// Could optimize: Send skeleton earlier, adjust route later (risky)
🐛 Debugging & Observability
Debug Logging
Enable:
export CHAT_DEBUG_LOGS=true
Key Log Points:
- Validation (Lines 75-132):
  - PARALLEL: Validating query...
  - ⚠️ PARALLEL: OBVIOUS NONSENSE DETECTED
  - PARALLEL: LLM VALIDATION RESULT
- Context (Lines 175-181):
  - CONTEXT COMPLETE (Parallel): { time: 900ms, intent: ... }
- Routing (Lines 269-332):
  - VAGUE GIFT QUERY DEBUG: { ... all signals ... }
  - ROUTING DECISION DEBUG: { canSearch, computed signals, decision }
- Search (Lines 493-508):
  - SEARCH COMPLETE (Parallel): { time: 400ms, products: 5 }
  - PARALLEL: Served products from memory cache
- Products (Lines 536-540):
  - RAW PRODUCTS FROM SEARCH (before normalization)
- Metadata (Lines 654-664):
  - 📤 PARALLEL ORCHESTRATOR: Sending metadata
  - 🏷️ CONTEXT DATA BEING SENT
- Complete (Lines 714-722):
  - PARALLEL FLOW COMPLETE: { totalTime, improvement }
Performance Metrics
Logged Automatically:
const totalTime = Date.now() - startTime;
const improvement = Math.round((5850 - totalTime) / 5850 * 100);

console.log('PARALLEL FLOW COMPLETE:', {
  totalTime: totalTime + 'ms',
  contextTime: contextTime + 'ms',
  searchTime: (Date.now() - searchStart) + 'ms',
  improvement: `${improvement}% faster than baseline`
});
Typical Values:
- contextTime: 900ms (LLaMA 8B context extraction)
- searchTime: 400ms (multi-query search + rerank)
- totalTime: 1300-1500ms
- TTFC: <100ms (skeleton)
- Improvement: 97-98% faster perceived TTFC
Intent Metadata Event
Purpose: Debug intent detection and routing
Sent: After signal detection, before search/conversational
StreamingUtils.safeEnqueue(
  controller,
  StreamingUtils.createIntentMetadataEvent({
    intent: giftContext.intent,
    confidence: giftContext.confidence,
    signals: {
      hasProductType,
      hasCategory,
      hasRecipient,
      hasOccasion
    },
    decision // 'search' | 'conversational' | 'clarifying_questions'
  })
);
Use Cases:
- Testing intent detection accuracy
- Monitoring routing decisions
- Debugging signal detection issues
Code Reference: Lines 335-356
Best Practices
1. Always Send Skeleton for Product Search
// CORRECT: Send skeleton after confirming product search
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
  // ... then execute search
}

// WRONG: Send skeleton before knowing if search is needed
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
if (!canSearch) {
  // Now user sees loading but gets conversational response - bad UX!
}
2. Override Vague Detection for Explicit Requests
// User explicitly said "show me gifts"
const hasExplicitProductType = (giftContext.productType?.trim().length ?? 0) > 0;

// Even if productType is "Kingitused" (fallback), user requested it explicitly
if (isVagueGiftQuery && hasExplicitProductType) {
  isVagueGiftQuery = false; // Override, execute search
}
3. Memory Resolution Before Search
// CORRECT: Try memory first, then search
let searchResult = resolveFromMemory({ /* ... */ });
if (!searchResult) {
  searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
}

// WRONG: Always search (wastes time and resources)
const searchResult = await SearchOrchestrator.orchestrate({ /* ... */ });
4. Graceful Error Handling
try {
  contextResult = await Promise.race([
    contextPromise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('Timeout')), 35000)
    )
  ]);
} catch (error) {
  // CORRECT: Graceful error message
  const errorText = process.env.NODE_ENV === 'production'
    ? 'Vabandust, tekkis tehniline viga.' // "Sorry, a technical error occurred."
    : `Vabandust: ${error.message}`;
  StreamingUtils.safeEnqueue(controller,
    StreamingUtils.createTextDeltaEvent(errorText));
  controller.close();
  return;
}
Integration Points
Upstream Dependencies
- ContextOrchestrator (context-orchestrator/)
  - orchestrate() - Extract GiftContext
  - persistContext() - Save conversation state
- SearchOrchestrator (search-orchestrator.ts)
  - orchestrate() - Execute product search
- ResponseOrchestrator (response-orchestrator.ts)
  - generateWithDelayedCards() - Stream response with products
- HandlerRouter (handlers/handler-router.ts)
  - route() - Route to appropriate handler (clarifying, conversational, etc.)
Downstream Consumers
-
route.ts - HTTP entry point
- Calls
ParallelOrchestrator.execute()
- Calls
-
Frontend - Receives streaming events
- Skeleton event → Show loading UI
- Product metadata → Render product cards
- Text deltas → Stream AI response
- Intent metadata → Debug display
Performance Comparison
Sequential vs Parallel
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| TTFC | 5.85s | <100ms | 58x faster |
| Context Extraction | 1.9s | 900ms | 2.1x faster |
| Search | 1.2s (after context) | 400ms (parallel) | 3x faster |
| User Sees Something | 5.85s | 0.05s | 117x faster |
| Products Appear | 5.85s | 1.4s | 4.2x faster |
| Total Time | 5.85s | 1.5s | 3.9x faster |
Key Insight: Parallel mode doesn't just speed up individual steps—it changes the perception of speed by providing immediate feedback.
Skeleton Response Impact
Without Skeleton (Sequential):
[5.8s delay with no feedback] → Products appear
User experience: "Is this working? 😕"
With Skeleton (Parallel):
[0.05s] → Skeleton appears
[0.9s] → AI starts streaming
[1.4s] → Products replace skeleton
User experience: "Fast and responsive! 😊"
Perceived Improvement: 98% faster to first visual feedback
🔧 Troubleshooting
Issue 1: Skeleton Sent for Conversational Intent
Symptom: User sees loading skeleton but gets conversational response
Cause: Skeleton sent before signal detection
Fix:
// WRONG: Send skeleton immediately
StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
const canSearch = detectIfCanSearch(giftContext);

// CORRECT: Send skeleton after confirming product search
const canSearch = detectIfCanSearch(giftContext);
if (canSearch) {
  StreamingUtils.safeEnqueue(controller, StreamingUtils.createSkeletonEvent(1));
}
Issue 2: Vague Queries Triggering Search
Symptom: "I need help" triggers product search instead of conversational
Cause: Explicit override not working
Debug:
console.log('VAGUE GIFT QUERY DEBUG:', {
  hasMeaningfulSignals,
  giftContextMissing,
  hasExplicitProductType,
  isVagueGiftQuery
});
Fix: Check signal detection logic in context-signals.ts
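For orientation, the check in context-signals.ts presumably resembles the following, based on the signal fields from Step 3. This is an assumption, not the actual utility:

// Assumed shape of the context-signals.ts check; illustrative only.
interface GiftContextSignals {
  productType?: string;
  category?: string;
  productTypeHints?: string[];
  categoryHints?: string[];
}

function hasMeaningfulProductSignals(ctx: GiftContextSignals): boolean {
  return Boolean(
    ctx.productType ||
    ctx.category ||
    (ctx.productTypeHints?.length ?? 0) > 0 ||
    (ctx.categoryHints?.length ?? 0) > 0
  );
}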
Issue 3: Context Extraction Timeout
Symptom: Requests failing with "Context orchestration timeout after 35s"
Cause: LLM taking too long or hanging
Debug:
export CHAT_DEBUG_LOGS=true
# Check context extraction logs
Fix:
- Increase timeout: CONTEXT_EXTRACTION_TIMEOUT_MS=30000
- Check LLM provider status (Groq)
- Verify network connectivity
Issue 4: Products Not Injecting
Symptom: Skeleton shows but products never appear
Cause: Search returning 0 results or metadata event not sent
Debug:
if (searchResult.products.length === 0) {
  console.log('NO PRODUCTS FOUND');
}
console.log('📤 PARALLEL ORCHESTRATOR: Sending metadata:', metadata);
Fix: Check search logs, verify product pool
Related Documentation
Core Systems
- High-Level Architecture - System overview
- Context Understanding - Context extraction
- Context Signals - Signal detection
Handlers
- Handler Router - Handler selection
- Clarifying Question Handler - Vague query handling
- Query Refinement Handler - No results handling
Search & Response
- Search Orchestrator - Product search
- Response Orchestrator - Response streaming
Future Optimizations
Potential Improvements
- Even Faster TTFC
  - Send skeleton before context extraction (risky)
  - Adjust route dynamically if intent changes
  - Target: <50ms TTFC
- Smarter Memory Resolution
  - Cache more query patterns
  - Fuzzy matching for similar queries
  - Target: 50% of queries served from memory
- Progressive Product Loading
  - Inject products one-by-one as they're found
  - Don't wait for full search completion
  - Target: First product at <1s
- Predictive Search (see the sketch after this list)
  - Start search speculatively during context extraction
  - Cancel if routing goes conversational
  - Target: Products ready by context completion
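One possible shape for the predictive-search idea, using AbortController. This is purely speculative since the optimization is not yet implemented, and the signal parameter on orchestrate() is an assumption:

// Speculative sketch of an unimplemented optimization. The `signal` option
// on orchestrate() is assumed; today's SearchOrchestrator may not accept one.
declare const SearchOrchestrator: {
  orchestrate(args: { userMessage: string; signal?: AbortSignal }): Promise<unknown>;
};
declare function extractGiftContext(msg: string): Promise<{ intent: string }>; // assumed

async function predictiveSearch(userMessage: string) {
  const abort = new AbortController();

  // Start the search speculatively, in parallel with context extraction
  const speculative = SearchOrchestrator.orchestrate({
    userMessage,
    signal: abort.signal,
  });

  const giftContext = await extractGiftContext(userMessage);

  // Cancel if routing turns out to be conversational
  if (giftContext.intent === 'greeting' || giftContext.intent === 'question') {
    abort.abort();
    speculative.catch(() => {}); // swallow the cancelled search's rejection
    return null;
  }

  return speculative; // products may already be ready by context completion
}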
Last Updated: 2025-01-17
Version: 2.0
Status: Production Ready