Phase 0: Context Detection
Model Selection
Model: llama-4-scout-17b-16e-instruct
Provider: Groq
Temperature: 0.1
Format: JSON Object
Purpose
Extract structured context from natural language user queries including:
- Intent classification (product_search, show_more, author_search, etc.)
- Taxonomy parsing (product type, category, hints)
- Budget constraints (min, max, currency)
- Conversation context (recipient, occasion, age group)
Why LLaMA 4 Scout 17B?
Speed
- TTFT: <1s (Time to First Token)
- Total: <1s for full context extraction (see Performance Metrics below)
- Impact: Enables sub-second TTFC for the entire pipeline
Cost
- Rate: ~$0.50 per 1000 requests
- Comparison: 80% cheaper than GPT-4 class models
- Volume: Handles high traffic at reasonable cost
Reliability
- JSON Output: Consistently produces valid structured data
- Schema Adherence: Follows extraction schema reliably
- Error Rate: <1% malformed responses
Testing Support
- Deterministic Mode: Seed parameter for reproducible tests
- Fast Iteration: Quick test execution
- Predictable: Same input → same output (when seeded)
Configuration
Location: context-understanding/config.ts
export const CONTEXT_MODEL_CONFIG = {
model: 'llama-4-scout-17b-16e-instruct',
temperature: 0.1,
response_format: { type: 'json_object' },
seed: process.env.NODE_ENV === 'test' ? 42 : undefined,
max_tokens: 1500
};
Implementation
Location: context-understanding/index.ts:255-330, 620-700
async function extractContext(
userMessage: string,
conversationHistory: Message[]
): Promise<GiftContext> {
const response = await groq.chat.completions.create({
model: CONTEXT_MODEL_CONFIG.model,
messages: [
{ role: 'system', content: SYSTEM_PROMPT },
...conversationHistory,
{ role: 'user', content: userMessage }
],
temperature: CONTEXT_MODEL_CONFIG.temperature,
response_format: CONTEXT_MODEL_CONFIG.response_format,
seed: CONTEXT_MODEL_CONFIG.seed
});
return parseGiftContext(response.choices[0].message.content);
}
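The `parseGiftContext` helper referenced above is not shown in this document. A minimal illustrative sketch, assuming the model returns a JSON object string and that unrecognized fields should pass through while the required fields get safe defaults:

```typescript
// Illustrative sketch only: the real parseGiftContext is defined elsewhere.
// Trimmed-down context type for this example (the full GiftContext is below).
interface GiftContextLite {
  intent: string;
  confidence: number;
  language: 'et' | 'en';
  timestamp: number;
  [key: string]: unknown; // pass-through for optional taxonomy/budget fields
}

function parseGiftContext(raw: string | null): GiftContextLite {
  if (!raw) throw new Error('Empty model response');
  // Throws on malformed JSON; callers handle this via the fallback strategy.
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  return {
    ...parsed,
    // Normalize required fields with safe defaults.
    intent: typeof parsed.intent === 'string' ? parsed.intent : 'product_search',
    confidence: typeof parsed.confidence === 'number' ? parsed.confidence : 0.3,
    language: parsed.language === 'en' ? 'en' : 'et',
    timestamp: Date.now(),
  };
}
```

Malformed JSON propagates as an exception, which is what triggers the fallback strategy described under Error Handling.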
Extraction Schema
Output Structure:
interface GiftContext {
// Intent & Confidence
intent: string; // 'product_search' | 'show_more' | ...
confidence: number; // 0-1 confidence score
// Taxonomy
productType?: string; // 'Raamat' | 'Kinkekaart' | ...
category?: string; // 'Ilukirjandus' | 'Tehnika' | ...
categoryHints?: string[]; // Alternative categories
productTypeHints?: string[]; // Alternative types
// Recipient Context
recipient?: string; // 'ema' | 'sõber' | 'kolleeg'
recipientGender?: string; // 'male' | 'female' | 'unisex'
ageGroup?: string; // 'child' | 'adult' | 'elderly'
occasion?: string; // 'sünnipäev' | 'jõulud'
// Budget
budget?: {
min?: number;
max?: number;
hint?: string;
};
// Product-Specific
authorName?: string; // For book searches
bookLanguage?: string; // 'et' | 'en'
isPopularQuery?: boolean; // Popular products flag
constraints?: string[]; // Exclusion rules
// Metadata
language: 'et' | 'en';
timestamp: number;
}
Performance Metrics
Typical Execution:
Input: "Näita mulle raamatuid kuni 30 eurot"
Timing:
├─ Groq API Call: ~850ms
├─ JSON Parsing: ~10ms
├─ Validation: ~5ms
└─ Post-processing: ~35ms
──────────────────────
Total: ~900ms ✓
Output:
{
"intent": "product_search",
"confidence": 0.95,
"productType": "Raamat",
"category": null,
"categoryHints": ["Ilukirjandus", "Teaduskirjandus"],
"budget": {
"min": null,
"max": 30,
"hint": "kuni 30 eurot"
},
"language": "et"
}
Error Handling
Fallback Strategy
try {
const context = await extractContext(message, history);
return context;
} catch (error) {
console.error('Context extraction failed:', error);
// Return safe default
return {
intent: 'product_search',
confidence: 0.3,
language: detectLanguage(message),
timestamp: Date.now()
};
}
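The `detectLanguage` helper used in the fallback above is not shown here. A minimal sketch, assuming a cheap heuristic (Estonian diacritics plus a few common Estonian words) is enough for the fallback path:

```typescript
// Illustrative sketch of detectLanguage; the real implementation is defined
// elsewhere. The word list below is an assumption, not the actual list.
function detectLanguage(message: string): 'et' | 'en' {
  const lower = message.toLowerCase();
  // Estonian-specific characters are a strong signal on their own.
  const hasEstonianChars = /[õäöüšž]/.test(lower);
  const estonianWords = ['näita', 'raamat', 'kingitus', 'mulle', 'kuni', 'eurot'];
  const hasEstonianWord = estonianWords.some(w => lower.includes(w));
  return hasEstonianChars || hasEstonianWord ? 'et' : 'en';
}
```

A heuristic is deliberate here: the fallback runs exactly when the LLM call has already failed, so it must not depend on another model call.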
Validation
function validateContext(context: GiftContext): boolean {
  // Required fields
  if (!context.intent || !context.language) return false;
  // Budget sanity check: min must not exceed max when both are present
  const { min, max } = context.budget ?? {};
  if (min != null && max != null && min > max) return false;
  // Confidence threshold
  if (context.confidence < 0.1) return false;
  return true;
}
Optimization Strategies
1. Caching
const contextCache = new Map<string, GiftContext>();
const cacheKey = hashMessage(userMessage, conversationHistory);
if (contextCache.has(cacheKey)) {
  return contextCache.get(cacheKey)!; // ~15% hit rate
}
const context = await extractContext(userMessage, conversationHistory);
contextCache.set(cacheKey, context); // populate cache on miss
return context;
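The `hashMessage` helper above is not defined in this document. One possible sketch, using a simple FNV-1a hash over the message plus the serialized history (the actual implementation may differ):

```typescript
// Illustrative sketch of hashMessage; the real cache key function is
// defined elsewhere and may use a different hash.
interface Message {
  role: string;
  content: string;
}

function hashMessage(userMessage: string, history: Message[]): string {
  // Join message and history with a separator that cannot appear in text.
  const input =
    userMessage + '\u0000' + history.map(m => `${m.role}:${m.content}`).join('\u0000');
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV-1a 32-bit prime
  }
  return (hash >>> 0).toString(16);
}
```

Because the key covers the conversation history as well as the message, a repeated message in a different conversation will not return a stale context.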
2. Fast Classifier
Pre-filter for simple intents:
// Skip LLM for obvious cases; fill in required GiftContext fields locally
if (isShowMoreMessage(message)) {
  return { intent: 'show_more', confidence: 1.0, language: detectLanguage(message), timestamp: Date.now() };
}
if (isGreeting(message)) {
  return { intent: 'greeting', confidence: 1.0, language: detectLanguage(message), timestamp: Date.now() };
}
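The `isShowMoreMessage` and `isGreeting` pre-filters are referenced but not shown. A minimal sketch, assuming short phrase/pattern lists (the real lists are defined elsewhere):

```typescript
// Illustrative sketches of the fast-classifier pre-filters. The phrase
// patterns below are assumptions, not the production lists.
function isShowMoreMessage(message: string): boolean {
  // Estonian "näita veel/rohkem" and English "show more" variants.
  return /\b(näita (veel|rohkem)|show more|veel)\b/i.test(message.trim());
}

function isGreeting(message: string): boolean {
  // Only matches messages that are a greeting and nothing else.
  return /^(tere|hei|hello|hi|hey)[!.,\s]*$/i.test(message.trim());
}
```

The anchored greeting pattern is intentional: "hi, show me books" should fall through to the LLM rather than short-circuit as a greeting.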
3. Batch Processing
// Process multiple extractions in parallel
const contexts = await Promise.all(
messages.map(msg => extractContext(msg, history))
);
Testing
Unit Tests
describe('Context Extraction', () => {
  it('extracts product type', async () => {
    const context = await extractContext('näita raamatuid', []);
    expect(context.productType).toBe('Raamat');
    expect(context.intent).toBe('product_search');
  });
  it('is deterministic when seeded', async () => {
    // NODE_ENV === 'test' enables seed: 42 via CONTEXT_MODEL_CONFIG
    const ctx1 = await extractContext('kingitused', []);
    const ctx2 = await extractContext('kingitused', []);
    expect(ctx1).toEqual(ctx2);
  });
});
Monitoring
Track these metrics in production:
{
extractionTimeMs: number, // Should be <1000ms
confidence: number, // Should be >0.7
fallbackTriggered: boolean, // Should be <5%
cacheHitRate: number, // Target: >15%
errorRate: number, // Should be <1%
// Intent distribution
intentCounts: {
product_search: number,
show_more: number,
author_search: number,
// ...
}
}
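The metric fields above can be aggregated with a small in-memory recorder. A sketch (illustrative only; production would ship samples to a real metrics backend rather than hold them in memory):

```typescript
// Illustrative in-memory aggregator for the monitoring fields above.
interface ExtractionSample {
  extractionTimeMs: number;
  confidence: number;
  fallbackTriggered: boolean;
  cacheHit: boolean;
  intent: string;
}

class ExtractionMetrics {
  private samples: ExtractionSample[] = [];

  record(sample: ExtractionSample): void {
    this.samples.push(sample);
  }

  summary() {
    const n = this.samples.length || 1; // avoid division by zero
    const intentCounts: Record<string, number> = {};
    for (const s of this.samples) {
      intentCounts[s.intent] = (intentCounts[s.intent] ?? 0) + 1;
    }
    return {
      avgTimeMs: this.samples.reduce((acc, s) => acc + s.extractionTimeMs, 0) / n,
      fallbackRate: this.samples.filter(s => s.fallbackTriggered).length / n,
      cacheHitRate: this.samples.filter(s => s.cacheHit).length / n,
      errorRate: this.samples.filter(s => s.confidence < 0.1).length / n,
      intentCounts,
    };
  }
}
```

The derived rates map directly onto the targets above: `fallbackRate` <5%, `cacheHitRate` >15%, `errorRate` <1%.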
Related Documentation
- Phase 1-2: Query & Search - Next phase
- Phase 5: Response Generation - Final phase
- Pipeline Models - Complete model overview