Phase 0: Context Detection

Model Selection

Model: llama-4-scout-17b-16e-instruct
Provider: Groq
Temperature: 0.1
Format: JSON Object

Purpose

Extract structured context from natural-language user queries, including:

  • Intent classification (product_search, show_more, author_search, etc.)
  • Taxonomy parsing (product type, category, hints)
  • Budget constraints (min, max, currency)
  • Conversation context (recipient, occasion, age group)

Why LLaMA 4 Scout 17B?

Speed

  • TTFT: <1s (Time to First Token)
  • Total: typically <1s for full context extraction
  • Impact: Enables sub-second TTFC for entire pipeline

Cost

  • Rate: ~$0.50 per 1000 requests
  • Comparison: ~80% cheaper than GPT-4-class models
  • Volume: Handles high traffic at reasonable cost

Reliability

  • JSON Output: Consistently produces valid structured data
  • Schema Adherence: Follows extraction schema reliably
  • Error Rate: <1% malformed responses

Testing Support

  • Deterministic Mode: Seed parameter for reproducible tests
  • Fast Iteration: Quick test execution
  • Predictable: Same input → same output (when seeded)

Configuration

Location: context-understanding/config.ts

export const CONTEXT_MODEL_CONFIG = {
  model: 'llama-4-scout-17b-16e-instruct',
  temperature: 0.1,
  response_format: { type: 'json_object' },
  seed: process.env.NODE_ENV === 'test' ? 42 : undefined,
  max_tokens: 1500
};

Implementation

Location: context-understanding/index.ts:255-330, 620-700

async function extractContext(
  userMessage: string,
  conversationHistory: Message[]
): Promise<GiftContext> {
  const response = await groq.chat.completions.create({
    model: CONTEXT_MODEL_CONFIG.model,
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      ...conversationHistory,
      { role: 'user', content: userMessage }
    ],
    temperature: CONTEXT_MODEL_CONFIG.temperature,
    response_format: CONTEXT_MODEL_CONFIG.response_format,
    seed: CONTEXT_MODEL_CONFIG.seed
  });

  return parseGiftContext(response.choices[0].message.content);
}
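The `parseGiftContext` helper above is referenced but not shown. A minimal sketch might look like the following — the default values and error handling here are assumptions for illustration, not the actual implementation:

```typescript
interface GiftContext {
  intent: string;
  confidence: number;
  language: 'et' | 'en';
  timestamp: number;
  [key: string]: unknown; // remaining optional fields omitted for brevity
}

// Hypothetical sketch: parse the model's JSON output into a GiftContext,
// filling required metadata and rejecting malformed responses.
function parseGiftContext(raw: string | null): GiftContext {
  if (!raw) throw new Error('Empty completion content');

  const data = JSON.parse(raw); // throws on malformed JSON (<1% of responses)

  if (typeof data.intent !== 'string') {
    throw new Error('Missing required field: intent');
  }

  return {
    confidence: 0.5,  // assumed default if the model omits it
    language: 'et',   // assumed default; real code would detect this
    timestamp: Date.now(),
    ...data,
  };
}
```

Throwing on malformed output lets the caller's fallback strategy (see Error Handling below) take over.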

Extraction Schema

Output Structure:

interface GiftContext {
  // Intent & Confidence
  intent: string;              // 'product_search' | 'show_more' | ...
  confidence: number;          // 0-1 confidence score

  // Taxonomy
  productType?: string;        // 'Raamat' | 'Kinkekaart' | ...
  category?: string;           // 'Ilukirjandus' | 'Tehnika' | ...
  categoryHints?: string[];    // Alternative categories
  productTypeHints?: string[]; // Alternative types

  // Recipient Context
  recipient?: string;          // 'ema' | 'sõber' | 'kolleeg'
  recipientGender?: string;    // 'male' | 'female' | 'unisex'
  ageGroup?: string;           // 'child' | 'adult' | 'elderly'
  occasion?: string;           // 'sünnipäev' | 'jõulud'

  // Budget
  budget?: {
    min?: number;
    max?: number;
    hint?: string;
  };

  // Product-Specific
  authorName?: string;         // For book searches
  bookLanguage?: string;       // 'et' | 'en'
  isPopularQuery?: boolean;    // Popular products flag
  constraints?: string[];      // Exclusion rules

  // Metadata
  language: 'et' | 'en';
  timestamp: number;
}

Performance Metrics

Typical Execution:

Input: "Näita mulle raamatuid kuni 30 eurot" ("Show me books up to 30 euros")

Timing:
├─ Groq API Call: ~850ms
├─ JSON Parsing: ~10ms
├─ Validation: ~5ms
└─ Post-processing: ~35ms
──────────────────────
Total: ~900ms ✓

Output:

{
  "intent": "product_search",
  "confidence": 0.95,
  "productType": "Raamat",
  "category": null,
  "categoryHints": ["Ilukirjandus", "Teaduskirjandus"],
  "budget": {
    "min": null,
    "max": 30,
    "hint": "kuni 30 eurot"
  },
  "language": "et"
}

Error Handling

Fallback Strategy

try {
  const context = await extractContext(message, history);
  return context;
} catch (error) {
  console.error('Context extraction failed:', error);

  // Return safe default
  return {
    intent: 'product_search',
    confidence: 0.3,
    language: detectLanguage(message),
    timestamp: Date.now()
  };
}
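The `detectLanguage` helper used by the fallback is not shown. A minimal heuristic sketch — an assumption for illustration, not the production detector — could be:

```typescript
// Hypothetical heuristic: Estonian-specific letters or common Estonian
// words mark the message as 'et'; everything else defaults to 'en'.
function detectLanguage(message: string): 'et' | 'en' {
  const text = message.toLowerCase();

  // Estonian-specific characters are a strong signal
  if (/[õäöüšž]/.test(text)) return 'et';

  // A few common Estonian word stems without special characters
  const etStems = ['kuni', 'mulle', 'tere', 'raamat', 'kingitus', 'eurot'];
  const tokens = text.split(/\s+/);
  if (tokens.some(t => etStems.some(w => t.startsWith(w)))) return 'et';

  return 'en';
}
```

A real deployment would likely use a proper language-identification library, but a cheap heuristic keeps the fallback path dependency-free.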

Validation

function validateContext(context: GiftContext): boolean {
  // Required fields
  if (!context.intent || !context.language) return false;

  // Budget sanity check (only when both bounds are present)
  if (
    context.budget?.min !== undefined &&
    context.budget?.max !== undefined &&
    context.budget.min > context.budget.max
  ) return false;

  // Confidence threshold
  if (context.confidence < 0.1) return false;

  return true;
}

Optimization Strategies

1. Caching

const contextCache = new Map<string, GiftContext>();

const cacheKey = hashMessage(userMessage, conversationHistory);

if (contextCache.has(cacheKey)) {
  return contextCache.get(cacheKey)!; // ~15% hit rate
}
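The `hashMessage` key function is referenced but not defined; a sketch — the normalization and hash choice are assumptions — might be:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical cache-key helper: hash the normalized message together
// with the conversation history, so identical turns map to one entry.
function hashMessage(
  userMessage: string,
  conversationHistory: { role: string; content: string }[] = []
): string {
  const normalized = userMessage.trim().toLowerCase();
  const payload = JSON.stringify({ normalized, conversationHistory });
  return createHash('sha256').update(payload).digest('hex');
}
```

Normalizing before hashing makes trivially different messages ("Näita raamatuid" vs. "näita raamatuid ") share a cache entry, which helps reach the ~15% hit rate target.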

2. Fast Classifier

Pre-filter for simple intents:

// Skip the LLM for obvious cases
if (isShowMoreMessage(message)) {
  return {
    intent: 'show_more',
    confidence: 1.0,
    language: detectLanguage(message),
    timestamp: Date.now()
  };
}

if (isGreeting(message)) {
  return {
    intent: 'greeting',
    confidence: 1.0,
    language: detectLanguage(message),
    timestamp: Date.now()
  };
}
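The `isShowMoreMessage` and `isGreeting` checks are not shown; a sketch using cheap regex tests — the exact phrase lists here are assumptions — could be:

```typescript
// Hypothetical pre-filter helpers: inexpensive pattern checks that let
// obvious intents skip the LLM call entirely.
function isShowMoreMessage(message: string): boolean {
  return /^(näita (veel|rohkem)|veel|rohkem|show more|more)[.!?]?$/i
    .test(message.trim());
}

function isGreeting(message: string): boolean {
  return /^(tere|tsau|hei|hi|hello|hey)[.!?]?$/i.test(message.trim());
}
```

Because these run in microseconds, they cut both latency and Groq spend for the most frequent short messages.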

3. Batch Processing

// Process multiple extractions in parallel
const contexts = await Promise.all(
messages.map(msg => extractContext(msg, history))
);
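An unbounded `Promise.all` can trip provider rate limits under load; a bounded variant — the chunked approach and the limit value are suggestions, not the actual implementation — runs at most `limit` requests at a time:

```typescript
// Hypothetical helper: apply an async mapper over items at most
// `limit` at a time, preserving input order in the results.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const chunk = items.slice(i, i + limit);
    results.push(...await Promise.all(chunk.map(fn)));
  }
  return results;
}
```

Usage: `const contexts = await mapWithConcurrency(messages, 5, msg => extractContext(msg, history));` — tune the limit to your Groq rate quota.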

Testing

Unit Tests

describe('Context Extraction', () => {
  it('extracts product type', async () => {
    const context = await extractContext('näita raamatuid', []);

    expect(context.productType).toBe('Raamat');
    expect(context.intent).toBe('product_search');
  });

  it('is deterministic when seeded', async () => {
    // NODE_ENV === 'test' fixes seed to 42 via CONTEXT_MODEL_CONFIG
    const ctx1 = await extractContext('kingitused', []);
    const ctx2 = await extractContext('kingitused', []);

    expect(ctx1).toEqual(ctx2);
  });
});

Monitoring

Track these metrics in production:

{
  extractionTimeMs: number,   // Should be <1000ms
  confidence: number,         // Should be >0.7
  fallbackTriggered: boolean, // Should be <5%
  cacheHitRate: number,       // Target: >15%
  errorRate: number,          // Should be <1%

  // Intent distribution
  intentCounts: {
    product_search: number,
    show_more: number,
    author_search: number,
    // ...
  }
}
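One way to keep these counters in-process — the class and field names are illustrative assumptions; a real deployment would export to a metrics backend — is a small recorder:

```typescript
// Hypothetical in-process counter for the rate metrics above.
class ExtractionMetrics {
  private total = 0;
  private errors = 0;
  private cacheHits = 0;
  private fallbacks = 0;

  record(e: { error?: boolean; cacheHit?: boolean; fallback?: boolean }) {
    this.total++;
    if (e.error) this.errors++;
    if (e.cacheHit) this.cacheHits++;
    if (e.fallback) this.fallbacks++;
  }

  snapshot() {
    const rate = (n: number) => (this.total ? n / this.total : 0);
    return {
      errorRate: rate(this.errors),       // target <1%
      cacheHitRate: rate(this.cacheHits), // target >15%
      fallbackRate: rate(this.fallbacks), // target <5%
    };
  }
}
```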