Phase 0: Context Detection
Model Selection
Model: llama-4-scout-17b-16e-instruct
Provider: Groq
Temperature: 0.1
Format: JSON Object
Purpose
Extract structured context from natural language user queries including:
- Intent classification (product_search, show_more, author_search, etc.)
- Taxonomy parsing (product type, category, hints)
- Budget constraints (min, max, currency)
- Conversation context (recipient, occasion, age group)
Why LLaMA 4 Scout 17B?
Speed
- TTFT: <1s (Time to First Token)
- Total: <1s for full context extraction (see Performance Metrics below)
- Impact: Enables sub-second TTFC for the entire pipeline
Cost
- Rate: ~$0.50 per 1000 requests
- Comparison: 80% cheaper than GPT-4 class models
- Volume: Handles high traffic at reasonable cost
Reliability
- JSON Output: Consistently produces valid structured data
- Schema Adherence: Follows extraction schema reliably
- Error Rate: <1% malformed responses
Testing Support
- Deterministic Mode: Seed parameter for reproducible tests
- Fast Iteration: Quick test execution
- Predictable: Same input → same output (when seeded)
Configuration
Location: context-understanding/config.ts
export const CONTEXT_MODEL_CONFIG = {
model: 'llama-4-scout-17b-16e-instruct',
temperature: 0.1,
response_format: { type: 'json_object' },
seed: process.env.NODE_ENV === 'test' ? 42 : undefined,
max_tokens: 1500
};
Implementation
Location: context-understanding/index.ts:255-330, 620-700
async function extractContext(
userMessage: string,
conversationHistory: Message[]
): Promise<GiftContext> {
const response = await groq.chat.completions.create({
model: CONTEXT_MODEL_CONFIG.model,
messages: [
{ role: 'system', content: SYSTEM_PROMPT },
...conversationHistory,
{ role: 'user', content: userMessage }
],
temperature: CONTEXT_MODEL_CONFIG.temperature,
response_format: CONTEXT_MODEL_CONFIG.response_format,
seed: CONTEXT_MODEL_CONFIG.seed
});
return parseGiftContext(response.choices[0].message.content);
}
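The `parseGiftContext` helper referenced above is not shown in this document. A minimal illustrative sketch, assuming the model returns a JSON object string and that unrecognized fields should pass through while the required fields get safe defaults:

```typescript
// Illustrative sketch only: the real parseGiftContext is defined elsewhere.
// Trimmed-down context type for this example (the full GiftContext is below).
interface GiftContextLite {
  intent: string;
  confidence: number;
  language: 'et' | 'en';
  timestamp: number;
  [key: string]: unknown; // pass-through for optional taxonomy/budget fields
}

function parseGiftContext(raw: string | null): GiftContextLite {
  if (!raw) throw new Error('Empty model response');
  // Throws on malformed JSON; callers handle this via the fallback strategy.
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  return {
    ...parsed,
    // Normalize required fields with safe defaults.
    intent: typeof parsed.intent === 'string' ? parsed.intent : 'product_search',
    confidence: typeof parsed.confidence === 'number' ? parsed.confidence : 0.3,
    language: parsed.language === 'en' ? 'en' : 'et',
    timestamp: Date.now(),
  };
}
```

Malformed JSON propagates as an exception, which is what triggers the fallback strategy described under Error Handling.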
Extraction Schema
Output Structure:
interface GiftContext {
// Intent & Confidence
intent: string; // 'product_search' | 'show_more' | ...
confidence: number; // 0-1 confidence score
// Taxonomy
productType?: string; // 'Raamat' | 'Kinkekaart' | ...
category?: string; // 'Ilukirjandus' | 'Tehnika' | ...
categoryHints?: string[]; // Alternative categories
productTypeHints?: string[]; // Alternative types
// Recipient Context
recipient?: string; // 'ema' | 'sõber' | 'kolleeg'
recipientGender?: string; // 'male' | 'female' | 'unisex'
ageGroup?: string; // 'child' | 'adult' | 'elderly'
occasion?: string; // 'sünnipäev' | 'jõulud'
// Budget
budget?: {
min?: number;
max?: number;
hint?: string;
};
// Product-Specific
authorName?: string; // For book searches
bookLanguage?: string; // 'et' | 'en'
isPopularQuery?: boolean; // Popular products flag
constraints?: string[]; // Exclusion rules
// Metadata
language: 'et' | 'en';
timestamp: number;
}
Performance Metrics
Typical Execution:
Input: "Näita mulle raamatuid kuni 30 eurot"
Timing:
├─ Groq API Call: ~850ms
├─ JSON Parsing: ~10ms
├─ Validation: ~5ms
└─ Post-processing: ~35ms
──────────────────────
Total: ~900ms ✓
Output:
{
"intent": "product_search",
"confidence": 0.95,
"productType": "Raamat",
"category": null,
"categoryHints": ["Ilukirjandus", "Teaduskirjandus"],
"budget": {
"min": null,
"max": 30,
"hint": "kuni 30 eurot"
},
"language": "et"
}
Error Handling
Fallback Strategy
try {
const context = await extractContext(message, history);
return context;
} catch (error) {
console.error('Context extraction failed:', error);
// Return safe default
return {
intent: 'product_search',
confidence: 0.3,
language: detectLanguage(message),
timestamp: Date.now()
};
}
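The `detectLanguage` helper used in the fallback above is not shown here. A minimal sketch, assuming a cheap heuristic (Estonian diacritics plus a few common Estonian words) is enough for the fallback path:

```typescript
// Illustrative sketch of detectLanguage; the real implementation is defined
// elsewhere. The word list below is an assumption, not the actual list.
function detectLanguage(message: string): 'et' | 'en' {
  const lower = message.toLowerCase();
  // Estonian-specific characters are a strong signal on their own.
  const hasEstonianChars = /[õäöüšž]/.test(lower);
  const estonianWords = ['näita', 'raamat', 'kingitus', 'mulle', 'kuni', 'eurot'];
  const hasEstonianWord = estonianWords.some(w => lower.includes(w));
  return hasEstonianChars || hasEstonianWord ? 'et' : 'en';
}
```

A heuristic is deliberate here: the fallback runs exactly when the LLM call has already failed, so it must not depend on another model call.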
Validation
function validateContext(context: GiftContext): boolean {
  // Required fields
  if (!context.intent || !context.language) return false;
  // Budget sanity check: min must not exceed max when both are present
  const { min, max } = context.budget ?? {};
  if (min != null && max != null && min > max) return false;
  // Confidence threshold
  if (context.confidence < 0.1) return false;
  return true;
}
Optimization Strategies
1. Caching
const contextCache = new Map<string, GiftContext>();
const cacheKey = hashMessage(userMessage, conversationHistory);
if (contextCache.has(cacheKey)) {
  return contextCache.get(cacheKey)!; // ~15% hit rate
}
const context = await extractContext(userMessage, conversationHistory);
contextCache.set(cacheKey, context); // populate cache on miss
return context;
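The `hashMessage` helper above is not defined in this document. One possible sketch, using a simple FNV-1a hash over the message plus the serialized history (the actual implementation may differ):

```typescript
// Illustrative sketch of hashMessage; the real cache key function is
// defined elsewhere and may use a different hash.
interface Message {
  role: string;
  content: string;
}

function hashMessage(userMessage: string, history: Message[]): string {
  // Join message and history with a separator that cannot appear in text.
  const input =
    userMessage + '\u0000' + history.map(m => `${m.role}:${m.content}`).join('\u0000');
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV-1a 32-bit prime
  }
  return (hash >>> 0).toString(16);
}
```

Because the key covers the conversation history as well as the message, a repeated message in a different conversation will not return a stale context.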
2. Fast Classifier
Pre-filter for simple intents:
// Skip LLM for obvious cases; fill in required GiftContext fields locally
if (isShowMoreMessage(message)) {
  return { intent: 'show_more', confidence: 1.0, language: detectLanguage(message), timestamp: Date.now() };
}
if (isGreeting(message)) {
  return { intent: 'greeting', confidence: 1.0, language: detectLanguage(message), timestamp: Date.now() };
}
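The `isShowMoreMessage` and `isGreeting` pre-filters are referenced but not shown. A minimal sketch, assuming short phrase/pattern lists (the real lists are defined elsewhere):

```typescript
// Illustrative sketches of the fast-classifier pre-filters. The phrase
// patterns below are assumptions, not the production lists.
function isShowMoreMessage(message: string): boolean {
  // Estonian "näita veel/rohkem" and English "show more" variants.
  return /\b(näita (veel|rohkem)|show more|veel)\b/i.test(message.trim());
}

function isGreeting(message: string): boolean {
  // Only matches messages that are a greeting and nothing else.
  return /^(tere|hei|hello|hi|hey)[!.,\s]*$/i.test(message.trim());
}
```

The anchored greeting pattern is intentional: "hi, show me books" should fall through to the LLM rather than short-circuit as a greeting.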
3. Batch Processing
// Process multiple extractions in parallel
const contexts = await Promise.all(
messages.map(msg => extractContext(msg, history))
);
Testing
Unit Tests
describe('Context Extraction', () => {
  it('extracts product type', async () => {
    const context = await extractContext('näita raamatuid', []);
    expect(context.productType).toBe('Raamat');
    expect(context.intent).toBe('product_search');
  });
  it('is deterministic when seeded', async () => {
    // NODE_ENV === 'test' enables seed: 42 via CONTEXT_MODEL_CONFIG
    const ctx1 = await extractContext('kingitused', []);
    const ctx2 = await extractContext('kingitused', []);
    expect(ctx1).toEqual(ctx2);
  });
});
Monitoring
Track these metrics in production:
{
extractionTimeMs: number, // Should be <1000ms
confidence: number, // Should be >0.7
fallbackTriggered: boolean, // Should be <5%
cacheHitRate: number, // Target: >15%
errorRate: number, // Should be <1%
// Intent distribution
intentCounts: {
product_search: number,
show_more: number,
author_search: number,
// ...
}
}
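The metric fields above can be aggregated with a small in-memory recorder. A sketch (illustrative only; production would ship samples to a real metrics backend rather than hold them in memory):

```typescript
// Illustrative in-memory aggregator for the monitoring fields above.
interface ExtractionSample {
  extractionTimeMs: number;
  confidence: number;
  fallbackTriggered: boolean;
  cacheHit: boolean;
  intent: string;
}

class ExtractionMetrics {
  private samples: ExtractionSample[] = [];

  record(sample: ExtractionSample): void {
    this.samples.push(sample);
  }

  summary() {
    const n = this.samples.length || 1; // avoid division by zero
    const intentCounts: Record<string, number> = {};
    for (const s of this.samples) {
      intentCounts[s.intent] = (intentCounts[s.intent] ?? 0) + 1;
    }
    return {
      avgTimeMs: this.samples.reduce((acc, s) => acc + s.extractionTimeMs, 0) / n,
      fallbackRate: this.samples.filter(s => s.fallbackTriggered).length / n,
      cacheHitRate: this.samples.filter(s => s.cacheHit).length / n,
      errorRate: this.samples.filter(s => s.confidence < 0.1).length / n,
      intentCounts,
    };
  }
}
```

The derived rates map directly onto the targets above: `fallbackRate` <5%, `cacheHitRate` >15%, `errorRate` <1%.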
Related Documentation
- Phase 1-2: Query & Search - Next phase
- Phase 5: Response Generation - Final phase
- Pipeline Models - Complete model overview