Phase 3: Semantic Rerank
Model Selection
Model: llama-4-scout-17b-16e-instruct
Provider: Groq
Temperature: 0.3
Input: 10-20 finalist products
Output: Relevance scores (0-1)
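A minimal sketch of the data shapes this phase consumes and produces. The field names are assumptions inferred from the scoring example and the implementation below, not the actual definitions in services/rerank.ts.

```typescript
// Hypothetical shapes; field names are inferred from the scoring example.
interface Product {
  name: string;
  category: string;
  price: number;
  searchScore?: number; // score carried over from Phase 1-2 search, if any
}

interface GiftContext {
  recipient?: string; // e.g. "tüdruk" (girl)
  ageGroup?: string;  // e.g. "child"
  occasion?: string;  // e.g. "sünnipäev" (birthday)
  budget?: { min: number | null; max: number | null };
}

interface ScoredProduct extends Product {
  relevanceScore: number; // 0-1, higher means a better gift fit
  rerankSource: 'llama-4-scout-17b' | 'fallback';
}

// Example: one scored finalist from the worked example
const example: ScoredProduct = {
  name: 'Harry Potter raamat',
  category: 'Ilukirjandus',
  price: 25,
  relevanceScore: 0.92,
  rerankSource: 'llama-4-scout-17b',
};
```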
Purpose
Score product candidates for semantic relevance to the user's gift intent:
- Gift-fit scoring: How well does the product match the occasion and recipient?
- Semantic relevance: Goes beyond keyword matching
- Context awareness: Considers the full conversation context
- Quality filtering: Removes low-scoring results
Why LLaMA 4 Scout 17B?
Fast Reasoning
- Scoring time: ~150ms for 10-20 products
- Batch processing: Scores multiple items in a single request
- Quick response: Doesn't bottleneck the pipeline
Cost-Efficient
- Rate: ~$0.30 per 1000 requests
- Comparison: 70% cheaper than Cohere Rerank v3.5
- Volume: Sustainable at high traffic
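The cost figures translate into a simple projection. This sketch treats the per-1000-request rates quoted in this section as flat rates (token-based billing details are ignored), and the monthly volume is a hypothetical input.

```typescript
// Per-1000-request rates quoted above (USD); treated as flat rates here.
const RATE_PER_1K = { llamaScout: 0.3, cohereRerank: 1.0 };

// Projected monthly spend for a given number of rerank requests.
function monthlyCost(ratePer1k: number, requestsPerMonth: number): number {
  return (requestsPerMonth / 1000) * ratePer1k;
}

// Percentage saved by choosing the cheaper rate over the pricier one.
function savingsPercent(cheaper: number, pricier: number): number {
  return (1 - cheaper / pricier) * 100;
}

// Hypothetical volume: 1 million rerank requests per month.
const volume = 1000000;
const llama = monthlyCost(RATE_PER_1K.llamaScout, volume);
const cohere = monthlyCost(RATE_PER_1K.cohereRerank, volume);
console.log(`LLaMA: $${llama.toFixed(2)}, Cohere: $${cohere.toFixed(2)}`);
console.log(`Savings: ${savingsPercent(RATE_PER_1K.llamaScout, RATE_PER_1K.cohereRerank).toFixed(0)}%`);
```

At that volume the quoted rates work out to about $300 versus $1,000 per month, the 70% gap stated above.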
Context-Aware
- Full gift context: Uses recipient, occasion, age in scoring
- Conversation history: Considers previous interactions
- Nuanced: Understands Estonian cultural context
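One way to feed conversation history into scoring without inflating prompt size (and latency) is to keep only the most recent turns within a character budget. This is a sketch under assumptions: the `Turn` shape, the 6-turn limit and the 1,500-character budget are illustrative, not values from the source.

```typescript
// Hypothetical conversation turn shape.
interface Turn {
  role: 'user' | 'assistant';
  text: string;
}

// Keep the newest turns (up to maxTurns) whose combined length fits maxChars,
// returned oldest-first so the prompt reads chronologically.
function recentHistory(turns: Turn[], maxTurns = 6, maxChars = 1500): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0 && kept.length < maxTurns; i--) {
    const len = turns[i].text.length;
    if (used + len > maxChars) break; // budget exhausted: drop this and older turns
    kept.push(turns[i]);
    used += len;
  }
  return kept.reverse();
}

// Render the kept turns as plain lines for the rerank prompt.
function formatHistory(turns: Turn[]): string {
  return turns.map(t => `${t.role}: ${t.text}`).join('\n');
}
```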
Streaming Support 📡
- Incremental results: Scores can be processed as they arrive
- Early exit: Stop once the top 3 candidates are found
- Non-blocking: Doesn't hold other operations
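Early exit requires reading scores before the JSON object is complete. This sketch assumes the model streams a payload of the form `{"scores": [...]}` as text chunks (how the chunks are pulled from the Groq SDK is left out), and it interprets "top 3 found" as three scores at or above the 0.5 preferred threshold; both are assumptions, and `collectScoresWithEarlyExit` is a hypothetical name.

```typescript
// Consume streamed text chunks of {"scores": [...]} and stop once `target`
// scores reach `threshold`, or once the array closes.
async function collectScoresWithEarlyExit(
  chunks: AsyncIterable<string>,
  target = 3,
  threshold = 0.5,
): Promise<number[]> {
  let buffer = '';
  let scores: number[] = [];
  for await (const chunk of chunks) {
    buffer += chunk;
    const start = buffer.indexOf('[');
    if (start === -1) continue; // array not opened yet
    const body = buffer.slice(start + 1);
    // Only numbers followed by ',' or ']' are complete; the last one may still be streaming.
    const complete = body.match(/-?\d+(?:\.\d+)?(?=\s*[,\]])/g) ?? [];
    scores = complete.map(Number);
    if (scores.filter(s => s >= threshold).length >= target) break; // early exit
    if (body.includes(']')) break; // array closed: all scores received
  }
  return scores;
}
```

Products whose scores never arrive still need a value, for example the search-score fallback described under Error Handling.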
Alternative: Cohere Rerank v3.5
Considered but not selected:
- Marginally better quality
- Higher cost (~$1.00 per 1000)
- Higher latency (~300ms)
- No conversation context support
Decision: LLaMA provides roughly 90% of the quality at 30% of the cost
Implementation
Location: services/rerank.ts:27-214, 219-266
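import Groq from 'groq-sdk';

// Product, GiftContext, ScoredProduct, buildRerankPrompt, parseScores and
// RERANK_SYSTEM_PROMPT are defined elsewhere (not shown here)
const groq = new Groq(); // reads GROQ_API_KEY from the environment

async function rerankProducts(
  products: Product[],
  giftContext: GiftContext,
  userQuery: string
): Promise<ScoredProduct[]> {
  const prompt = buildRerankPrompt(products, giftContext, userQuery);
  const response = await groq.chat.completions.create({
    model: 'llama-4-scout-17b-16e-instruct',
    messages: [
      { role: 'system', content: RERANK_SYSTEM_PROMPT },
      { role: 'user', content: prompt }
    ],
    temperature: 0.3,
    // JSON mode: the reply must be a single JSON object
    response_format: { type: 'json_object' }
  });
  // message.content is typed as string | null
  const scores = parseScores(response.choices[0]?.message?.content ?? '');
  // Reject malformed output so the caller can fall back to search scores
  if (scores.length !== products.length || scores.some(s => !(s >= 0 && s <= 1))) {
    throw new Error('Rerank returned invalid scores');
  }
  return products.map((product, i) => ({
    ...product,
    relevanceScore: scores[i],
    rerankSource: 'llama-4-scout-17b'
  }));
}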
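`parseScores` is referenced above but not shown. A defensive sketch, assuming the model's JSON object carries a `scores` array as the system prompt requests: an unusable payload yields an empty array, and non-numeric entries become `NaN`, so a later length or range check rejects them.

```typescript
// Parse the model's JSON-mode reply into numeric scores.
// Returns [] when the payload is unusable so callers can fall back.
function parseScores(content: string | null | undefined): number[] {
  if (!content) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return []; // not valid JSON
  }
  const scores = (parsed as { scores?: unknown } | null)?.scores;
  if (!Array.isArray(scores)) return [];
  // Non-numeric entries become NaN so a later range check rejects them.
  return scores.map(s => (typeof s === 'number' ? s : NaN));
}
```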
Scoring Prompt
const RERANK_SYSTEM_PROMPT = `
You are a gift recommendation expert. Score each product (0-1) for
how well it fits the user's gift intent.
Consider:
- Recipient (age, gender, relationship)
- Occasion (birthday, holiday, thank you)
- Budget constraints
- Product category appropriateness
- Estonian cultural context (if applicable)
Return JSON: { "scores": [0.85, 0.72, ...] }, with exactly one score per product, in the order the products are listed.
`;
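`buildRerankPrompt` is also not shown. A minimal sketch, assuming the user message serializes the query, the gift context and the products in the same shape as the scoring example below; numbering the products gives the model an explicit order for its score array. The wording of the instruction line is illustrative.

```typescript
// Hypothetical minimal shapes for this sketch.
interface Product { name: string; category: string; price: number; }
interface GiftContext {
  recipient?: string;
  ageGroup?: string;
  occasion?: string;
  budget?: { min: number | null; max: number | null };
}

// Serialize query, context and products into the user message.
// Each product carries an explicit index so the model keeps scores in order.
function buildRerankPrompt(products: Product[], ctx: GiftContext, userQuery: string): string {
  const payload = {
    userQuery,
    recipient: ctx.recipient ?? null,
    ageGroup: ctx.ageGroup ?? null,
    occasion: ctx.occasion ?? null,
    budget: ctx.budget ?? null,
    products: products.map((p, index) => ({ index, name: p.name, category: p.category, price: p.price })),
  };
  return [
    `Score all ${products.length} products below, one score per product, in index order.`,
    JSON.stringify(payload, null, 2),
  ].join('\n\n');
}
```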
Scoring Example
Input (an Estonian query for "a birthday gift for a 10-year-old girl", a maximum budget of 30, and three candidates: a Harry Potter book, a cookbook and a Lego set):
{
  "userQuery": "sünnipäevakingitus 10-aastasele tüdrukule",
  "recipient": "tüdruk",
  "ageGroup": "child",
  "occasion": "sünnipäev",
  "budget": { "min": null, "max": 30 },
  "products": [
    { "name": "Harry Potter raamat", "category": "Ilukirjandus", "price": 25 },
    { "name": "Kokaraamat", "category": "Teaduskirjandus", "price": 18 },
    { "name": "Lego komplekt", "category": "Mänguasjad", "price": 28 }
  ]
}
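Output (the // annotations are explanatory only; the model returns plain JSON, which does not allow comments):
{
  "scores": [
    0.92, // Harry Potter book: strong fit for a 10-year-old girl
    0.45, // Cookbook: not age-appropriate
    0.88  // Lego set: age-appropriate, good gift
  ]
}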
Quality Thresholds
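const QUALITY_THRESHOLDS = {
  PREFERRED: 0.5, // High-quality results
  MINIMUM: 0.3,   // Acceptable fallback
  REJECT: 0.2     // Too low, exclude (anything below MINIMUM is already dropped below)
};

// Rank best-first so slice() keeps the top-scoring products
const ranked = [...scored].sort((a, b) => b.relevanceScore - a.relevanceScore);

// Preferred approach: keep only high-quality matches
const highQuality = ranked.filter(p => p.relevanceScore >= QUALITY_THRESHOLDS.PREFERRED);

// Fallback if fewer than 3 high-quality results
if (highQuality.length < 3) {
  const mediumQuality = ranked.filter(p => p.relevanceScore >= QUALITY_THRESHOLDS.MINIMUM);
  return mediumQuality.slice(0, 3);
}
return highQuality.slice(0, 20); // Pass to Phase 4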
Performance Metrics
Typical Execution:
Input: 20 finalist products
Timing:
├─ Build prompt: ~20ms
├─ Groq API call: ~120ms
├─ Parse scores: ~5ms
├─ Apply scores: ~5ms
└─ Filter by threshold: ~5ms
────────────────────────
Total: ~155ms ✓
Output: 12 products (score ≥ 0.5)
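A per-stage breakdown like the one above can be collected with a small timing helper. This sketch uses `Date.now()` (millisecond resolution) and hypothetical stage names; it is not the project's actual instrumentation.

```typescript
// Records how long each named stage of the rerank phase takes.
class StageTimer {
  private readonly stages: { name: string; ms: number }[] = [];

  // Run fn, record its duration under name (even if it throws), and return its result.
  async time<T>(name: string, fn: () => T | Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      this.stages.push({ name, ms: Date.now() - start });
    }
  }

  // Sum of all recorded stage durations.
  total(): number {
    return this.stages.reduce((sum, s) => sum + s.ms, 0);
  }

  // Render a tree in the same style as the breakdown above.
  report(): string {
    const lines = this.stages.map((s, i) =>
      `${i === this.stages.length - 1 ? '└─' : '├─'} ${s.name}: ~${s.ms}ms`);
    return [...lines, `Total: ~${this.total()}ms`].join('\n');
  }
}
```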
Optimization Strategies
1. Batch Scoring
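// Score up to 30 products in one call
const BATCH_SIZE = 30;
if (products.length > BATCH_SIZE) {
  // Split and score in parallel batches; every call needs the same context
  const batches = chunk(products, BATCH_SIZE); // e.g. lodash's chunk
  const results = await Promise.all(
    batches.map(batch => rerankProducts(batch, giftContext, userQuery))
  );
  // Promise.all preserves batch order, so flat() keeps the original product order
  return results.flat();
}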
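The batching step relies on a `chunk` helper and on results staying in input order. A self-contained sketch, with the scorer passed in as a function so the splitting logic can be tested without calling Groq; `scoreInBatches` and the injected scorer are hypothetical names.

```typescript
// Split items into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError('size must be positive');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Score every batch in parallel. Promise.all resolves in input order,
// so flattening keeps each score aligned with its original product.
async function scoreInBatches<T, R>(
  items: T[],
  batchSize: number,
  scoreBatch: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  const results = await Promise.all(chunk(items, batchSize).map(batch => scoreBatch(batch)));
  return results.flat();
}
```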
2. Skip if Few Finalists
const MIN_FINALISTS_FOR_RERANK = 3;
if (finalists.length < MIN_FINALISTS_FOR_RERANK) {
  // Don't waste time reranking 1-2 products
  return finalists;
}
3. Caching
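// Cache scores for an identical query, context and candidate set
const cacheKey = hash({ products, giftContext, userQuery });
if (rerankCache.has(cacheKey)) {
  return rerankCache.get(cacheKey)!;
}
const scored = await rerankProducts(products, giftContext, userQuery);
rerankCache.set(cacheKey, scored);
return scored;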
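`hash` and `rerankCache` are not defined in the snippet. One possible sketch: derive the key from a canonical JSON string (object keys sorted) hashed with 32-bit FNV-1a, and bound the cache with least-recently-used eviction on a `Map`. The helper names and the 500-entry limit are assumptions; a 32-bit hash can collide, so a production cache might keep the full canonical string as the key.

```typescript
// Deterministic JSON: object keys are sorted so {a,b} and {b,a} serialize identically.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(v => stableStringify(v)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value) ?? 'null'; // undefined serializes as null
}

// 32-bit FNV-1a over UTF-16 code units, as an 8-digit hex string.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Hypothetical key builder covering everything the scores depend on.
function rerankCacheKey(products: unknown[], giftContext: unknown, userQuery: string): string {
  return fnv1a(stableStringify({ products, giftContext, userQuery }));
}

// Small LRU cache: Map iteration order doubles as recency order.
class LruCache<V> {
  private readonly map = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries = 500) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark as most recently used.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in insertion order is the least recently used.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }
}
```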
Error Handling
Fallback Strategy
try {
  const scored = await rerankProducts(products, giftContext, userQuery);
  return scored;
} catch (error) {
  console.error('Rerank failed:', error);
  // Fallback: reuse the search-phase score (assumed to be normalized to 0-1)
  return products.map(p => ({
    ...p,
    relevanceScore: p.searchScore ?? 0.5, // ?? keeps a legitimate 0; 0.5 is a neutral default
    rerankSource: 'fallback'
  }));
}
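Slow responses can be treated like failures so the pipeline stays within its latency budget. A sketch of a timeout wrapper built on `Promise.race`; `withTimeout` and `scoreOrFallback` are hypothetical helpers, and the 300ms default is an assumption chosen to sit above the ~155ms typical runtime.

```typescript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Rerank timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fall back to neutral 0.5 scores when scoring fails or is too slow.
async function scoreOrFallback(
  score: () => Promise<number[]>,
  count: number,
  ms = 300,
): Promise<{ scores: number[]; fallback: boolean }> {
  try {
    return { scores: await withTimeout(score(), ms), fallback: false };
  } catch {
    return { scores: new Array<number>(count).fill(0.5), fallback: true };
  }
}
```

The race does not cancel the underlying HTTP request; it only stops waiting for it.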
Validation
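function validateScores(scores: unknown, expectedLength: number): scores is number[] {
  // Must be an array with exactly one score per product
  if (!Array.isArray(scores) || scores.length !== expectedLength) return false;
  // Every score must be a finite number between 0 and 1 (rejects NaN and strings)
  return scores.every(s => typeof s === 'number' && Number.isFinite(s) && s >= 0 && s <= 1);
}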
Testing
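describe('Semantic Rerank', () => {
  // Shared fixtures (illustrative values)
  const context = { recipient: 'tüdruk', ageGroup: 'child', occasion: 'sünnipäev' } as GiftContext;
  const query = 'sünnipäevakingitus 10-aastasele tüdrukule';
  const products = [
    { name: 'Suitable Gift', category: 'Perfect' },
    { name: 'Poor Fit', category: 'Wrong' }
  ] as Product[];

  it('scores products by relevance', async () => {
    // Calls the live model; scores may vary between runs
    const scored = await rerankProducts(products, context, query);
    expect(scored[0].relevanceScore).toBeGreaterThan(0.7);
    expect(scored[1].relevanceScore).toBeLessThan(0.4);
  });

  it('handles API failures gracefully', async () => {
    mockGroqError();
    // Expects the try/catch fallback to be applied within rerankProducts
    const scored = await rerankProducts(products, context, query);
    expect(scored[0].rerankSource).toBe('fallback');
    expect(scored.length).toBe(products.length);
  });
});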
Monitoring
{
  rerankTimeMs: number,     // Should be <200ms
  candidatesIn: number,     // Finalists scored
  candidatesOut: number,    // After threshold filter
  averageScore: number,     // Quality indicator
  scoreDistribution: {
    high: number,           // score >= 0.7
    medium: number,         // 0.5 <= score < 0.7
    low: number             // score < 0.5
  },
  fallbackTriggered: boolean
}
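The score-related fields of this record can be derived directly from the scores. A sketch: bucket boundaries follow the comments above (high >= 0.7, medium 0.5 to below 0.7, low < 0.5), while the function name and the use of 0.5 as the pass threshold for `candidatesOut` are assumptions.

```typescript
interface ScoreSummary {
  candidatesIn: number;
  candidatesOut: number;
  averageScore: number;
  scoreDistribution: { high: number; medium: number; low: number };
}

// Summarize one rerank run; candidatesOut counts scores that pass `threshold`.
function summarizeScores(scores: number[], threshold = 0.5): ScoreSummary {
  const distribution = { high: 0, medium: 0, low: 0 };
  for (const s of scores) {
    if (s >= 0.7) distribution.high++;
    else if (s >= 0.5) distribution.medium++;
    else distribution.low++;
  }
  const total = scores.reduce((sum, s) => sum + s, 0);
  return {
    candidatesIn: scores.length,
    candidatesOut: scores.filter(s => s >= threshold).length,
    averageScore: scores.length ? total / scores.length : 0, // 0 for an empty run
    scoreDistribution: distribution,
  };
}
```

For the scoring example's output `[0.92, 0.45, 0.88]`, this gives two high scores, one low score, two candidates out, and an average of about 0.75.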
Comparison: LLaMA vs Cohere
| Metric | LLaMA 4 Scout 17B | Cohere Rerank v3.5 |
|---|---|---|
| Latency | ~150ms | ~300ms |
| Cost (1k req) | $0.30 | $1.00 |
| Quality Score | 4.2/5 | 4.5/5 |
| Context Support | Full | Limited |
| Estonian Support | Native | ⚠️ Fair |
| Choice | **Selected** | Too expensive |
Related Documentation
- Phase 1-2: Query & Search - Previous phase
- Phase 4: Diversity Selection - Next phase
- Pipeline Models - Complete overview