Estonian Compound Words Problem
The compound-word detection failure was the most critical Estonian-specific bug encountered. It caused an 80% failure rate in certain gift scenarios.
The Problem
Scenario
User Query:
"Otsin luuleraamatut sõbrapäevakingiks"
Translation: "I'm looking for a poetry book as a Valentine's Day gift"
Expected:
- Detect "luuleraamatut" as explicit book request
- Show poetry books
- Type: Raamat
Actual (Before Fix):
- LLM correctly detected: productType: "Raamat"
- System added constraint: "Ei raamatuid" (No books)
- Showed: Scented candles instead of books
- Success rate: 20% (1/5 Valentine's scenarios)
Root Cause
The Paradox
The LLM correctly understood the Estonian compound word and set productType: "Raamat", but the system still excluded books.
Why?
Three-part failure:
- Keyword-based filtering relied on a hardcoded list without compound words
- The gift-occasion override ran AFTER LLM extraction and replaced its result
- The book signal was lost on its way through the pipeline
Original Keyword List
const BOOK_KEYWORDS = [
  'raamat',
  'raamatu',
  'raamatut',
  'raamatuid',
  'raamatud',
  'book',
  'books',
  'novel',
  'novels'
];
// Only 9 keywords!
Problem: The list is missing ALL compound words, including "luuleraamatut" from the failing query.
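To make the gap concrete, here is a minimal sketch. It assumes the matcher compares whole tokens against the list. The real `messageContainsKeyword` is not shown in this document, so `tokenize` and `containsKeyword` below are hypothetical stand-ins.

```typescript
// Hypothetical stand-ins; the real helpers in extract-context.ts are not shown here.
const ORIGINAL_BOOK_KEYWORDS: readonly string[] = [
  'raamat', 'raamatu', 'raamatut', 'raamatuid', 'raamatud',
  'book', 'books', 'novel', 'novels',
];

// Lowercase the message and split it on whitespace and common punctuation.
const tokenize = (message: string): string[] =>
  message
    .toLowerCase()
    .split(/[\s.,!?;:"'()]+/)
    .filter((token) => token.length > 0);

// Whole-token match: a token counts only if it equals a keyword exactly.
const containsKeyword = (message: string, keywords: readonly string[]): boolean =>
  tokenize(message).some((token) => keywords.some((keyword) => keyword === token));

const compoundQuery = 'Otsin luuleraamatut sõbrapäevakingiks';
const plainQuery = 'Otsin raamatut sõbrapäevakingiks';

console.log(containsKeyword(compoundQuery, ORIGINAL_BOOK_KEYWORDS)); // false
console.log(containsKeyword(plainQuery, ORIGINAL_BOOK_KEYWORDS)); // true
```

Under this assumption the plain form "raamatut" is found, while the compound "luuleraamatut" never equals any entry, so the keyword check alone reports no book request.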
Solution: Expanded Keywords
Location: extract-context.ts:30-73
const BOOK_KEYWORDS = [
  // Original keywords, plus 'romaan' / 'romaani' (novel)
  'raamat', 'raamatu', 'raamatut', 'raamatuid', 'raamatud',
  'book', 'books', 'novel', 'novels', 'romaan', 'romaani',
  // Poetry books (3 cases each)
  'luuleraamat', 'luuleraamatu', 'luuleraamatut',
  // Mystery books
  'krimiraamat', 'krimiraamatu', 'krimiraamatut',
  // Sci-fi books
  'ulmeraamat', 'ulmeraamatu', 'ulmeraamatut',
  // Fantasy books
  'fantaasiaraamat', 'fantaasiaraamatu', 'fantaasiaraamatut',
  // Cookbooks
  'kokaraamat', 'kokaraamatu', 'kokaraamatut',
  // Children's books
  'lasteraamat', 'lasteraamatu', 'lasteraamatut',
  // Travel books
  'reisiraamat', 'reisiraamatu', 'reisiraamatut',
  // Textbooks
  'õpperaamat', 'õpperaamatu', 'õpperaamatut'
] as const;
Result:
- From 9 keywords to 35 keywords in the list above
- Covers most common compound book types
- Includes 3 main cases per compound
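Because every compound in the list follows the same pattern (genre prefix plus a case form of "raamat"), the compound entries could also be generated rather than typed by hand. The sketch below is illustrative only: `buildCompoundKeywords`, `GENRE_PREFIXES` and `BASE_FORMS` are hypothetical names, not part of extract-context.ts.

```typescript
// Nominative, genitive and partitive forms, as used in the expanded list.
const BASE_FORMS: readonly string[] = ['raamat', 'raamatu', 'raamatut'];

// Genre prefixes taken from the expanded list.
const GENRE_PREFIXES: readonly string[] = [
  'luule', 'krimi', 'ulme', 'fantaasia', 'koka', 'laste', 'reisi', 'õppe',
];

// Combine every prefix with every case form.
const buildCompoundKeywords = (
  prefixes: readonly string[],
  baseForms: readonly string[],
): string[] => {
  const keywords: string[] = [];
  prefixes.forEach((prefix) => {
    baseForms.forEach((form) => {
      keywords.push(prefix + form);
    });
  });
  return keywords;
};

const COMPOUND_KEYWORDS = buildCompoundKeywords(GENRE_PREFIXES, BASE_FORMS);

console.log(COMPOUND_KEYWORDS.length); // 24 (8 prefixes x 3 case forms)
console.log(COMPOUND_KEYWORDS.slice(0, 3)); // [ 'luuleraamat', 'luuleraamatu', 'luuleraamatut' ]
```

With this approach, adding a new genre means adding one prefix instead of three strings, which keeps the case forms consistent across compounds.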
Enhanced Detection Function
Location: extract-context.ts:375-414
const wantsBooksFromContext = (
  context: GiftContext,
  userMessage?: string
): boolean => {
  // Check 1: LLM signals
  const productTypeIsBook = isBookSignal(context.productType);
  const categoryIsBook = isBookSignal(context.category);
  const productTypeHintsHaveBooks = context.productTypeHints?.some(isBookSignal);
  const categoryHintsHaveBooks = context.categoryHints?.some(isBookSignal);
  const wantsBooksFromSignals =
    productTypeIsBook ||
    categoryIsBook ||
    productTypeHintsHaveBooks ||
    categoryHintsHaveBooks;

  // Check 2: CRITICAL - Also check user message
  const normalizedMessage = normalizeMessage(userMessage);
  const messageHasBookKeyword = messageContainsKeyword(
    normalizedMessage,
    BOOK_KEYWORDS
  );

  // Check 3: Negative phrase detection
  const negativeBookPhrase = findNegativeBookPhrase(normalizedMessage);

  // Final decision: Want books if no negative phrase AND (signals OR keywords)
  const wantsBooks = negativeBookPhrase
    ? false
    : (wantsBooksFromSignals || messageHasBookKeyword);
  return wantsBooks;
};
Key Improvement: Now checks BOTH:
- LLM-extracted signals (productType, category, hints)
- User message keywords (catches compounds LLM might miss)
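The helpers used by the detection function are not shown in this document. The following sketch suggests what `isBookSignal` and `findNegativeBookPhrase` might look like and how the final decision combines the three checks. The `...Sketch` names, the phrase list and `decideWantsBooks` are assumptions for illustration only.

```typescript
// Assumed: a signal counts as "book" if it names the book product type.
const isBookSignalSketch = (value?: string): boolean => {
  if (!value) return false;
  const normalized = value.trim().toLowerCase();
  return ['raamat', 'raamatud', 'book', 'books'].some((signal) => signal === normalized);
};

// Assumed negative phrases; a production list would need more variants.
const NEGATIVE_BOOK_PHRASES: readonly string[] = [
  'ei raamatuid', 'ei taha raamatut', 'ei soovi raamatut', 'no books',
];

// Returns the first negative phrase found in the message, if any.
const findNegativeBookPhraseSketch = (normalizedMessage: string): string | undefined =>
  NEGATIVE_BOOK_PHRASES.filter((phrase) => normalizedMessage.indexOf(phrase) !== -1)[0];

// Same rule as the final decision: a negative phrase wins, otherwise either source suffices.
const decideWantsBooks = (
  fromSignals: boolean,
  fromKeywords: boolean,
  negativePhrase?: string,
): boolean => (negativePhrase ? false : fromSignals || fromKeywords);

console.log(decideWantsBooks(true, false)); // true: LLM signal only
console.log(decideWantsBooks(false, true)); // true: keyword only
console.log(decideWantsBooks(true, true, 'ei taha raamatut')); // false: explicit rejection
```

Either source alone is enough to keep books in the results, and only an explicit rejection turns them off.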
Safety Override
Location: extract-context.ts:492-497, 685-697
// Preserve initial book signals
const initialBookSignalsPresent =
  isBookSignal(giftContext.productType) ||
  isBookSignal(giftContext.category) ||
  giftContext.productTypeHints?.some(isBookSignal) ||
  giftContext.categoryHints?.some(isBookSignal);

// ... later in processing ...

// SAFETY OVERRIDE: Prevent false negatives
if (!wantsBooks && (initialBookSignalsPresent || messageHasBookKeyword)) {
  console.warn('[SAFETY] Overriding wantsBooks=false', {
    initialBookSignalsPresent,
    messageHasBookKeyword
  });
  wantsBooks = true; // Force to true
}
Why: Even if intermediate steps corrupt the context, the safety override restores the explicit book signal.
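One interaction the excerpt leaves open is what happens when the user explicitly rejects books with a phrase that itself contains a keyword such as "raamatuid". In that case `messageHasBookKeyword` would be true and the override condition would hold. Whether the production code guards against this is not visible here. The sketch below shows one possible guard; `OverrideInputs` and `applySafetyOverride` are hypothetical names.

```typescript
interface OverrideInputs {
  wantsBooks: boolean;
  initialBookSignalsPresent: boolean;
  messageHasBookKeyword: boolean;
  // Assumed: set when the user explicitly rejects books.
  negativeBookPhrase?: string;
}

const applySafetyOverride = (inputs: OverrideInputs): boolean => {
  if (inputs.wantsBooks) return true;
  // An explicit rejection wins over every positive signal.
  if (inputs.negativeBookPhrase) return false;
  // Otherwise restore the book request if any original evidence exists.
  return inputs.initialBookSignalsPresent || inputs.messageHasBookKeyword;
};

// Corrupted context, but the user message still carries the keyword.
console.log(applySafetyOverride({
  wantsBooks: false,
  initialBookSignalsPresent: false,
  messageHasBookKeyword: true,
})); // true

// Explicit rejection: the override stays off even though a keyword matched.
console.log(applySafetyOverride({
  wantsBooks: false,
  initialBookSignalsPresent: false,
  messageHasBookKeyword: true,
  negativeBookPhrase: 'ei raamatuid',
})); // false
```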
Removed Problematic Override
What was removed:
// REMOVED: This overrode LLM decisions
if (ROMANTIC_OCCASIONS.includes(occasion)) {
  if (!explicitlyRequestedBooks) {
    queryRewrite.excludeProductTypes = ['Raamat'];
    queryRewrite.forceProductTypes = ['Kingitused'];
  }
}
Why it was wrong:
- Assumed Valentine's = never books
- But users DO want poetry books for Valentine's!
- Overrode LLM's correct understanding
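A small regression guard can help keep an override like this from returning. The sketch below checks whether a query rewrite contradicts an explicit book request. The field names follow the removed snippet, while the `QueryRewriteSketch` interface and `contradictsBookRequest` are assumptions for illustration.

```typescript
// Field names follow the removed snippet; the interface itself is an assumption.
interface QueryRewriteSketch {
  excludeProductTypes: string[];
  forceProductTypes: string[];
}

// Returns true when the rewrite excludes books although the user asked for them.
const contradictsBookRequest = (
  wantsBooks: boolean,
  rewrite: QueryRewriteSketch,
): boolean =>
  wantsBooks && rewrite.excludeProductTypes.some((productType) => productType === 'Raamat');

const rewriteWithOverride: QueryRewriteSketch = {
  excludeProductTypes: ['Raamat'],
  forceProductTypes: ['Kingitused'],
};
const rewriteWithoutOverride: QueryRewriteSketch = {
  excludeProductTypes: [],
  forceProductTypes: [],
};

console.log(contradictsBookRequest(true, rewriteWithOverride)); // true: the old behavior
console.log(contradictsBookRequest(true, rewriteWithoutOverride)); // false
console.log(contradictsBookRequest(false, rewriteWithOverride)); // false: no book request to contradict
```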
Test Results
Query: "Otsin luuleraamatut sõbrapäevakingiks"
Before Fix:
- Constraints: "Ei raamatuid"
- Showed: Scented candles
- Success: 20%
After Fix:
- Constraints: "affordable, romantic"
- Showed: Romantic poetry books
- Success: 80%
Improvement: 20% → 80% success rate
Remaining Limitations
Rare Compounds Not in List
// Not yet in list:
"biografiaraamat" (biography book)
"ajalooraamat" (history book)
"filosoofiaraamat" (philosophy book)
Mitigation: LLM signal checking still works as backup
Typos in Compounds
"luleraamatut" (missing 'u')
"luuleraamaut" (missing 't')
Mitigation: LLM often corrects these internally
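If relying on the LLM for typo correction proves insufficient, one possible extension is a fuzzy keyword match based on edit distance. This is a sketch of that idea, not something the current implementation is known to do. `editDistance`, `fuzzyMatchesKeyword` and the length threshold are hypothetical.

```typescript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions and substitutions each cost 1).
const editDistance = (a: string, b: string): number => {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost, // substitution
      ));
    }
    previous = current;
  }
  return previous[b.length];
};

// Assumed threshold: fuzzy-match long tokens only, to limit false positives on short words.
const MIN_FUZZY_LENGTH = 8;

const fuzzyMatchesKeyword = (
  token: string,
  keywords: readonly string[],
  maxDistance = 1,
): boolean =>
  token.length >= MIN_FUZZY_LENGTH &&
  keywords.some(
    (keyword) => keyword.length >= MIN_FUZZY_LENGTH && editDistance(token, keyword) <= maxDistance,
  );

const POETRY_KEYWORDS = ['luuleraamat', 'luuleraamatu', 'luuleraamatut'];

console.log(fuzzyMatchesKeyword('luleraamatut', POETRY_KEYWORDS)); // true
console.log(fuzzyMatchesKeyword('luuleraamaut', POETRY_KEYWORDS)); // true
console.log(fuzzyMatchesKeyword('luuletus', POETRY_KEYWORDS)); // false
```

Both typos from the list above are within one edit of a known keyword, while an unrelated word such as "luuletus" (poem) is not.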
Case Endings Beyond Main Three
// We only added nominative/genitive/partitive
// Missing rarer cases like:
"luuleraamatusse" (illative)
"luuleraamatus" (inessive)
Mitigation: These cases are rare in gift queries
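Both the rare-compound and the rare-case limitations come from enumerating full word forms. A possible fallback, sketched here under the assumption that any token ending in "raamat" plus a case ending is book-related, is to match on the stem instead. `BOOK_COMPOUND_PATTERN` and `isBookLikeToken` are hypothetical, and the ending list is not exhaustive.

```typescript
// Any prefix (possibly empty), the stem "raamat", then an optional case ending.
// Endings cover common singular and plural forms; the list is not exhaustive.
const BOOK_COMPOUND_PATTERN =
  /^.*raamat(u|ut|ud|uid|ute|usse|us|ust|ule|ul|ult|uks|uga)?$/;

const isBookLikeToken = (token: string): boolean =>
  BOOK_COMPOUND_PATTERN.test(token.toLowerCase());

console.log(isBookLikeToken('ajalooraamat')); // true: compound not in the keyword list
console.log(isBookLikeToken('luuleraamatusse')); // true: illative
console.log(isBookLikeToken('luuleraamatus')); // true: inessive
console.log(isBookLikeToken('küünal')); // false
```

The trade-off is precision: compounds such as "telefoniraamat" (phone book) would also match, so a stem-based check is probably better suited as a fallback behind the explicit keyword list than as a replacement for it.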
Related Documentation
- Estonian Overview - Main overview
- Morphological Cases - Case system handling
- Best Practices - Implementation guidelines