Estonian Mixed Language Handling
Estonian users routinely mix English technical terms into Estonian queries, so language detection has to recognize mixed input and still answer in Estonian.
The Problem: Code-Switching
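Estonian users mix languages within a single query:
"Otsin science fiction raamatut sünnipäevaks" (I'm looking for a science fiction book for a birthday)
"Vajan fantasy seriaali oma pojale" (I need a fantasy series for my son)
"Kas teil on Harry Potter eestikeelne versioon?" (Do you have the Estonian-language version of Harry Potter?)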
Challenge: Language detection must handle:
- Primary language (Estonian)
- Embedded technical terms (English)
- Proper nouns (Harry Potter)
- Genre names that are usually typed in English, even when an Estonian equivalent exists
Language Detection Strategy
Location: app/api/chat/services/language.ts:238-278
Implementation
static detectLanguage(message: string): 'et' | 'en' | 'mixed' {
  // Common Estonian marker words. Matching is whole-word only, so inflected
  // forms such as "raamatut" or "kingitust" do not match "raamat" / "kingitus".
  const estonianWords = /\b(soovita|otsin|tahan|kingitus|raamat|hea|suurepärane|aitäh|tere|mulle|palun)\b/i;
  // Common English marker words (same whole-word, case-insensitive matching)
  const englishWords = /\b(recommend|search|want|need|looking|gift|book|good|great|thanks|hello|please|would|could)\b/i;
  const hasEstonian = estonianWords.test(message);
  const hasEnglish = englishWords.test(message);
  if (hasEstonian && hasEnglish) {
    return 'mixed';
  } else if (hasEstonian) {
    return 'et';
  } else if (hasEnglish) {
    return 'en';
  } else {
    // No marker word matched: default to Estonian (primary market)
    return 'et';
  }
}
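To make the branching concrete, here is a minimal sketch that copies the two word lists from the implementation above into a standalone function and runs it on a few queries. The `Lang` alias, the free-function form (no class wrapper), and the `sampleQueries` / `detectedSamples` names are assumptions added for illustration.

```typescript
// Hypothetical standalone copy of the detector; word lists are unchanged.
type Lang = 'et' | 'en' | 'mixed';

const estonianWords = /\b(soovita|otsin|tahan|kingitus|raamat|hea|suurepärane|aitäh|tere|mulle|palun)\b/i;
const englishWords = /\b(recommend|search|want|need|looking|gift|book|good|great|thanks|hello|please|would|could)\b/i;

function detectLanguage(message: string): Lang {
  const hasEstonian = estonianWords.test(message);
  const hasEnglish = englishWords.test(message);
  if (hasEstonian && hasEnglish) return 'mixed';
  if (hasEnglish) return 'en';
  // Either an Estonian marker matched or nothing matched (default market).
  return 'et';
}

const sampleQueries: string[] = [
  'Otsin kingitust emale',             // "otsin" matches            -> 'et'
  'Looking for a gift for my mother',  // "looking", "gift" match    -> 'en'
  'Palun recommend hea raamat',        // markers from both lists    -> 'mixed'
  'Otsin science fiction raamatut',    // only "otsin" matches       -> 'et'
  'Vajan fantasy seriaali oma pojale', // no marker word, default    -> 'et'
];

const detectedSamples: Lang[] = sampleQueries.map((q) => detectLanguage(q));
```

With only these word lists, a query such as "Otsin science fiction raamatut" comes back as 'et' rather than 'mixed', because neither "science" nor "fiction" is an English marker word. Both outcomes lead to an Estonian response, so the user-visible behavior is the same.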
Response Language Rules
// Rule 1: Pure Estonian → Estonian response
Query: "Otsin kingitust emale"
→ Detected: 'et'
→ Response: "Siin on mõned head soovitused..."
// Rule 2: Pure English → English response
Query: "Looking for a gift for my mother"
→ Detected: 'en'
→ Response: "Here are some great recommendations..."
// Rule 3: Mixed → Estonian response (primary language)
Query: "Otsin science fiction raamatut"
→ Detected: 'mixed' (or 'et' when no English marker word matches, as with the word lists shown above)
→ Treated as: 'et'
→ Response: "Siin on mõned head science fiction raamatud..."
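The three rules collapse into a single decision: only a purely English query gets an English answer. A minimal sketch, assuming a hypothetical helper named `resolveResponseLanguage` (this name and the two type aliases are not from the codebase):

```typescript
type DetectedLanguage = 'et' | 'en' | 'mixed';
type ResponseLanguage = 'et' | 'en';

// Hypothetical helper: maps the three detection outcomes onto the two
// languages the assistant answers in.
function resolveResponseLanguage(detected: DetectedLanguage): ResponseLanguage {
  // 'mixed' falls back to Estonian, the primary language (Rule 3).
  return detected === 'en' ? 'en' : 'et';
}
```

Keeping this mapping in one place means the prompt builder and any response post-processing cannot disagree about how 'mixed' is treated.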
Technical Terms Translation
Several genre terms have an Estonian equivalent, but users often type the English form instead:
| English | Estonian (literal) | Actually Used |
|---|---|---|
| science fiction | teadusulme | "sci-fi" or "science fiction" |
| fantasy | fantaasia | "fantasy" |
| thriller | põnevik | "thriller" |
| romance | romantika | both used |
Solution: Accept Both Forms
const genreAliases = {
  'science fiction': 'Ulme',
  'sci-fi': 'Ulme',
  'teadusulme': 'Ulme',
  'fantasy': 'Fantaasia',
  'fantaasia': 'Fantaasia',
  'thriller': 'Põnevus',
  'põnevik': 'Põnevus',
};
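Because the alias keys are lowercase, a lookup has to normalize the query first. The sketch below shows one possible way to do that; `findGenres` is a hypothetical helper, and plain substring matching is an assumption chosen for brevity (it also catches inflected forms that keep the alias as a prefix, at the cost of occasional false positives).

```typescript
const genreAliases: Record<string, string> = {
  'science fiction': 'Ulme',
  'sci-fi': 'Ulme',
  'teadusulme': 'Ulme',
  'fantasy': 'Fantaasia',
  'fantaasia': 'Fantaasia',
  'thriller': 'Põnevus',
  'põnevik': 'Põnevus',
};

// Hypothetical helper: returns each genre whose alias occurs in the query,
// without duplicates, in the order the aliases are declared.
function findGenres(query: string): string[] {
  const normalized = query.toLowerCase();
  const found: string[] = [];
  for (const alias of Object.keys(genreAliases)) {
    const genre = genreAliases[alias];
    if (normalized.indexOf(alias) !== -1 && found.indexOf(genre) === -1) {
      found.push(genre);
    }
  }
  return found;
}

const fantasyGenres = findGenres('Vajan fantasy seriaali oma pojale'); // ['Fantaasia']
const sciFiGenres = findGenres('Otsin Science Fiction raamatut');      // ['Ulme']
```

A query with no known genre term, such as "Otsin kingitust emale", yields an empty array, which the caller can treat as "no genre filter".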
LLM Prompt Instructions
// For Estonian and mixed-language queries
if (detectedLanguage === 'mixed' || detectedLanguage === 'et') {
systemPrompt += `
IMPORTANT: User query contains Estonian language.
- Respond in ESTONIAN, even if query includes English technical terms
- English words like "science fiction", "Harry Potter", "fantasy" are normal in Estonian text
- Maintain natural Estonian grammar and sentence structure
- Use English terms only when no Estonian equivalent exists
Example correct response:
"Siin on mõned head science fiction raamatud teie jaoks..."
`;
}
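The same logic can be written as a pure function, which makes it easy to unit-test without building a full request. This is a sketch under assumptions: `buildSystemPrompt`, `basePrompt`, and `ESTONIAN_INSTRUCTIONS` are hypothetical names, and the instruction text is copied from the snippet above.

```typescript
type DetectedLanguage = 'et' | 'en' | 'mixed';

// Instruction block appended for Estonian and mixed-language queries.
const ESTONIAN_INSTRUCTIONS = `
IMPORTANT: User query contains Estonian language.
- Respond in ESTONIAN, even if query includes English technical terms
- English words like "science fiction", "Harry Potter", "fantasy" are normal in Estonian text
- Maintain natural Estonian grammar and sentence structure
- Use English terms only when no Estonian equivalent exists
Example correct response:
"Siin on mõned head science fiction raamatud teie jaoks..."
`;

// Hypothetical helper: returns the prompt to send, leaving the input untouched.
function buildSystemPrompt(basePrompt: string, detected: DetectedLanguage): string {
  if (detected === 'mixed' || detected === 'et') {
    return basePrompt + ESTONIAN_INSTRUCTIONS;
  }
  // Pure English queries keep the base prompt unchanged.
  return basePrompt;
}
```

For example, `buildSystemPrompt('You are a bookshop assistant.', 'mixed')` returns the base prompt followed by the Estonian instructions, while passing `'en'` returns the base prompt as-is.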
Test Results
Test 1: Mixed Query
Input:
"Otsin science fiction raamatut sünnipäevaks"
Before:
- Responded in English
- Showed non-book recommendations
After:
- Detected: 'mixed' → Treat as Estonian
- Responded in Estonian
- Showed sci-fi books
Test 2: Proper Noun
Input:
"Kas teil on Harry Potter eestikeelne versioon?"
Before:
- ⚠️ Sometimes responded in English
After:
- Detected: 'mixed' → Treat as Estonian
- Responded in Estonian
- Correct handling of proper nouns
Related Documentation
- Estonian Overview - Main challenges
- Best Practices - Implementation guidelines