
Estonian Mixed Language Handling

Estonian users often mix English genre names, titles, and other borrowed terms into Estonian queries, so language detection has to cope with code-switched input rather than assume one language per message.

The Problem: Code-Switching

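Typical queries keep Estonian grammar but embed English terms:

"Otsin science fiction raamatut sünnipäevaks" (I'm looking for a science fiction book for a birthday)
"Vajan fantasy seriaali oma pojale" (I need a fantasy series for my son)
"Kas teil on Harry Potter eestikeelne versioon?" (Do you have the Estonian-language version of Harry Potter?)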

Challenge: Language detection must handle:

  1. Primary language (Estonian)
  2. Embedded English terms (genre names, loanwords)
  3. Proper nouns (Harry Potter)
  4. Genre names whose Estonian equivalents are rarely used

Language Detection Strategy

Location: app/api/chat/services/language.ts:238-278

Implementation

// Class name is assumed; the source shows only the static method.
class LanguageService {
  static detectLanguage(message: string): 'et' | 'en' | 'mixed' {
    // Estonian keywords. \b matches whole words only, so inflected forms
    // such as "raamatut" do not match "raamat".
    const estonianWords = /\b(soovita|otsin|tahan|kingitus|raamat|hea|suurepärane|aitäh|tere|mulle|palun)\b/i;

    // English keywords
    const englishWords = /\b(recommend|search|want|need|looking|gift|book|good|great|thanks|hello|please|would|could)\b/i;

    const hasEstonian = estonianWords.test(message);
    const hasEnglish = englishWords.test(message);

    if (hasEstonian && hasEnglish) {
      return 'mixed';
    }
    if (hasEnglish) {
      return 'en';
    }
    // Estonian keyword matched, or nothing matched: default to Estonian (primary market)
    return 'et';
  }
}
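To see how the keyword lists behave on real queries, the sketch below copies the two patterns into a standalone function and prints the label for a few inputs. The function name `detect` and the sample list are assumptions made for illustration; only the patterns and the branching come from the snippet above.

```typescript
type Lang = 'et' | 'en' | 'mixed';

// Same keyword patterns as in the snippet above.
const estonianWords = /\b(soovita|otsin|tahan|kingitus|raamat|hea|suurepärane|aitäh|tere|mulle|palun)\b/i;
const englishWords = /\b(recommend|search|want|need|looking|gift|book|good|great|thanks|hello|please|would|could)\b/i;

// Hypothetical standalone copy of detectLanguage, for experimentation only.
function detect(message: string): Lang {
  const hasEstonian = estonianWords.test(message);
  const hasEnglish = englishWords.test(message);
  if (hasEstonian && hasEnglish) return 'mixed';
  if (hasEnglish) return 'en';
  return 'et'; // Estonian keyword matched, or no keyword matched at all
}

const samples = [
  'Otsin kingitust emale',                          // Estonian keyword only
  'Looking for a gift for my mother',               // English keywords only
  'Otsin great fantasy raamatut',                   // keywords from both lists
  'Otsin science fiction raamatut',                 // "science"/"fiction" are not keywords
  'Kas teil on Harry Potter eestikeelne versioon?', // no keyword from either list
];

for (const s of samples) {
  console.log(`${detect(s)}\t${s}`);
}
```

With only the keywords listed in this section, the labels come out as et, en, mixed, et, et. The fourth query is labeled 'et' rather than 'mixed' because neither "science" nor "fiction" appears in the English list, and the last one reaches 'et' through the default branch. Either way the response language is Estonian.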

Response Language Rules

// Rule 1: Pure Estonian → Estonian response
Query: "Otsin kingitust emale"
→ Detected: 'et'
→ Response: "Siin on mõned head soovitused..."

// Rule 2: Pure English → English response
Query: "Looking for a gift for my mother"
→ Detected: 'en'
→ Response: "Here are some great recommendations..."

// Rule 3: Mixed → Estonian response (primary language)
Query: "Otsin science fiction raamatut"
→ Detected: 'mixed'
→ Treated as: 'et'
→ Response: "Siin on mõned head science fiction raamatud..."
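The three rules reduce to a two-way choice: only a purely English query gets an English reply. A minimal sketch of that mapping follows; the helper name `resolveResponseLanguage` and the type aliases are hypothetical and do not appear in the source.

```typescript
type DetectedLanguage = 'et' | 'en' | 'mixed';
type ResponseLanguage = 'et' | 'en';

// Hypothetical helper: collapses the three detection labels into the two
// response languages. 'mixed' is treated as Estonian (primary language).
function resolveResponseLanguage(detected: DetectedLanguage): ResponseLanguage {
  return detected === 'en' ? 'en' : 'et';
}

console.log(resolveResponseLanguage('et'));    // Rule 1
console.log(resolveResponseLanguage('en'));    // Rule 2
console.log(resolveResponseLanguage('mixed')); // Rule 3
```

Keeping this mapping in one place means the 'mixed' label can still be logged or measured separately while the rest of the pipeline only deals with two languages.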

Technical Terms Translation

Some genre terms have a literal Estonian equivalent, but users often prefer the English form:

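| English         | Estonian (literal) | Actually used                 |
|-----------------|--------------------|-------------------------------|
| science fiction | teadusulme         | "sci-fi" or "science fiction" |
| fantasy         | fantaasia          | "fantasy"                     |
| thriller        | põnevik            | "thriller"                    |
| romance         | romantika          | both used                     |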

Solution: Accept Both Forms

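// Maps every accepted spelling (English or Estonian) to one canonical genre name.
const genreAliases: Record<string, string> = {
  'science fiction': 'Ulme',
  'sci-fi': 'Ulme',
  'teadusulme': 'Ulme',
  'fantasy': 'Fantaasia',
  'fantaasia': 'Fantaasia',
  'thriller': 'Põnevus',
  'põnevik': 'Põnevus'
};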
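One way the alias map might be applied to a raw query is a case-insensitive substring search, sketched below. The helper `findGenre` is hypothetical; the source shows only the map, not how it is consulted.

```typescript
const genreAliases: Record<string, string> = {
  'science fiction': 'Ulme',
  'sci-fi': 'Ulme',
  'teadusulme': 'Ulme',
  'fantasy': 'Fantaasia',
  'fantaasia': 'Fantaasia',
  'thriller': 'Põnevus',
  'põnevik': 'Põnevus',
};

// Hypothetical helper: returns the canonical genre for the first alias
// found anywhere in the query, or null when no alias occurs.
function findGenre(query: string): string | null {
  const text = query.toLowerCase();
  for (const [alias, genre] of Object.entries(genreAliases)) {
    if (text.includes(alias)) {
      return genre;
    }
  }
  return null;
}

console.log(findGenre('Otsin science fiction raamatut sünnipäevaks'));
console.log(findGenre('Vajan fantasy seriaali oma pojale'));
console.log(findGenre('Kas teil on põnevikke?'));
console.log(findGenre('Otsin kingitust emale'));
```

Substring matching is a deliberate simplification here: it lets inflected Estonian forms such as "põnevikke" hit the "põnevik" alias, at the cost of possible false positives when an alias happens to occur inside an unrelated word.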

LLM Prompt Instructions

// For Estonian and mixed-language queries
if (detectedLanguage === 'mixed' || detectedLanguage === 'et') {
  systemPrompt += `
IMPORTANT: User query contains Estonian language.

- Respond in ESTONIAN, even if query includes English technical terms
- English words like "science fiction", "Harry Potter", "fantasy" are normal in Estonian text
- Maintain natural Estonian grammar and sentence structure
- Use English terms only when no Estonian equivalent exists

Example correct response:
"Siin on mõned head science fiction raamatud teie jaoks..."
`;
}
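The fragment above mutates a `systemPrompt` variable defined elsewhere. As a self-contained illustration, the same logic can be written as a pure function that returns the finished prompt. The function name `buildSystemPrompt`, the constant name, and the base prompt text are assumptions; the instruction text is taken from the fragment.

```typescript
type DetectedLanguage = 'et' | 'en' | 'mixed';

// Instruction text from the fragment above.
const ESTONIAN_INSTRUCTIONS = `
IMPORTANT: User query contains Estonian language.

- Respond in ESTONIAN, even if query includes English technical terms
- English words like "science fiction", "Harry Potter", "fantasy" are normal in Estonian text
- Maintain natural Estonian grammar and sentence structure
- Use English terms only when no Estonian equivalent exists

Example correct response:
"Siin on mõned head science fiction raamatud teie jaoks..."
`;

// Hypothetical helper: appends the Estonian instructions unless the query
// was detected as purely English.
function buildSystemPrompt(basePrompt: string, detected: DetectedLanguage): string {
  return detected === 'en' ? basePrompt : basePrompt + ESTONIAN_INSTRUCTIONS;
}

// Placeholder base prompt, for illustration only.
const base = 'You are a bookstore assistant.';
console.log(buildSystemPrompt(base, 'mixed'));
```

A pure function like this is easy to unit test: the 'en' case should return the base prompt unchanged, and the 'et' and 'mixed' cases should both contain the Estonian instructions.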

Test Results

Test 1: Mixed Query

Input:

"Otsin science fiction raamatut sünnipäevaks"

Before:

  • Responded in English
  • Showed non-book recommendations

After:

  • Detected: 'mixed' → Treat as Estonian
  • Responded in Estonian
  • Showed sci-fi books

Test 2: Proper Noun

Input:

"Kas teil on Harry Potter eestikeelne versioon?"

Before:

  • ⚠️ Sometimes responded in English

After:

  • Detected: 'mixed' → Treat as Estonian
  • Responded in Estonian
  • Correct handling of proper nouns