
Phase 5: Response Generation

Model Selection

  • Model: gpt-5.1-chat-latest
  • Provider: OpenAI
  • Temperature: 0.7
  • Max Tokens: 2500
  • Priority: High
  • Streaming: Enabled

Purpose

Generate natural, engaging responses that:

  • Narrate search results: Explain why products are good fits
  • Follow language rules: Native Estonian and English
  • Format recommendations: Numbered lists, delayed product cards
  • Maintain conversation tone: Friendly, helpful, professional
  • Handle edge cases: Pool exhaustion, clarifications, errors

Why GPT-5.1?

Quality

  • Language Generation: Superior fluency in both ET and EN
  • Complex Instructions: Follows detailed formatting rules
  • Reasoning: Understands gift-giving context nuances
  • Creativity: Natural, non-robotic responses

Multilingual

  • Native Estonian: Excellent grammar and idioms
  • Native English: Professional tone
  • Code-switching: Handles mixed language gracefully
  • Cultural Context: Understands Estonian customs

Reasoning Power

  • Gift Appropriateness: Explains why product fits
  • Comparative Analysis: Highlights differences between products
  • Tone Matching: Adapts to user's communication style
  • Edge Case Handling: Generates helpful clarifications

Priority Tier

  • High Priority: Minimizes queueing on the provider side
  • Faster TTFB: First token arrives quickly, targeting <150ms to first chunk
  • Streaming: Immediate user feedback as tokens arrive

Configuration

Location: app/api/chat/orchestrators/response-orchestrator.ts

export const GENERATION_CONFIG = {
  model: 'gpt-5.1-chat-latest',
  temperature: 0.7,
  max_tokens: 2500,
  stream: true,
  priority: 'high',

  // Formatting rules
  delayedCardInjection: true,
  cardInjectionThreshold: 180, // chars streamed before injecting cards

  // Safety
  repetitionDetection: true,
  tokenLimitWarning: 0.9 // Warn at 90% of max_tokens
};

Implementation

Location: app/api/chat/services/ai-response.ts

import OpenAI from 'openai';
import { GENERATION_CONFIG } from '../orchestrators/response-orchestrator';

const openai = new OpenAI();

// Product, GiftContext and Message come from the app's shared types.
type ResponseChunk =
  | { type: 'text'; content: string }
  | { type: 'inject-cards'; products: Product[] };

async function* generateResponse(
  products: Product[],
  giftContext: GiftContext,
  conversationHistory: Message[]
): AsyncGenerator<ResponseChunk> {
  const systemPrompt = buildSystemPrompt(giftContext);
  const userPrompt = buildUserPrompt(products, giftContext);

  const stream = await openai.chat.completions.create({
    model: GENERATION_CONFIG.model,
    messages: [
      { role: 'system', content: systemPrompt },
      ...conversationHistory,
      { role: 'user', content: userPrompt }
    ],
    temperature: GENERATION_CONFIG.temperature,
    max_tokens: GENERATION_CONFIG.max_tokens,
    stream: true,
    // Priority processing is requested via service_tier;
    // stream_options only controls usage reporting
    service_tier: 'priority'
  });

  let accumulated = '';
  let productCardsInjected = false;

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    accumulated += content;

    // Delayed card injection: once enough intro text has streamed,
    // emit a one-time event telling the client to render product cards
    if (
      !productCardsInjected &&
      accumulated.length >= GENERATION_CONFIG.cardInjectionThreshold
    ) {
      yield { type: 'inject-cards', products };
      productCardsInjected = true;
    }

    yield { type: 'text', content };
  }
}

Response Patterns

Product Recommendation

const PRODUCT_RECOMMENDATION_TEMPLATE = `
Based on your request for {occasion} gifts for {recipient},
I recommend these {count} products:

1. **{product1}** - {reason1}
2. **{product2}** - {reason2}
3. **{product3}** - {reason3}

{additional_context}
`;

Example Output (Estonian):

Sinu sünnipäevakingituse otsingu põhjal soovitan järgmisi 
raamatuid 10-aastasele tüdrukule:

1. **Harry Potter ja tarkade kivi** - Põnev fantaasiaraamat,
mis sobib suurepäraselt selle vanuse lastele.

2. **Matilda** - Roald Dahli klassika, mis õpetab lugemist
armastama.

3. **Anne Shirley** - Seikluslik lugu tugevast tüdrukust,
inspireeriv lugemine.

Kõik need mahuvad sinu 30-eurosesse eelarvesse!

Clarifying Question

const CLARIFICATION_TEMPLATE = `
{polite_acknowledgment}

{clarification_question}

{helpful_suggestions}
`;

Example Output:

Tere! Meeleldi aitan kingituse leidmisel.

Kellele kingitust otsid?
- Perele
- Sõbrale
- Kolleegile
- Lapsele

Või kirjelda täpsemalt, mida otsid!

Pool Exhaustion Acknowledgment

const EXHAUSTION_TEMPLATE = `
{transparent_explanation}

{what_was_shown}

{alternative_suggestion}
`;

Example Output:

Oleme näidanud kõik eesti luule raamatud meie valikust!

Näitasin sulle {count} erinevat teost.

Kas soovid näha:
- Muus kategoorias raamatuid?
- Ingliskeelseid luulekogusid?
- Teisi eesti kirjanduse žanreid?
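
The {placeholder} tokens in these templates are filled in before the text reaches the user. A minimal interpolation sketch; fillTemplate is a hypothetical helper, not a function from the codebase:

// Hypothetical helper: substitute {placeholder} tokens with values.
// Unknown placeholders are left intact so missing data stays visible.
function fillTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    key in values ? values[key] : match
  );
}

// Usage with the clarification template above:
const intro = fillTemplate(CLARIFICATION_TEMPLATE, {
  polite_acknowledgment: 'Tere! Aitan meeleldi.',
  clarification_question: 'Kellele kingitust otsid?',
  helpful_suggestions: 'Või kirjelda täpsemalt, mida otsid!'
});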

Delayed Card Injection

Why 180 characters?

  • User reads intro first
  • Products appear as context loads
  • Smooth UX transition
  • Avoids flash of content
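
On the client side, the generator's two event types map directly to UI updates. A minimal consumer sketch, reusing the ResponseChunk type from the implementation above (the ui callbacks are hypothetical):

// Render streamed text immediately; show cards once the one-time
// inject-cards event arrives after the 180-character threshold.
async function renderResponse(
  stream: AsyncGenerator<ResponseChunk>,
  ui: { appendText(t: string): void; showCards(p: Product[]): void }
) {
  for await (const chunk of stream) {
    if (chunk.type === 'text') {
      ui.appendText(chunk.content);
    } else if (chunk.type === 'inject-cards') {
      ui.showCards(chunk.products);
    }
  }
}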

Quality Controls

1. Repetition Detection

function detectRepetition(text: string): boolean {
  // Consecutive: the same word 3+ times in a row
  // (\b after \1 prevents prefix matches like "cat cats")
  const consecutivePattern = /(\b\w+\b)(\s+\1\b){2,}/;
  if (consecutivePattern.test(text)) return true;

  // Frequent: any word appearing 3+ times within the last 20 words
  // (common stopwords can trigger this; filter them upstream)
  const words = text.split(/\s+/).slice(-20);
  const wordCounts = new Map<string, number>();
  for (const word of words) {
    wordCounts.set(word, (wordCounts.get(word) || 0) + 1);
  }

  return Array.from(wordCounts.values()).some(count => count >= 3);
}
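
One way to apply this during generation is to test the accumulated text on each chunk and abort when the model starts looping; a sketch assuming the streaming loop from generateResponse above:

// Inside the for-await loop, after accumulating the new content:
if (GENERATION_CONFIG.repetitionDetection && detectRepetition(accumulated)) {
  console.warn('Repetition detected, aborting stream early');
  break; // return what streamed so far instead of looping forever
}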

2. Token Limit Monitoring

const TOKEN_LIMIT = 2500;
const WARNING_THRESHOLD = 0.9;

if (tokenCount >= TOKEN_LIMIT * WARNING_THRESHOLD) {
  console.warn('Approaching token limit:', {
    used: tokenCount,
    limit: TOKEN_LIMIT,
    remaining: TOKEN_LIMIT - tokenCount
  });
}
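
tokenCount can come from the API's usage field or from a local estimate. The tests below call estimateTokens, which is not defined elsewhere on this page; a rough sketch using the common ~4 characters per token heuristic:

// Rough estimate: ~4 characters per token holds for English text;
// Estonian tokenizes less efficiently, so treat this as a lower bound.
// Use a real tokenizer (e.g. tiktoken) where accuracy matters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}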

3. Product Mention Validation

function validateProductMentions(
  response: string,
  products: Product[]
): boolean {
  // Count how many of the recommended products the response names
  const mentioned = products.filter(p =>
    response.toLowerCase().includes(p.name.toLowerCase())
  );

  if (mentioned.length === 0) {
    console.warn("Response doesn't mention any products");
    return false;
  }

  return true;
}
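
The check runs after the stream completes, over the accumulated text; it logs rather than blocks, so a quirky but valid response still reaches the user. A minimal usage sketch (the metrics client is hypothetical):

if (!validateProductMentions(accumulated, products)) {
  // Hypothetical metrics sink; swap in the project's real client
  metrics.increment('response.no_products_mentioned');
}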

Performance Metrics

Typical Execution:

Timing:
├─ Build system prompt: ~5ms
├─ Build user prompt: ~3ms
├─ OpenAI API TTFB: ~80ms
├─ First chunk arrives: <150ms ✓
├─ Streaming (200 tokens): ~1500ms
├─ Card injection: ~1ms
├─ Stream complete: ~2000ms
└─ Validation: ~5ms
───────────────────────────
Total: ~2100ms

Error Handling

Fallback Response

const FALLBACK_RESPONSES = {
  et: 'Vabandust, ei suutnud vastust genereerida. Palun proovi uuesti.',
  en: "Sorry, I couldn't generate a response. Please try again."
} as const;

try {
  // Drain the generator into a single string
  let text = '';
  for await (const chunk of generateResponse(products, context, conversationHistory)) {
    if (chunk.type === 'text') text += chunk.content;
  }
  return text;
} catch (error) {
  const lang = context.language === 'en' ? 'en' : 'et';
  return FALLBACK_RESPONSES[lang];
}

Cut-off Handling

if (finishReason === 'length') {
  // Token limit reached; close the truncated response gracefully
  return response + '...';
}
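
With streaming enabled, finishReason is not a top-level field; it arrives on the final chunk. A sketch of capturing it inside the loop from the implementation above:

let finishReason: string | null = null;

for await (const chunk of stream) {
  // Only the last chunk carries a non-null finish reason
  finishReason = chunk.choices[0]?.finish_reason ?? finishReason;
  // ...accumulate and yield content as shown earlier
}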

Testing

// Helper: drain the generator and join the streamed text chunks
async function collectText(
  products: Product[],
  context: GiftContext
): Promise<string> {
  let text = '';
  for await (const chunk of generateResponse(products, context, [])) {
    if (chunk.type === 'text') text += chunk.content;
  }
  return text;
}

describe('Response Generation', () => {
  const products = mockProducts(3);
  const context = { language: 'et', recipient: 'ema' } as GiftContext;

  it('generates Estonian response', async () => {
    const response = await collectText(products, context);

    expect(response).toContain('soovitan');
    expect(response.length).toBeGreaterThan(100);
  });

  it('mentions all products', async () => {
    const response = await collectText(products, context);

    products.forEach(p => {
      expect(response.toLowerCase()).toContain(p.name.toLowerCase());
    });
  });

  it('respects token limit', async () => {
    const response = await collectText(products, context);
    const tokens = estimateTokens(response);

    expect(tokens).toBeLessThan(2500);
  });
});

Monitoring

{
  generationTimeMs: number,    // Should be ~2000ms
  tokenCount: number,          // Should be <2500
  firstChunkMs: number,        // Should be <150ms
  repetitionDetected: boolean, // Should be true for <1% of responses
  cutoffOccurred: boolean,     // Should be true for <5% of responses
  productsMentioned: number,   // Should equal products.length

  // Quality
  responseLength: number,
  languageMatched: boolean,    // Response in correct language
  formatValid: boolean         // Proper structure
}
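
A sketch of how most of these fields can be populated around a single generation call (recordMetrics is a hypothetical sink; languageMatched and formatValid need checks not shown here):

const start = Date.now();
let firstChunkMs: number | null = null;
let text = '';

for await (const chunk of generateResponse(products, context, conversationHistory)) {
  if (firstChunkMs === null) firstChunkMs = Date.now() - start;
  if (chunk.type === 'text') text += chunk.content;
}

recordMetrics({
  generationTimeMs: Date.now() - start,
  firstChunkMs: firstChunkMs ?? -1,
  tokenCount: estimateTokens(text),
  repetitionDetected: detectRepetition(text),
  productsMentioned: products.filter(p =>
    text.toLowerCase().includes(p.name.toLowerCase())
  ).length,
  responseLength: text.length
});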

Cost Optimization

Strategies:

  1. Smart Prompts: Concise system prompts save tokens
  2. History Pruning: Only include the last 3 messages (sketch below)
  3. Early Stopping: Stop at 200 tokens if sufficient
  4. Caching: System prompt caching (when available)
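
History pruning (strategy 2) is a one-line slice before building the message array; a sketch against the generateResponse implementation above:

// Keep only the last 3 messages, bounding input tokens
// regardless of how long the conversation has run
const prunedHistory = conversationHistory.slice(-3);

const messages = [
  { role: 'system', content: systemPrompt },
  ...prunedHistory,
  { role: 'user', content: userPrompt }
];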

Token Breakdown:

System Prompt: ~400 tokens
Conversation History: ~300 tokens
User Prompt: ~150 tokens
Product Context: ~200 tokens
─────────────────────────
Input: ~1050 tokens

Generated Response: ~200 tokens
─────────────────────────
Total: ~1250 tokens @ $0.002 per 1K tokens ≈ $0.0025 per request