Phase 5: Response Generation
Model Selection
Model: gpt-5.1-chat-latest
Provider: OpenAI
Temperature: 0.7
Max Tokens: 2500
Priority: High
Streaming: Enabled
Purpose
Generate natural, engaging responses that:
- Narrate search results: Explain why products are good fits
- Follow language rules: Native Estonian and English
- Format recommendations: Numbered lists, delayed product cards
- Maintain conversation tone: Friendly, helpful, professional
- Handle edge cases: Pool exhaustion, clarifications, errors
Why GPT-5.1?
Quality
- Language Generation: Superior fluency in both ET and EN
- Complex Instructions: Follows detailed formatting rules
- Reasoning: Understands gift-giving context nuances
- Creativity: Natural, non-robotic responses
Multilingual
- Native Estonian: Excellent grammar and idioms
- Native English: Professional tone
- Code-switching: Handles mixed language gracefully
- Cultural Context: Understands Estonian customs
Reasoning Power
- Gift Appropriateness: Explains why product fits
- Comparative Analysis: Highlights differences between products
- Tone Matching: Adapts to user's communication style
- Edge Case Handling: Generates helpful clarifications
Priority Tier
- High Priority: Minimizes provider-side queueing
- Low Latency: Faster TTFB, with the first chunk typically arriving in under 150ms
- Streaming: Immediate user feedback while the rest of the response generates
Configuration
Location: app/api/chat/orchestrators/response-orchestrator.ts
export const GENERATION_CONFIG = {
  model: 'gpt-5.1-chat-latest',
  temperature: 0.7,
  max_tokens: 2500,
  stream: true,
  priority: 'high',

  // Formatting rules
  delayedCardInjection: true,
  cardInjectionThreshold: 180, // chars emitted before injecting product cards

  // Safety
  repetitionDetection: true,
  tokenLimitWarning: 0.9 // warn at 90% of max_tokens
};
Implementation
Location: app/api/chat/services/ai-response.ts
// Chunks are either streamed text or a one-time card-injection event.
type ResponseChunk =
  | { type: 'text'; content: string }
  | { type: 'inject-cards'; products: Product[] };

async function* generateResponse(
  products: Product[],
  giftContext: GiftContext,
  conversationHistory: Message[]
): AsyncGenerator<ResponseChunk> {
  const systemPrompt = buildSystemPrompt(giftContext);
  const userPrompt = buildUserPrompt(products, giftContext);

  const stream = await openai.chat.completions.create({
    model: GENERATION_CONFIG.model,
    messages: [
      { role: 'system', content: systemPrompt },
      ...conversationHistory,
      { role: 'user', content: userPrompt }
    ],
    temperature: GENERATION_CONFIG.temperature,
    max_tokens: GENERATION_CONFIG.max_tokens,
    stream: true,
    service_tier: 'priority' // high-priority processing tier
  });

  let accumulated = '';
  let productCardsInjected = false;

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    accumulated += content;

    // Delayed card injection: once enough intro text has streamed,
    // tell the client to render the product cards
    if (!productCardsInjected &&
        accumulated.length >= GENERATION_CONFIG.cardInjectionThreshold) {
      yield { type: 'inject-cards', products };
      productCardsInjected = true;
    }

    yield { type: 'text', content };
  }
}
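For context, a minimal sketch of how a route can forward these chunks to the browser; the streamToClient and encodeEvent names and the SSE framing are assumptions for illustration, not the project's actual route code.

// Hypothetical consumer: forwards generator chunks as server-sent events.
function encodeEvent(chunk: ResponseChunk): Uint8Array {
  return new TextEncoder().encode(`data: ${JSON.stringify(chunk)}\n\n`);
}

export function streamToClient(
  products: Product[],
  context: GiftContext,
  history: Message[]
): Response {
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const chunk of generateResponse(products, context, history)) {
        controller.enqueue(encodeEvent(chunk));
      }
      controller.close();
    }
  });
  return new Response(body, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}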
Response Patterns
Product Recommendation
const PRODUCT_RECOMMENDATION_TEMPLATE = `
Based on your request for {occasion} gifts for {recipient},
I recommend these {count} products:
1. **{product1}** - {reason1}
2. **{product2}** - {reason2}
3. **{product3}** - {reason3}
{additional_context}
`;
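The templates use {placeholder} tokens. A minimal interpolation helper could look like the sketch below; the renderTemplate name and signature are assumptions for illustration.

// Hypothetical helper: replaces {key} tokens with values, leaving
// unknown placeholders intact so missing data stays visible in QA.
function renderTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    key in values ? values[key] : match
  );
}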
Example Output (Estonian):
Sinu sünnipäevakingituse otsingu põhjal soovitan järgmisi
raamatuid 10-aastasele tüdrukule:
1. **Harry Potter ja tarkade kivi** - Põnev fantaasiaraamat,
mis sobib suurepäraselt selle vanuse lastele.
2. **Matilda** - Roald Dahli klassika, mis õpetab lugemist
armastama.
3. **Anne Shirley** - Seikluslik lugu tugevast tüdrukust,
inspireeriv lugemine.
Kõik need mahuvad sinu 30-eurosesse eelarvesse!
Clarifying Question
const CLARIFICATION_TEMPLATE = `
{polite_acknowledgment}
{clarification_question}
{helpful_suggestions}
`;
Example Output:
Tere! Meeleldi aitan kingituse leidmisel.
Kellele kingitust otsid?
- Perele
- Sõbrale
- Kolleegile
- Lapsele
Või kirjelda täpsemalt, mida otsid!
Pool Exhaustion Acknowledgment
const EXHAUSTION_TEMPLATE = `
{transparent_explanation}
{what_was_shown}
{alternative_suggestion}
`;
Example Output:
Oleme näidanud kõik eesti luule raamatud meie valikust!
Näitasin sulle {count} erinevat teost.
Kas soovid näha:
- Muude kategooriate raamatuid?
- Ingliskeelseid luulekogusid?
- Teisi eesti kirjanduse žanreid?
Delayed Card Injection
Why 180 characters?
- User reads intro first
- Products appear as context loads
- Smooth UX transition
- Avoids flash of content
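On the client, handling the inject-cards event can be as simple as the sketch below. The event shapes mirror ResponseChunk above; the ui rendering callbacks are assumptions.

// Hypothetical client-side handler: append text as it streams and
// render product cards once, when the inject-cards event arrives.
function handleChunk(chunk: ResponseChunk, ui: {
  appendText(text: string): void;
  showCards(products: Product[]): void;
}): void {
  if (chunk.type === 'text') {
    ui.appendText(chunk.content);
  } else {
    ui.showCards(chunk.products); // fires once per response
  }
}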
Quality Controls
1. Repetition Detection
function detectRepetition(text: string): boolean {
  // Consecutive repetition: the same word 3+ times in a row
  const consecutivePattern = /(\b\w+\b)(\s+\1){2,}/gi;
  if (consecutivePattern.test(text)) return true;

  // Frequent repetition: any word appearing 3+ times within the last
  // 20 words, lowercased to catch case variants. Note that common
  // function words (e.g. 'ja', 'the') can trigger false positives.
  const words = text.toLowerCase().split(/\s+/).slice(-20);
  const wordCounts = new Map<string, number>();
  for (const word of words) {
    wordCounts.set(word, (wordCounts.get(word) || 0) + 1);
  }
  return Array.from(wordCounts.values()).some(count => count >= 3);
}
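The section above doesn't show where this check runs; one plausible wiring, assuming the streaming loop from generateResponse, is to test the accumulated text on each chunk and stop generating on detection.

// Assumed integration: inside the streaming loop of generateResponse.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  accumulated += content;

  if (GENERATION_CONFIG.repetitionDetection && detectRepetition(accumulated)) {
    console.warn('Repetition detected, stopping stream early');
    break; // early exit ends consumption; the OpenAI SDK aborts the request
  }

  yield { type: 'text', content };
}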
2. Token Limit Monitoring
const TOKEN_LIMIT = 2500;
const WARNING_THRESHOLD = 0.9;

if (tokenCount >= TOKEN_LIMIT * WARNING_THRESHOLD) {
  console.warn('Approaching token limit:', {
    used: tokenCount,
    limit: TOKEN_LIMIT,
    remaining: TOKEN_LIMIT - tokenCount
  });
}
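The tokenCount here (and estimateTokens in the tests below) can come from a simple heuristic; the sketch assumes the common rule of thumb of roughly 4 characters per token rather than a real tokenizer.

// Rough heuristic (assumption): ~4 chars per token for mixed ET/EN
// text. A real tokenizer (e.g. tiktoken) gives exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}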
3. Product Mention Validation
function validateProductMentions(
  response: string,
  products: Product[]
): boolean {
  // Check that the response mentions at least one recommended product
  const mentioned = products.filter(p =>
    response.toLowerCase().includes(p.name.toLowerCase())
  );

  if (mentioned.length === 0) {
    console.warn("Response doesn't mention any products");
    return false;
  }
  return true;
}
Performance Metrics
Typical Execution:
Timing:
├─ Build system prompt: ~5ms
├─ Build user prompt: ~3ms
├─ OpenAI API TTFB: ~80ms
├─ First chunk arrives: <150ms ✓
├─ Streaming (200 tokens): ~1500ms
├─ Card injection: ~1ms
├─ Stream complete: ~2000ms
└─ Validation: ~5ms
───────────────────────────
Total: ~2100ms
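Capturing these numbers in production can be as simple as timestamping around the streaming loop, as in the sketch below; where the values are reported afterwards is left out.

// Hypothetical instrumentation: first-chunk latency and total time.
const start = performance.now();
let firstChunkMs = 0;

for await (const chunk of generateResponse(products, context, history)) {
  if (firstChunkMs === 0) {
    firstChunkMs = performance.now() - start; // target: <150ms
  }
  // ...forward chunk to the client...
}

const generationTimeMs = performance.now() - start; // target: ~2000ms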
Error Handling
Fallback Response
const FALLBACK_RESPONSES = {
  et: "Vabandust, ei suutnud vastust genereerida. Palun proovi uuesti.",
  en: "Sorry, I couldn't generate a response. Please try again."
};

try {
  for await (const chunk of generateResponse(products, context, history)) {
    yield chunk;
  }
} catch (error) {
  const lang = context.language === 'en' ? 'en' : 'et';
  yield { type: 'text', content: FALLBACK_RESPONSES[lang] };
}
Cut-off Handling
if (finishReason === 'length') {
  // Token limit reached mid-generation: close with a graceful
  // ellipsis instead of ending on an abrupt cut
  return response + '...';
}
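Where finishReason comes from: in the OpenAI streaming API it arrives on the final chunk's choice, so it has to be captured inside the streaming loop.

// finish_reason is null on intermediate chunks and set ('stop',
// 'length', ...) on the final one.
let finishReason: string | null = null;
for await (const chunk of stream) {
  finishReason = chunk.choices[0]?.finish_reason ?? finishReason;
  // ...existing accumulation and yielding...
}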
Testing
// Test helper: drain the stream and join the text chunks
async function collectText(gen: AsyncGenerator<ResponseChunk>): Promise<string> {
  let out = '';
  for await (const chunk of gen) {
    if (chunk.type === 'text') out += chunk.content;
  }
  return out;
}

describe('Response Generation', () => {
  const history: Message[] = [];

  it('generates Estonian response', async () => {
    const products = mockProducts(3);
    const context = { language: 'et', recipient: 'ema' } as GiftContext;
    const response = await collectText(generateResponse(products, context, history));
    expect(response).toContain('soovitan');
    expect(response.length).toBeGreaterThan(100);
  });

  it('mentions all products', async () => {
    const products = mockProducts(3);
    const context = { language: 'et' } as GiftContext;
    const response = await collectText(generateResponse(products, context, history));
    products.forEach(p => {
      expect(response.toLowerCase()).toContain(p.name.toLowerCase());
    });
  });

  it('respects token limit', async () => {
    const products = mockProducts(3);
    const context = { language: 'et' } as GiftContext;
    const response = await collectText(generateResponse(products, context, history));
    expect(estimateTokens(response)).toBeLessThan(2500);
  });
});
Monitoring
{
  generationTimeMs: number,    // Should be ~2000ms
  tokenCount: number,          // Should be <2500
  firstChunkMs: number,        // Should be <150ms
  repetitionDetected: boolean, // True in <1% of requests
  cutoffOccurred: boolean,     // True in <5% of requests
  productsMentioned: number,   // Should equal products.length

  // Quality
  responseLength: number,
  languageMatched: boolean,    // Response in the requested language
  formatValid: boolean         // Proper structure (intro, list, cards)
}
Cost Optimization
Strategies:
- Smart Prompts: Concise system prompts save tokens
- History Pruning: Only include the last 3 messages (see the sketch after this list)
- Early Stopping: Stop at 200 tokens if sufficient
- Caching: System prompt caching (when available)
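A minimal sketch of the history-pruning step, assuming the Message shape used above; the pruneHistory name is an assumption.

// Keep only the most recent exchanges to cap input tokens (~300 here).
function pruneHistory(history: Message[], keep = 3): Message[] {
  return history.slice(-keep);
}

// Usage in generateResponse (hypothetical):
// messages: [systemMsg, ...pruneHistory(conversationHistory), userMsg]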
Token Breakdown:
System Prompt: ~400 tokens
Conversation History: ~300 tokens
User Prompt: ~150 tokens
Product Context: ~200 tokens
─────────────────────────
Input: ~1050 tokens
Generated Response: ~200 tokens
─────────────────────────
Total: ~1250 tokens @ $0.002 per 1K tokens ≈ $0.0025 per request
Related Documentation
- Phase 4: Diversity Selection - Previous phase
- Phase 0: Context Detection - First phase
- Pipeline Models - Complete overview