Response Orchestration Guardrails
Quality enforcement in the response generation phase prevents hallucinations and keeps AI responses safe and consistent.
Purpose
Response guardrails ensure:
- Grounded generation based on validated products
- Prompt hygiene with single source of truth
- Streaming safety with proper event sequencing
- Quality controls during generation
Product Grounding
Location: app/api/chat/orchestrators/response-orchestrator.ts:210-252
Product Validation & Compaction
Before sending to GPT-5.1:
function prepareProducts(products: Product[]): CompactedProduct[] {
  return products.map(product => ({
    id: product.id,
    title: truncate(product.title, 150),            // Limit title length
    authors: product.authors || 'Unknown',
    category: product.category,
    price: product.price,
    description: truncate(product.description, 300) // Limit description
  }));
}
Why:
- Reduce hallucination pressure: Less text = less to fabricate
- Token efficiency: Saves input tokens
- Focus: Only essential fields included
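The `truncate` helper used above isn't shown in the excerpt; a minimal sketch, assuming it caps the string at the limit and appends an ellipsis when it cuts:

```typescript
// Hypothetical helper matching the truncate(text, max) calls above.
// Output never exceeds maxLength; one character is reserved for the ellipsis.
function truncate(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength - 1) + '…';
}
```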
Product Context Injection
const productContext = compactedProducts.map((p, i) => `
${i + 1}. **${p.title}**
Author: ${p.authors}
Category: ${p.category}
Price: €${p.price}
`).join('\n');
const userPrompt = `
Here are the products to recommend:
${productContext}
User query: "${userMessage}"
Context: ${JSON.stringify(giftContext)}
`;
Benefit: the model references exactly the data it was given, nothing more
Prompt Hygiene
Single Source of Truth
// GOOD: One combined system message
const messages = [
  {
    role: 'system',
    content: systemPrompt + constraints // Combined
  },
  ...conversationHistory,
  {
    role: 'user',
    content: userPrompt
  }
];

// BAD: Multiple system messages
const badMessages = [
  { role: 'system', content: basePrompt },
  { role: 'system', content: constraints }, // Conflicts with the first
  { role: 'system', content: formatting }   // Redundant
];
Why: Reduces conflicting instructions
Deterministic Preface
// Injected consistently before the model's text
const PREFACE = {
  et: 'Siin on minu soovitused:',
  en: 'Here are my recommendations:'
};

// Ensures a consistent opening
response = PREFACE[language] + '\n\n' + aiGenerated;
Benefit: Predictable response structure
Streaming Safety
SSE Event Sequencing
Key Guarantee: Product metadata sent BEFORE text generation starts
Tool-Output Events
// Separate data from narration
yield {
  event: 'tool-output',
  data: JSON.stringify({
    toolName: 'product_search',
    products: validatedProducts,
    operation: 'replace'
  })
};

// Later: text narration
yield {
  event: 'text',
  data: JSON.stringify({ content: aiText })
};
Benefit: Frontend can display products immediately, text streams independently
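On the wire, each yielded object has to be framed per the Server-Sent Events protocol: an `event:` field, a `data:` field, and a blank-line terminator. A minimal encoder sketch (the `StreamEvent` shape is assumed from the yields above, not taken from the codebase):

```typescript
interface StreamEvent {
  event: string;
  data: string; // already JSON-stringified by the caller
}

// Frame one event per the SSE wire format:
// "event: <name>\n" + "data: <payload>\n" + a blank line terminator.
function encodeSSE(e: StreamEvent): string {
  return `event: ${e.event}\ndata: ${e.data}\n\n`;
}
```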
Quality Controls During Generation
Repetition Detection
function detectRepetition(text: string): boolean {
  // Consecutive repeats: "the the the"
  const consecutivePattern = /(\b\w+\b)(\s+\1){2,}/gi;
  if (consecutivePattern.test(text)) return true;

  // Frequency check: same word 3+ times in the last 20 words
  // (in practice, common stop-words may need excluding here)
  const words = text.toLowerCase().split(/\s+/).slice(-20);
  const counts = new Map<string, number>();
  words.forEach(word => {
    counts.set(word, (counts.get(word) || 0) + 1);
  });
  return Array.from(counts.values()).some(count => count >= 3);
}

// Stop streaming if repetition is detected
if (detectRepetition(accumulatedText)) {
  controller.abort();
  return fallbackResponse();
}
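`fallbackResponse` isn't defined in this excerpt; a plausible sketch, reusing the localized fallback strings shown in the Error Handling section:

```typescript
// Hypothetical helper; the real implementation lives elsewhere in the orchestrator.
function fallbackResponse(language: 'et' | 'en' = 'en') {
  const fallback = language === 'et'
    ? 'Vabandust, ei suutnud vastust luua. Palun proovi uuesti.'
    : "Sorry, couldn't generate response. Please try again.";
  return { content: fallback, error: true, fallbackTriggered: true };
}
```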
Token Limit Monitoring
const MAX_TOKENS = 2500;
const WARNING_THRESHOLD = 0.9;

let tokenCount = 0;
for await (const chunk of stream) {
  tokenCount += estimateTokens(chunk);

  if (tokenCount >= MAX_TOKENS * WARNING_THRESHOLD) {
    console.warn('⚠️ Approaching token limit:', {
      used: tokenCount,
      limit: MAX_TOKENS,
      remaining: MAX_TOKENS - tokenCount
    });
  }

  if (tokenCount >= MAX_TOKENS) {
    break; // Stop streaming
  }
}
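`estimateTokens` is not defined in the excerpt; exact counts are tokenizer-specific, but a common rule of thumb is roughly four characters per token for English text. A sketch under that assumption:

```typescript
// Rough estimate only: actual token counts depend on the model's tokenizer.
// The ~4 characters/token heuristic is a guard-rail approximation, not an
// exact count, so use it for limit checks rather than billing math.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```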
Product Mention Validation
function validateProductMentions(
  response: string,
  products: Product[]
): ValidationResult {
  const mentioned = products.filter(p =>
    response.toLowerCase().includes(p.title.toLowerCase())
  );

  if (mentioned.length === 0) {
    console.warn('⚠️ Response mentions no products!', {
      responsePreview: response.slice(0, 100),
      productCount: products.length
    });
    return {
      valid: false,
      reason: 'no-product-mentions'
    };
  }

  if (mentioned.length < products.length) {
    console.warn('⚠️ Not all products mentioned:', {
      expected: products.length,
      mentioned: mentioned.length
    });
  }

  return { valid: true };
}
Delayed Card Injection
Configuration:
const CARD_INJECTION_THRESHOLD = 180; // characters

let charCount = 0;
let cardsInjected = false;

for await (const chunk of stream) {
  charCount += chunk.length;

  // Inject cards once the threshold is reached
  if (!cardsInjected && charCount >= CARD_INJECTION_THRESHOLD) {
    yield {
      event: 'tool-output',
      data: JSON.stringify({ products })
    };
    cardsInjected = true;
  }

  yield {
    event: 'text',
    data: JSON.stringify({ content: chunk })
  };
}
Why 180 characters:
- User reads intro first (~2 seconds)
- Products appear as context loads
- Smooth UX transition
- Avoids jarring flash
Error Handling
Generation Failures
try {
  const response = await generateResponse(products, context);
  return response;
} catch (error) {
  console.error('Generation failed:', error);

  // Fallback response
  const fallback = context.language === 'et'
    ? 'Vabandust, ei suutnud vastust luua. Palun proovi uuesti.'
    : 'Sorry, couldn\'t generate response. Please try again.';

  return {
    content: fallback,
    error: true,
    fallbackTriggered: true
  };
}
Cut-off Handling
if (finishReason === 'length') {
  // Token limit reached - append a graceful ellipsis
  return {
    content: response + '...',
    truncated: true,
    tokenCount: MAX_TOKENS
  };
}

if (finishReason === 'stop') {
  // Normal completion
  return {
    content: response,
    truncated: false
  };
}
Testing
describe('Response Guardrails', () => {
  it('grounds response in product data', async () => {
    const products = mockProducts(3);
    const response = await generateResponse(products, context);

    products.forEach(p => {
      expect(response.toLowerCase()).toContain(p.title.toLowerCase());
    });
  });

  it('detects repetition', () => {
    const repetitive = 'great great great gift';
    expect(detectRepetition(repetitive)).toBe(true);
  });

  it('respects token limit', async () => {
    const response = await generateResponse(manyProducts, context);
    const tokens = estimateTokens(response);
    expect(tokens).toBeLessThan(2500);
  });
});
Monitoring
{
  // Safety metrics
  repetitionRate: number,       // Should be <1%
  truncationRate: number,       // Should be <5%
  productMentionRate: number,   // Should be >98%

  // Quality metrics
  avgResponseLength: number,
  avgTokens: number,
  streamingErrors: number,

  // Performance
  generationTimeMs: number,     // Should be ~2000ms
  firstChunkMs: number          // Should be <150ms
}
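A minimal in-process collector for the safety counters above might look like this (class and field names are illustrative, not from the codebase):

```typescript
// Illustrative collector for per-response guardrail outcomes.
class GuardrailMetrics {
  private total = 0;
  private repetitions = 0;
  private truncations = 0;
  private mentions = 0;

  record(outcome: { repetitionDetected: boolean; truncated: boolean; mentionedProducts: boolean }): void {
    this.total++;
    if (outcome.repetitionDetected) this.repetitions++;
    if (outcome.truncated) this.truncations++;
    if (outcome.mentionedProducts) this.mentions++;
  }

  rates() {
    const n = Math.max(this.total, 1); // avoid division by zero
    return {
      repetitionRate: this.repetitions / n,    // target < 1%
      truncationRate: this.truncations / n,    // target < 5%
      productMentionRate: this.mentions / n    // target > 98%
    };
  }
}
```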
Related Documentation
- Search Guardrails - Previous phase
- Validation Guardrails - Next phase
- Response Generation - GPT-5.1 details