Response Orchestration Guardrails

Quality enforcement in the response generation phase prevents hallucinations and ensures safe, high-quality AI responses.

Purpose

Response guardrails ensure:

  • Grounded generation based on validated products
  • Prompt hygiene with single source of truth
  • Streaming safety with proper event sequencing
  • Quality controls during generation

Product Grounding

Location: app/api/chat/orchestrators/response-orchestrator.ts:210-252

Product Validation & Compaction

Before sending to GPT-5.1:

function prepareProducts(products: Product[]): CompactedProduct[] {
  return products.map(product => ({
    id: product.id,
    title: truncate(product.title, 150), // Limit title length
    authors: product.authors || 'Unknown',
    category: product.category,
    price: product.price,
    description: truncate(product.description, 300) // Limit description
  }));
}

Why:

  • Reduce hallucination pressure: Less text = less to fabricate
  • Token efficiency: Saves input tokens
  • Focus: Only essential fields included
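The `truncate` helper used in `prepareProducts` is not shown in this excerpt; a minimal sketch of what it might look like (an assumption, not the project's actual implementation):

```typescript
// Hypothetical helper: cap a string at maxLength characters,
// appending an ellipsis when the input is cut.
function truncate(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  // Reserve one character for the ellipsis marker
  return text.slice(0, maxLength - 1) + '…';
}
```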

Product Context Injection

const productContext = compactedProducts.map((p, i) => `
${i + 1}. **${p.title}**
Author: ${p.authors}
Category: ${p.category}
Price: €${p.price}
`).join('\n');

const userPrompt = `
Here are the products to recommend:

${productContext}

User query: "${userMessage}"
Context: ${JSON.stringify(giftContext)}
`;

Benefit: GPT sees the exact data it is allowed to reference

Prompt Hygiene

Single Source of Truth

// ✅ GOOD: One combined system message
const messages = [
  {
    role: 'system',
    content: systemPrompt + constraints // Combined
  },
  ...conversationHistory,
  {
    role: 'user',
    content: userPrompt
  }
];

// ❌ BAD: Multiple system messages
const messages = [
  { role: 'system', content: basePrompt },
  { role: 'system', content: constraints }, // Confusing!
  { role: 'system', content: formatting }   // Redundant!
];

Why: Reduces conflicting instructions

Deterministic Preface

// Injected consistently
const PREFACE = {
  et: 'Siin on minu soovitused:',
  en: 'Here are my recommendations:'
};

// Ensures a consistent opening
response = PREFACE[language] + '\n\n' + aiGenerated;

Benefit: Predictable response structure

Streaming Safety

SSE Event Sequencing

Key Guarantee: Product metadata sent BEFORE text generation starts

Tool-Output Events

// Separate data from narration
yield {
  event: 'tool-output',
  data: JSON.stringify({
    toolName: 'product_search',
    products: validatedProducts,
    operation: 'replace'
  })
};

// Later: text narration
yield {
  event: 'text',
  data: JSON.stringify({ content: aiText })
};

Benefit: Frontend can display products immediately, text streams independently
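On the consuming side, the frontend can split the two event types and handle them independently. A minimal sketch of an SSE parser (illustrative only; the event names match the examples above, everything else is assumed):

```typescript
interface SSEEvent {
  event: string;
  data: string;
}

// Parse raw SSE text ("event: …\ndata: …\n\n" blocks) into typed events.
function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of raw.split('\n\n')) {
    const eventMatch = block.match(/^event: (.+)$/m);
    const dataMatch = block.match(/^data: (.+)$/m);
    if (eventMatch && dataMatch) {
      events.push({ event: eventMatch[1], data: dataMatch[1] });
    }
  }
  return events;
}
```

The caller then dispatches on `event`: a `tool-output` event renders product cards immediately, while `text` events append to the streaming answer.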

Quality Controls During Generation

Repetition Detection

function detectRepetition(text: string): boolean {
  // Consecutive: "the the the"
  const consecutivePattern = /(\b\w+\b)(\s+\1){2,}/g;
  if (consecutivePattern.test(text)) return true;

  // Frequent: same word 3+ times in a 20-word window
  const words = text.split(/\s+/).slice(-20);
  const counts = new Map<string, number>();

  words.forEach(word => {
    counts.set(word, (counts.get(word) || 0) + 1);
  });

  return Array.from(counts.values()).some(count => count >= 3);
}

// Stop streaming if detected
if (detectRepetition(accumulatedText)) {
  controller.abort();
  return fallbackResponse();
}

Token Limit Monitoring

const MAX_TOKENS = 2500;
const WARNING_THRESHOLD = 0.9;

let tokenCount = 0;

for await (const chunk of stream) {
  tokenCount += estimateTokens(chunk);

  if (tokenCount >= MAX_TOKENS * WARNING_THRESHOLD) {
    console.warn('⚠️ Approaching token limit:', {
      used: tokenCount,
      limit: MAX_TOKENS,
      remaining: MAX_TOKENS - tokenCount
    });
  }

  if (tokenCount >= MAX_TOKENS) {
    break; // Stop streaming
  }
}
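`estimateTokens` is not defined in this excerpt. A common rough heuristic (an assumption here, not necessarily the project's actual implementation) is about four characters per token for English-like text:

```typescript
// Rough approximation: ~4 characters per token for English-like text.
// Good enough for a safety cut-off; use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```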

Product Mention Validation

function validateProductMentions(
  response: string,
  products: Product[]
): ValidationResult {
  const mentioned = products.filter(p =>
    response.toLowerCase().includes(p.title.toLowerCase())
  );

  if (mentioned.length === 0) {
    console.warn('⚠️ Response mentions no products!', {
      responsePreview: response.slice(0, 100),
      productCount: products.length
    });

    return {
      valid: false,
      reason: 'no-product-mentions'
    };
  }

  if (mentioned.length < products.length) {
    console.warn('⚠️ Not all products mentioned:', {
      expected: products.length,
      mentioned: mentioned.length
    });
  }

  return { valid: true };
}

Delayed Card Injection

Configuration:

const CARD_INJECTION_THRESHOLD = 180; // characters

let charCount = 0;
let cardsInjected = false;

for await (const chunk of stream) {
  charCount += chunk.length;

  // Inject cards after threshold
  if (!cardsInjected && charCount >= CARD_INJECTION_THRESHOLD) {
    yield {
      event: 'tool-output',
      data: JSON.stringify({ products })
    };
    cardsInjected = true;
  }

  yield {
    event: 'text',
    data: JSON.stringify({ content: chunk })
  };
}

Why 180 characters:

  • User reads intro first (~2 seconds)
  • Products appear as context loads
  • Smooth UX transition
  • Avoids jarring flash

Error Handling

Generation Failures

try {
  const response = await generateResponse(products, context);
  return response;
} catch (error) {
  console.error('Generation failed:', error);

  // Fallback response
  const fallback = context.language === 'et'
    ? 'Vabandust, ei suutnud vastust luua. Palun proovi uuesti.'
    : 'Sorry, couldn\'t generate response. Please try again.';

  return {
    content: fallback,
    error: true,
    fallbackTriggered: true
  };
}

Cut-off Handling

if (finishReason === 'length') {
  // Token limit reached - graceful ellipsis
  return {
    content: response + '...',
    truncated: true,
    tokenCount: MAX_TOKENS
  };
}

if (finishReason === 'stop') {
  // Normal completion
  return {
    content: response,
    truncated: false
  };
}

Testing

describe('Response Guardrails', () => {
  it('grounds response in product data', async () => {
    const products = mockProducts(3);
    const response = await generateResponse(products, context);

    products.forEach(p => {
      expect(response.toLowerCase()).toContain(p.title.toLowerCase());
    });
  });

  it('detects repetition', () => {
    const repetitive = 'great great great gift';

    expect(detectRepetition(repetitive)).toBe(true);
  });

  it('respects token limit', async () => {
    const response = await generateResponse(manyProducts, context);
    const tokens = estimateTokens(response);

    expect(tokens).toBeLessThan(2500);
  });
});

Monitoring

{
  // Safety metrics
  repetitionDetected: number,  // Should be <1%
  truncationRate: number,      // Should be <5%
  productMentionRate: number,  // Should be >98%

  // Quality metrics
  avgResponseLength: number,
  avgTokens: number,
  streamingErrors: number,

  // Performance
  generationTimeMs: number,    // Should be ~2000ms
  firstChunkMs: number         // Should be <150ms
}