Response Orchestration Guardrails
Quality enforcement in the response generation phase prevents hallucinations and keeps AI responses safe and consistent.
Purpose
Response guardrails ensure:
- Grounded generation based on validated products
- Prompt hygiene with single source of truth
- Streaming safety with proper event sequencing
- Quality controls during generation
Product Grounding
Location: app/api/chat/orchestrators/response-orchestrator.ts:210-252
Product Validation & Compaction
Before sending to GPT-5.1:
function prepareProducts(products: Product[]): CompactedProduct[] {
  return products.map(product => ({
    id: product.id,
    title: truncate(product.title, 150),            // Limit title length
    authors: product.authors || 'Unknown',
    category: product.category,
    price: product.price,
    description: truncate(product.description, 300) // Limit description
  }));
}
Why:
- Reduce hallucination pressure: Less text = less to fabricate
- Token efficiency: Saves input tokens
- Focus: Only essential fields included
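The `truncate` helper used above isn't shown in the excerpt; a minimal sketch, assuming it caps the string at the limit and appends an ellipsis when it cuts:

```typescript
// Hypothetical helper matching the truncate(text, max) calls above.
// Output never exceeds maxLength; one character is reserved for the ellipsis.
function truncate(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  return text.slice(0, maxLength - 1) + '…';
}
```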
Product Context Injection
const productContext = compactedProducts.map((p, i) => `
${i + 1}. **${p.title}**
Author: ${p.authors}
Category: ${p.category}
Price: €${p.price}
`).join('\n');
const userPrompt = `
Here are the products to recommend:
${productContext}
User query: "${userMessage}"
Context: ${JSON.stringify(giftContext)}
`;
Benefit: the model references exactly the data it was given, nothing more
Prompt Hygiene
Single Source of Truth
// GOOD: One combined system message
const messages = [
  {
    role: 'system',
    content: systemPrompt + constraints // Combined
  },
  ...conversationHistory,
  {
    role: 'user',
    content: userPrompt
  }
];

// BAD: Multiple system messages
const badMessages = [
  { role: 'system', content: basePrompt },
  { role: 'system', content: constraints }, // Conflicts with the first
  { role: 'system', content: formatting }   // Redundant
];
Why: Reduces conflicting instructions
Deterministic Preface
// Injected consistently before the model's text
const PREFACE = {
  et: 'Siin on minu soovitused:',
  en: 'Here are my recommendations:'
};

// Ensures a consistent opening
response = PREFACE[language] + '\n\n' + aiGenerated;
Benefit: Predictable response structure
Streaming Safety
SSE Event Sequencing
Key Guarantee: Product metadata sent BEFORE text generation starts
Tool-Output Events
// Separate data from narration
yield {
  event: 'tool-output',
  data: JSON.stringify({
    toolName: 'product_search',
    products: validatedProducts,
    operation: 'replace'
  })
};

// Later: text narration
yield {
  event: 'text',
  data: JSON.stringify({ content: aiText })
};
Benefit: Frontend can display products immediately, text streams independently
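On the wire, each yielded object has to be framed per the Server-Sent Events protocol: an `event:` field, a `data:` field, and a blank-line terminator. A minimal encoder sketch (the `StreamEvent` shape is assumed from the yields above, not taken from the codebase):

```typescript
interface StreamEvent {
  event: string;
  data: string; // already JSON-stringified by the caller
}

// Frame one event per the SSE wire format:
// "event: <name>\n" + "data: <payload>\n" + a blank line terminator.
function encodeSSE(e: StreamEvent): string {
  return `event: ${e.event}\ndata: ${e.data}\n\n`;
}
```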
Quality Controls During Generation
Repetition Detection
function detectRepetition(text: string): boolean {
  // Consecutive repeats: "the the the"
  const consecutivePattern = /(\b\w+\b)(\s+\1){2,}/gi;
  if (consecutivePattern.test(text)) return true;

  // Frequency check: same word 3+ times in the last 20 words
  // (in practice, common stop-words may need excluding here)
  const words = text.toLowerCase().split(/\s+/).slice(-20);
  const counts = new Map<string, number>();
  words.forEach(word => {
    counts.set(word, (counts.get(word) || 0) + 1);
  });
  return Array.from(counts.values()).some(count => count >= 3);
}

// Stop streaming if repetition is detected
if (detectRepetition(accumulatedText)) {
  controller.abort();
  return fallbackResponse();
}
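`fallbackResponse` isn't defined in this excerpt; a plausible sketch, reusing the localized fallback strings shown in the Error Handling section:

```typescript
// Hypothetical helper; the real implementation lives elsewhere in the orchestrator.
function fallbackResponse(language: 'et' | 'en' = 'en') {
  const fallback = language === 'et'
    ? 'Vabandust, ei suutnud vastust luua. Palun proovi uuesti.'
    : "Sorry, couldn't generate response. Please try again.";
  return { content: fallback, error: true, fallbackTriggered: true };
}
```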
Token Limit Monitoring
const MAX_TOKENS = 2500;
const WARNING_THRESHOLD = 0.9;

let tokenCount = 0;
for await (const chunk of stream) {
  tokenCount += estimateTokens(chunk);

  if (tokenCount >= MAX_TOKENS * WARNING_THRESHOLD) {
    console.warn('⚠️ Approaching token limit:', {
      used: tokenCount,
      limit: MAX_TOKENS,
      remaining: MAX_TOKENS - tokenCount
    });
  }

  if (tokenCount >= MAX_TOKENS) {
    break; // Stop streaming
  }
}
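`estimateTokens` is not defined in the excerpt; exact counts are tokenizer-specific, but a common rule of thumb is roughly four characters per token for English text. A sketch under that assumption:

```typescript
// Rough estimate only: actual token counts depend on the model's tokenizer.
// The ~4 characters/token heuristic is a guard-rail approximation, not an
// exact count, so use it for limit checks rather than billing math.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```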
Product Mention Validation
function validateProductMentions(
  response: string,
  products: Product[]
): ValidationResult {
  const mentioned = products.filter(p =>
    response.toLowerCase().includes(p.title.toLowerCase())
  );

  if (mentioned.length === 0) {
    console.warn('⚠️ Response mentions no products!', {
      responsePreview: response.slice(0, 100),
      productCount: products.length
    });
    return {
      valid: false,
      reason: 'no-product-mentions'
    };
  }

  if (mentioned.length < products.length) {
    console.warn('⚠️ Not all products mentioned:', {
      expected: products.length,
      mentioned: mentioned.length
    });
  }

  return { valid: true };
}
Delayed Card Injection
Configuration:
const CARD_INJECTION_THRESHOLD = 180; // characters

let charCount = 0;
let cardsInjected = false;

for await (const chunk of stream) {
  charCount += chunk.length;

  // Inject cards once the threshold is reached
  if (!cardsInjected && charCount >= CARD_INJECTION_THRESHOLD) {
    yield {
      event: 'tool-output',
      data: JSON.stringify({ products })
    };
    cardsInjected = true;
  }

  yield {
    event: 'text',
    data: JSON.stringify({ content: chunk })
  };
}
Why 180 characters:
- User reads intro first (~2 seconds)
- Products appear as context loads
- Smooth UX transition
- Avoids jarring flash
Error Handling
Generation Failures
try {
  const response = await generateResponse(products, context);
  return response;
} catch (error) {
  console.error('Generation failed:', error);

  // Fallback response
  const fallback = context.language === 'et'
    ? 'Vabandust, ei suutnud vastust luua. Palun proovi uuesti.'
    : 'Sorry, couldn\'t generate response. Please try again.';

  return {
    content: fallback,
    error: true,
    fallbackTriggered: true
  };
}
Cut-off Handling
if (finishReason === 'length') {
  // Token limit reached - append a graceful ellipsis
  return {
    content: response + '...',
    truncated: true,
    tokenCount: MAX_TOKENS
  };
}

if (finishReason === 'stop') {
  // Normal completion
  return {
    content: response,
    truncated: false
  };
}
Testing
describe('Response Guardrails', () => {
  it('grounds response in product data', async () => {
    const products = mockProducts(3);
    const response = await generateResponse(products, context);

    products.forEach(p => {
      expect(response.toLowerCase()).toContain(p.title.toLowerCase());
    });
  });

  it('detects repetition', () => {
    const repetitive = 'great great great gift';
    expect(detectRepetition(repetitive)).toBe(true);
  });

  it('respects token limit', async () => {
    const response = await generateResponse(manyProducts, context);
    const tokens = estimateTokens(response);
    expect(tokens).toBeLessThan(2500);
  });
});
Monitoring
{
  // Safety metrics
  repetitionRate: number,       // Should be <1%
  truncationRate: number,       // Should be <5%
  productMentionRate: number,   // Should be >98%

  // Quality metrics
  avgResponseLength: number,
  avgTokens: number,
  streamingErrors: number,

  // Performance
  generationTimeMs: number,     // Should be ~2000ms
  firstChunkMs: number          // Should be <150ms
}
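A minimal in-process collector for the safety counters above might look like this (class and field names are illustrative, not from the codebase):

```typescript
// Illustrative collector for per-response guardrail outcomes.
class GuardrailMetrics {
  private total = 0;
  private repetitions = 0;
  private truncations = 0;
  private mentions = 0;

  record(outcome: { repetitionDetected: boolean; truncated: boolean; mentionedProducts: boolean }): void {
    this.total++;
    if (outcome.repetitionDetected) this.repetitions++;
    if (outcome.truncated) this.truncations++;
    if (outcome.mentionedProducts) this.mentions++;
  }

  rates() {
    const n = Math.max(this.total, 1); // avoid division by zero
    return {
      repetitionRate: this.repetitions / n,    // target < 1%
      truncationRate: this.truncations / n,    // target < 5%
      productMentionRate: this.mentions / n    // target > 98%
    };
  }
}
```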
Related Documentation
- Search Guardrails - Previous phase
- Validation Guardrails - Next phase
- Response Generation - GPT-5.1 details