Resilience & Error Handling

The pipeline layers timeouts, error recovery, and UX guards so that the chat stays usable when requests fail, streams stall, or the backend returns bad data.

Timeout Strategies

Request Timeout

Value: 30 seconds
Location: config.ts and pipeline.ts:287-319

Purpose: Catch requests that never receive a response

Behavior:

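const requestTimeout = setTimeout(() => {
  // Cancel the in-flight request and surface a retryable error
  controller.abort();
  setError({
    type: 'timeout',
    message: 'Request took too long to respond',
    canRetry: true
  });
}, 30000);

// Call clearTimeout(requestTimeout) once a response arrives,
// so a healthy request is not aborted at the 30 s mark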

User Experience:

  • Error message displayed: "Request timeout. Please try again."
  • Retry button shown
  • Loading state cleared
  • Previous messages remain intact

Inactivity Timeout

Value: 45 seconds
Location: pipeline.ts:793-804

Purpose: Detect streams that stall mid-response

Behavior:

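let inactivityTimeout: ReturnType<typeof setTimeout> | undefined;

// Re-arm the timer so it fires only after 45 s with no events.
// A single setTimeout that is never re-armed would check once and miss later stalls.
function resetInactivityTimeout() {
  clearTimeout(inactivityTimeout);
  inactivityTimeout = setTimeout(() => {
    controller.abort();
    setError({
      type: 'stalled',
      message: 'Stream stalled. Connection lost.',
      canRetry: true
    });
  }, 45000);
}

// Reset on each event
resetInactivityTimeout();

// Call clearTimeout(inactivityTimeout) when the stream completes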

User Experience:

  • Partial response kept if any text received
  • Error appended: "Connection interrupted. Retry?"
  • Retry preserves conversation context

Error Types & Recovery

1. Network Errors

Causes:

  • Loss of internet connection
  • Server unreachable
  • DNS failures

Detection:

try {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify(payload)
  });
  // fetch resolves on HTTP error statuses (4xx/5xx); only transport failures reject
} catch (networkError) {
  // Handle network failure
}

Recovery:

  • Show network error message
  • Enable retry button
  • Preserve user's input
  • Don't persist failed message

2. Parse Errors

Causes:

  • Malformed JSON in SSE events
  • Invalid event format
  • Corrupted data

Detection:

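try {
  const data = JSON.parse(event.data);
} catch (parseError) {
  console.error('Event parse failed:', {
    event: event.type,
    raw: event.data,
    // catch variables are `unknown` under strict TypeScript, so narrow before reading .message
    error: parseError instanceof Error ? parseError.message : String(parseError)
  });
  // Continue processing other events
}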

Recovery:

  • Log error for debugging
  • Skip malformed event
  • Continue processing stream
  • Don't crash UI
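
As a rough sketch of the skip-and-continue behavior described above, the loop below parses a batch of events, drops the malformed ones, and counts the failures. The `RawEvent` shape and the `parseEvents` helper are assumptions made for illustration; the real pipeline processes events as they stream in.

```typescript
// Hypothetical event shape: `type` is the SSE event name, `data` the raw payload string.
interface RawEvent {
  type: string;
  data: string;
}

interface ParsedEvent {
  type: string;
  data: unknown;
}

interface ParseResult {
  events: ParsedEvent[];
  parseErrors: number;
}

function parseEvents(rawEvents: RawEvent[]): ParseResult {
  const events: ParsedEvent[] = [];
  let parseErrors = 0;

  for (const raw of rawEvents) {
    try {
      events.push({ type: raw.type, data: JSON.parse(raw.data) });
    } catch (error) {
      // Log and skip the malformed event; later events are still processed.
      parseErrors += 1;
      console.error('Event parse failed:', {
        event: raw.type,
        raw: raw.data,
        error: error instanceof Error ? error.message : String(error)
      });
    }
  }

  return { events, parseErrors };
}
```

Returning the error count alongside the parsed events makes it straightforward to feed a `parseErrors` metric without a second pass.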

3. Validation Errors

Causes:

  • Missing required fields
  • Invalid data types
  • Out-of-range values

Detection:

if (!isValidProduct(product)) {
  console.warn('Invalid product:', product);
  return; // Skip invalid item
}

Recovery:

  • Filter out invalid items
  • Show valid data
  • Log validation failures
  • Don't break entire response
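
One possible shape for `isValidProduct` is a TypeScript type guard that covers the three causes listed above. The `Product` fields here (`id`, `name`, `price`) are hypothetical, since the real product schema is not shown in this document; `filterValidProducts` is likewise an illustrative helper.

```typescript
// Hypothetical product shape; the real schema may differ.
interface Product {
  id: string;
  name: string;
  price: number;
}

function isValidProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const { id, name, price } = value as Record<string, unknown>;
  return (
    typeof id === 'string' && id.length > 0 &&   // required field present
    typeof name === 'string' &&                  // correct data type
    typeof price === 'number' &&
    Number.isFinite(price) && price >= 0         // value in range
  );
}

// Keep the valid items, log the rest, and never throw.
function filterValidProducts(items: unknown[]): Product[] {
  const valid: Product[] = [];
  for (const item of items) {
    if (isValidProduct(item)) {
      valid.push(item);
    } else {
      console.warn('Invalid product:', item);
    }
  }
  return valid;
}
```

Because the guard narrows the type, everything downstream of `filterValidProducts` can treat the items as fully typed `Product` values.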

4. Streaming Interruptions

Causes:

  • Backend crashes mid-response
  • Connection drops
  • Browser tab suspended

Detection:

  • Inactivity timeout triggers
  • Stream ends without done event
  • Unexpected EOF

Recovery:

  • Preserve partial response
  • Mark as incomplete
  • Show retry option
  • Explain what happened
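
The "stream ends without done event" check and the recovery steps can be expressed as a small pure function. This is a sketch under assumptions: `StreamState`, `FinalizedMessage`, and `finalizeStream` are hypothetical names, and the notice strings are borrowed from the messages quoted earlier on this page.

```typescript
// Hypothetical summary of what the client saw before the stream closed.
interface StreamState {
  text: string;          // text accumulated so far
  sawDoneEvent: boolean; // whether a done event arrived before the stream ended
}

interface FinalizedMessage {
  text: string;
  incomplete: boolean;
  canRetry: boolean;
  notice?: string;
}

function finalizeStream(state: StreamState): FinalizedMessage {
  if (state.sawDoneEvent) {
    return { text: state.text, incomplete: false, canRetry: false };
  }
  // No done event: keep whatever arrived, mark it incomplete, and offer a retry.
  return {
    text: state.text,
    incomplete: true,
    canRetry: true,
    notice:
      state.text.length > 0
        ? 'Connection interrupted. Retry?'
        : 'Stream stalled. Connection lost.'
  };
}
```

Calling this from both the inactivity handler and the end-of-stream handler would give all three detection paths the same recovery behavior.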

UX Guards

Skeleton Control

Location: pipeline.ts:778-788 and config.ts:3-16

Logic:

const shouldShowSkeleton =
  toolsExpected &&
  (accumulatedText.length >= 160 ||
    Date.now() - firstEventTime >= 800);

if (shouldShowSkeleton && !showSkeleton) {
  setShowSkeleton(true);
}

Purpose:

  • Don't show skeleton too early (looks glitchy)
  • Ensure skeleton appears if products are coming
  • Hide once real products arrive

Configuration:

const MESSAGE_STREAMING_CONFIG = {
  SKELETON_CHAR_THRESHOLD: 160,   // characters
  SKELETON_TIME_THRESHOLD_MS: 800 // milliseconds
};
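
To show how the logic and the configuration fit together, the skeleton decision can be written as a pure function that reads its thresholds from the config instead of hard-coding 160 and 800. The `SkeletonInput` shape and the injected `now` value are assumptions made so the function is easy to test; they are not taken from pipeline.ts.

```typescript
const MESSAGE_STREAMING_CONFIG = {
  SKELETON_CHAR_THRESHOLD: 160,   // characters
  SKELETON_TIME_THRESHOLD_MS: 800 // milliseconds
} as const;

// Hypothetical snapshot of the streaming state.
interface SkeletonInput {
  toolsExpected: boolean;
  accumulatedTextLength: number;
  firstEventTime: number; // epoch ms of the first event
  now: number;            // epoch ms, injected instead of calling Date.now()
}

function shouldShowSkeleton(input: SkeletonInput): boolean {
  const enoughText =
    input.accumulatedTextLength >= MESSAGE_STREAMING_CONFIG.SKELETON_CHAR_THRESHOLD;
  const enoughTime =
    input.now - input.firstEventTime >= MESSAGE_STREAMING_CONFIG.SKELETON_TIME_THRESHOLD_MS;
  // Only relevant when products are expected; either threshold is sufficient.
  return input.toolsExpected && (enoughText || enoughTime);
}
```

For example, with `toolsExpected: true`, 10 characters of text, and 800 ms elapsed, the function returns `true` because the time threshold alone is enough.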

Context Preservation

Location: pipeline.ts:488-540, 752-759, 988-1002

Strategy:

// Set from metadata
if (metadata.contextData) {
  setContextData(metadata.contextData);
}

// Preserve through streaming
onNewMessage({
  ...assistantMessage,
  contextData: contextData // Keep context
});

// Persist with final message
finalMessage.contextData = contextData;

Purpose:

  • Context pills always display correctly
  • "Show more" functionality works
  • Conversation history maintains context
  • Debugging has full information

Show-More Safety

Location: pipeline.ts:172-214, 574-640

Mechanism:

// Include in payload
payload.excludeProductIds = Array.from(shownProductIds);
payload.lastSearchParams = lastSearchParams;

// Update from metadata
if (metadata.searchParams) {
  setLastSearchParams(metadata.searchParams);
}

// Propagate to next request
const newExcludes = [
  ...excludeProductIds,
  ...newProductIds
];
setShownProductIds(new Set(newExcludes));

Purpose:

  • Never show duplicate products
  • Maintain search continuity
  • Preserve user's exploration path
  • Handle pool exhaustion gracefully
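
The mechanism above can be sketched without React state as two pure helpers. `buildShowMorePayload` and `mergeShownProducts` are hypothetical names, and treating "no new IDs in the response" as pool exhaustion is an assumption about how the backend signals that nothing is left.

```typescript
// Hypothetical payload shape based on the two fields named above.
interface ShowMorePayload {
  excludeProductIds: string[];
  lastSearchParams: Record<string, unknown> | null;
}

function buildShowMorePayload(
  shownProductIds: ReadonlySet<string>,
  lastSearchParams: Record<string, unknown> | null
): ShowMorePayload {
  return { excludeProductIds: Array.from(shownProductIds), lastSearchParams };
}

interface MergeResult {
  shownProductIds: Set<string>; // updated exclusion set for the next request
  freshIds: string[];           // IDs the user has not seen yet
  exhausted: boolean;           // true when the response contained nothing new
}

function mergeShownProducts(
  shown: ReadonlySet<string>,
  incomingIds: string[]
): MergeResult {
  const next = new Set(shown);
  const freshIds: string[] = [];
  for (const id of incomingIds) {
    if (!next.has(id)) {
      next.add(id);
      freshIds.push(id); // render only products not shown before
    }
  }
  return { shownProductIds: next, freshIds, exhausted: freshIds.length === 0 };
}
```

Filtering on the client as well as sending `excludeProductIds` means a duplicate is dropped even if the backend ignores the exclusion list.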

Persistence Strategy

Location:

  • User: pipeline.ts:226-266
  • Assistant: index.ts:302-425 and pipeline.ts:902-1018

User Message Strategy:

// Fire-and-forget
persistUserMessage(userMessage).catch(error => {
  console.error('User message save failed:', error);
  // Don't block UI, log for retry
});

Assistant Message Strategy:

// Silent persistence
try {
  await saveMessage(finalMessage, {
    silent: true, // Don't overwrite UI
    preserveContext: true
  });
} catch (error) {
  console.error('Assistant save failed:', error);
  // Schedule retry
  scheduleRetry(finalMessage);
}

Purpose:

  • User message saved ASAP
  • Assistant saved only when complete
  • Don't lose context/metrics
  • Retry silently on failure
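
This page does not show how `scheduleRetry` works, so the following is only one plausible approach: a bounded retry loop with exponential backoff. The `RETRY_POLICY` values, `retryDelayMs`, and `saveWithRetry` are all hypothetical and not taken from the pipeline.

```typescript
// Hypothetical policy; attempt counts and delays are illustrative only.
const RETRY_POLICY = {
  maxAttempts: 3,
  baseDelayMs: 500,
  maxDelayMs: 8000
} as const;

// Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... capped at maxDelayMs.
function retryDelayMs(attempt: number): number {
  return Math.min(
    RETRY_POLICY.baseDelayMs * Math.pow(2, attempt),
    RETRY_POLICY.maxDelayMs
  );
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tries `save` up to maxAttempts times without touching the UI.
// Resolves to true on success and false once all attempts have failed.
async function saveWithRetry<T>(
  message: T,
  save: (message: T) => Promise<void>,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<boolean> {
  for (let attempt = 0; attempt < RETRY_POLICY.maxAttempts; attempt++) {
    try {
      await save(message);
      return true;
    } catch (error) {
      console.error('Assistant save failed:', error);
      if (attempt < RETRY_POLICY.maxAttempts - 1) {
        await sleep(retryDelayMs(attempt)); // wait before the next attempt
      }
    }
  }
  return false;
}
```

Injecting `sleep` keeps the loop testable without real delays; a caller that gets `false` back can decide whether to queue the message for a later session.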

Error Messages

User-Facing Messages

const ERROR_MESSAGES = {
  NETWORK: 'Connection lost. Check your internet and try again.',
  TIMEOUT: 'Request took too long. The server might be busy.',
  STALLED: 'Stream interrupted. Your partial response is saved.',
  SERVER_ERROR: 'Something went wrong on our end. We\'re working on it.',
  PARSE_ERROR: 'Received invalid data. Please retry your request.',
  VALIDATION_ERROR: 'Some results couldn\'t be displayed. Try refining your search.'
};

Developer Messages

// Logged to console with full context
console.error('Pipeline error:', {
  type: 'timeout',
  stage: 'streaming',
  duration: elapsed,
  partialData: {
    textLength: text.length,
    productsReceived: products.length,
    eventsProcessed: eventCount
  },
  recovery: 'user-retry-available'
});

Monitoring Integration

Metrics Captured

{
  // Success metrics
  ttfcMs: number,
  totalDurationMs: number,
  eventsReceived: number,
  productsDelivered: number,

  // Error metrics
  timeouts: number,
  parseErrors: number,
  networkErrors: number,
  retries: number,

  // Recovery metrics
  partialSuccesses: number,
  retrySuccessRate: number
}

Alert Thresholds

const ALERT_THRESHOLDS = {
  TIMEOUT_RATE: 0.05,    // >5% timeouts → alert
  ERROR_RATE: 0.10,      // >10% errors → alert
  RETRY_RATE: 0.15,      // >15% retries → alert
  TTFC_P95: 2000,        // p95 TTFC > 2000 ms → alert
  PARSE_ERROR_RATE: 0.01 // >1% parse errors → alert
};
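
A sketch of how these thresholds might be evaluated over a reporting window follows. The denominators are assumptions, since this page does not define them: timeout, error, and retry rates are taken per request, the parse error rate per event, and "errors" counts timeouts plus network errors. `WindowStats` and `evaluateAlerts` are hypothetical names.

```typescript
const ALERT_THRESHOLDS = {
  TIMEOUT_RATE: 0.05,
  ERROR_RATE: 0.10,
  RETRY_RATE: 0.15,
  TTFC_P95: 2000, // milliseconds
  PARSE_ERROR_RATE: 0.01
} as const;

// Hypothetical aggregate over one reporting window.
interface WindowStats {
  totalRequests: number;
  totalEvents: number;
  timeouts: number;
  networkErrors: number;
  parseErrors: number;
  retries: number;
  ttfcP95Ms: number;
}

// Returns the names of every threshold that the window exceeds.
function evaluateAlerts(stats: WindowStats): string[] {
  const rate = (count: number, total: number): number =>
    total > 0 ? count / total : 0; // an empty window never alerts

  const alerts: string[] = [];
  if (rate(stats.timeouts, stats.totalRequests) > ALERT_THRESHOLDS.TIMEOUT_RATE) {
    alerts.push('TIMEOUT_RATE');
  }
  if (rate(stats.timeouts + stats.networkErrors, stats.totalRequests) > ALERT_THRESHOLDS.ERROR_RATE) {
    alerts.push('ERROR_RATE');
  }
  if (rate(stats.retries, stats.totalRequests) > ALERT_THRESHOLDS.RETRY_RATE) {
    alerts.push('RETRY_RATE');
  }
  if (stats.ttfcP95Ms > ALERT_THRESHOLDS.TTFC_P95) {
    alerts.push('TTFC_P95');
  }
  if (rate(stats.parseErrors, stats.totalEvents) > ALERT_THRESHOLDS.PARSE_ERROR_RATE) {
    alerts.push('PARSE_ERROR_RATE');
  }
  return alerts;
}
```

Each comparison is strict, matching the ">" in the threshold comments: a window sitting exactly at 5% timeouts does not alert.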

Testing Resilience

Timeout Testing

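// Simulate slow backend
test('handles request timeout gracefully', async () => {
  // The delay must be injected into the mocked backend (setup not shown);
  // use fake timers so the test does not wait 30 real seconds
  const mockSlowResponse = delay(35000); // >30s

  const result = await streamMessage(input);

  expect(result.error).toBe('timeout');
  expect(result.canRetry).toBe(true);
});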

Network Failure Testing

// Simulate connection loss
test('recovers from network interruption', async () => {
  // The mocked fetch must reject with this error (setup not shown)
  const mockNetworkError = new Error('Failed to fetch');

  const result = await streamMessage(input);

  expect(result.error).toBe('network');
  expect(result.partialData).toBeDefined();
});

Parse Error Testing

// Malformed JSON
test('continues despite parse errors', async () => {
  const mockEvents = [
    { type: 'text', data: '"valid "' }, // trailing space so the chunks join as 'valid also valid'
    { type: 'metadata', data: '{invalid json}' },
    { type: 'text', data: '"also valid"' }
  ];

  const result = await streamMessage(input, mockEvents);

  expect(result.text).toContain('valid also valid');
  expect(result.parseErrors).toBe(1);
});

Best Practices

1. Always Provide Retry

Every error should offer a retry option unless it's a validation error.

2. Preserve Partial Success

If the user received any useful output, keep it and let them continue from there.

3. Log Everything

Rich error context helps diagnose production issues.

4. Time Out Early

It is better to time out and retry than to hang indefinitely.

5. Fail Gracefully

The UI should never crash because of a backend issue.