Implementation Plan: Translation Pipeline Robustness and Regression Testing

Post-mortem analysis of session d1d142a6 (February 12, 2026) identified four bugs, one of them production-breaking, along with several systemic robustness gaps in the translation pipeline. This document lays out the fixes, hardening measures, and regression tests needed to prevent a recurrence.

Executive Summary

Priority	Issue	Impact	Est. effort
P0	Context overflow RangeError	Production-breaking (172% utilization)	1 day
P1	Task completion tracking gap	Quality metric failure (0.83 score)	2 days
P2	Webscraping agent rate limit	Research pipeline failure (29s runtime)	3 days
P3	Incomplete Instagram scraping	Data quality degradation (2 of 3)	2 days

Total estimated effort: 10 developer-days (8 for the bug fixes + 2 for the regression suite)

A. Bug Fixes

1. Context Overflow Bug (P0)

Trace ID: eba9ce1a66679a232f44df4566c7d25f (line 99, traces-2026-02-12.jsonl)

What

The getUtilizationBar() function throws RangeError: Invalid count value: -14 when context utilization exceeds 100%. Session d2c7e927 (translation session) reached 172% utilization (343,957 tokens in a 200K window, i.e. 343,957 / 200,000 ≈ 1.72), which drove the empty-segment count negative.

Where

File: ~/.claude/hooks/dist/handlers/session-start.js
Function: getUtilizationBar()
Line: 163
Stack trace: String.repeat() called with negative count (-14)

Root Cause

The utilization bar assumes a fixed 20-character width. When utilization exceeds 100%, the “filled” portion exceeds 20 characters, leaving a negative count for the “empty” portion:

// Broken calculation example (172% utilization):
const filled = Math.floor(0.2 * 172);  // 34 characters
const empty = 20 - filled;              // -14 characters (INVALID)
const bar = '█'.repeat(filled) + '░'.repeat(empty);  // RangeError!

Fix

Clamp utilization to the 0-100% range before computing the segment counts:

// session-start.js line ~160
// Exported so the unit tests can import it
export function getUtilizationBar(utilizationPercent) {
  // Clamp to the 0-100% range; treat NaN (e.g. 0 / 0) as 0% instead of a zero-length bar
  const safePercent = Number.isNaN(utilizationPercent) ? 0 : utilizationPercent;
  const clampedPercent = Math.max(0, Math.min(100, safePercent));
  
  const barWidth = 20;
  const filled = Math.floor(barWidth * (clampedPercent / 100));
  const empty = barWidth - filled;  // always 0..20 after clamping
  
  const bar = '█'.repeat(filled) + '░'.repeat(empty);
  
  // Log overflow events for telemetry (uses the raw, unclamped value)
  if (utilizationPercent > 100) {
    console.error(`[OVERFLOW] Context utilization exceeded 100%: ${utilizationPercent}%`);
    // TODO: emit OpenTelemetry span event
  }
  
  return bar;
}

Additional safeguard: mark the overflow on the active span immediately after context estimation:

// session-start.js line ~40 (context estimation section)
import { SpanStatusCode } from '@opentelemetry/api';  // at the top of the module

// Assumed in scope: calculateContextTokens, transcript, contextWindowSize,
// and `span`, the handler's active OpenTelemetry span
const estimatedTokens = calculateContextTokens(transcript);
const utilizationPercent = (estimatedTokens / contextWindowSize) * 100;

if (utilizationPercent > 100) {
  span.setStatus({ code: SpanStatusCode.ERROR, message: 'Context overflow detected' });
  span.recordException(new Error(`Context overflow: ${estimatedTokens} / ${contextWindowSize} tokens`));
}

Regression Test

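// File: ~/.claude/hooks/src/handlers/__tests__/session-start.test.ts
// Assumes getUtilizationBar is exported from the session-start handler
import { getUtilizationBar } from '../session-start';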

describe('getUtilizationBar', () => {
  it('handles 0% utilization', () => {
    const bar = getUtilizationBar(0);
    expect(bar).toBe('░░░░░░░░░░░░░░░░░░░░');  // 20 empty chars
  });
  
  it('handles 50% utilization', () => {
    const bar = getUtilizationBar(50);
    expect(bar).toBe('██████████░░░░░░░░░░');  // 10 filled, 10 empty
  });
  
  it('handles 100% utilization', () => {
    const bar = getUtilizationBar(100);
    expect(bar).toBe('████████████████████');  // 20 filled
  });
  
  it('clamps > 100% utilization to 100%', () => {
    const bar = getUtilizationBar(172);  // Real overflow from session d2c7e927
    expect(bar).toBe('████████████████████');  // Still 20 filled (clamped)
    expect(bar).toHaveLength(20);
  });
  
  it('clamps negative utilization to 0%', () => {
    const bar = getUtilizationBar(-10);
    expect(bar).toBe('░░░░░░░░░░░░░░░░░░░░');
  });
  
  it('does not throw RangeError for any utilization value', () => {
    const testCases = [-100, -1, 0, 0.5, 50, 99.9, 100, 150, 172, 1000];
    testCases.forEach(utilization => {
      expect(() => getUtilizationBar(utilization)).not.toThrow();
    });
  });
  
  it('logs overflow events for > 100% utilization', () => {
    const consoleErrorSpy = jest.spyOn(console, 'error').mockImplementation();
    getUtilizationBar(172);
    expect(consoleErrorSpy).toHaveBeenCalledWith(
      expect.stringContaining('Context utilization exceeded 100%: 172%')
    );
    consoleErrorSpy.mockRestore();
  });
});

Verification steps:

Run unit tests: npm test -- session-start.test.ts
Manually trigger overflow scenario: create session with >200K tokens pre-loaded
Check telemetry for overflow span events: grep "Context overflow" ~/.claude/telemetry/traces-*.jsonl

2. Task Completion Tracking Gap (P1)

Evidence: Task completion score 0.83 (5 TaskUpdates per 3 TaskCreates)

What

The session created more subtasks than it closed, so the task completion ratio fell below the 0.85 warning threshold. Possible causes:

Tasks were created but never marked complete
Context compaction dropped task state
Tasks were implicitly closed without proper telemetry

From post-mortem:

TaskCreate: 10 calls
TaskUpdate: 16 calls
TaskCreate/TaskUpdate ratio suggests incomplete resolution
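Raw call counts cannot show which tasks were left open: several updates may target one task while another is never closed. A minimal sketch, assuming the telemetry can be replayed as an ordered list of events with a hypothetical `{ type, taskId, status }` shape (not the actual telemetry schema), derives per-task status instead. `replayTaskEvents` is illustrative and the event log below is invented, chosen only to mirror a 5-update/3-create mix:

```javascript
// Hypothetical replay of TaskCreate/TaskUpdate events into per-task status.
// Event shape (assumed): { type: 'TaskCreate' | 'TaskUpdate', taskId, status? }
function replayTaskEvents(events) {
  const tasks = new Map(); // taskId -> latest status
  for (const event of events) {
    if (event.type === 'TaskCreate') {
      tasks.set(event.taskId, event.status ?? 'pending');
    } else if (event.type === 'TaskUpdate' && tasks.has(event.taskId)) {
      // The latest update wins; updates for tasks never created are ignored
      tasks.set(event.taskId, event.status);
    }
  }
  const open = [...tasks].filter(([, status]) => status !== 'completed').map(([id]) => id);
  const completed = tasks.size - open.length;
  return {
    total: tasks.size,
    completed,
    // No tasks: report 1 (nothing left open) instead of 0 / 0 = NaN
    ratio: tasks.size === 0 ? 1 : completed / tasks.size,
    open
  };
}

// Illustrative log: 3 creates and 5 updates, yet task "c" is never completed
const summary = replayTaskEvents([
  { type: 'TaskCreate', taskId: 'a' },
  { type: 'TaskCreate', taskId: 'b' },
  { type: 'TaskCreate', taskId: 'c' },
  { type: 'TaskUpdate', taskId: 'a', status: 'in-progress' },
  { type: 'TaskUpdate', taskId: 'a', status: 'completed' },
  { type: 'TaskUpdate', taskId: 'b', status: 'in-progress' },
  { type: 'TaskUpdate', taskId: 'b', status: 'completed' },
  { type: 'TaskUpdate', taskId: 'c', status: 'in-progress' }
]);
console.log(summary.open, summary.ratio.toFixed(2)); // [ 'c' ] 0.67
```

The open-task list, not the create/update ratio, is what a fix needs to act on.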

Where

Locations to investigate:
Task state persistence: ~/.claude/hooks/dist/lib/context-tracker.js
Context compaction logic: Claude Code internal (vendor code)
Task auto-close logic: hooks/dist/handlers/post-tool.js

Root Cause (Hypothesis)

Context compaction at 9:03 PM reduced the message count from 42 to 6 and may have discarded task state along with the dropped messages. The telemetry shows:

Pre-compaction: 42 messages, 118,542 tokens
Post-compaction: 6 messages, 93,486 tokens
Only 261 output tokens post-compaction (vs. 1,752 pre-compaction), which suggests the session was effectively “dead” after compaction

Task state may be stored in message history, so compaction could orphan unclosed tasks.

Fix

Option 1: Task state serialization (recommended)

Persist task state to disk, independent of context window:

// File: ~/.claude/hooks/dist/lib/task-tracker.js (new file)

import { writeFileSync, readFileSync, existsSync, mkdirSync } from 'fs';
import { homedir } from 'os';
import { join } from 'path';

const TASK_STATE_DIR = join(homedir(), '.claude', 'task-state');

export function saveTaskState(sessionId, tasks) {
  // Create the state directory on first use; writeFileSync throws ENOENT if it is missing
  mkdirSync(TASK_STATE_DIR, { recursive: true });
  const filePath = join(TASK_STATE_DIR, `${sessionId}.json`);
  writeFileSync(filePath, JSON.stringify({
    sessionId,
    timestamp: Date.now(),
    tasks
  }, null, 2));
}

export function loadTaskState(sessionId) {
  const filePath = join(TASK_STATE_DIR, `${sessionId}.json`);
  if (!existsSync(filePath)) return null;
  return JSON.parse(readFileSync(filePath, 'utf-8'));
}

export function calculateTaskCompletion(tasks) {
  // No tasks means nothing was left open; avoids 0 / 0 = NaN
  if (tasks.length === 0) return 1;
  const completed = tasks.filter(t => t.status === 'completed').length;
  return completed / tasks.length;
}

Restore the persisted state from the context compaction event (if exposed) or from the session-start hook on resume:

// File: ~/.claude/hooks/dist/handlers/session-start.js

import { loadTaskState, calculateTaskCompletion } from '../lib/task-tracker.js';

// In session start handler (after line ~80); sessionId and isResume are assumed
// to be derived from the session-start hook input:
const taskState = loadTaskState(sessionId);
if (taskState && isResume) {
  const completionRatio = calculateTaskCompletion(taskState.tasks);
  console.log(`[TASK-STATE] Restored ${taskState.tasks.length} tasks, completion: ${completionRatio.toFixed(2)}`);
  
  if (completionRatio < 0.85) {
    console.warn(`[TASK-STATE] Low completion ratio: ${completionRatio.toFixed(2)}`);
  }
}

Option 2: Auto-close on Write tool

When a Write tool completes successfully, auto-close the associated task:

// File: ~/.claude/hooks/dist/handlers/post-tool.js

// In post-tool handler (after Write tool success):
if (toolName === 'Write' && success) {
  // Infer task from file path (inferTaskFromFilePath is not yet defined in this plan)
  const taskName = inferTaskFromFilePath(toolParams.file_path);
  if (taskName) {
    console.log(`[AUTO-CLOSE] Closing task "${taskName}" after Write success`);
    // TODO: emit TaskUpdate with status="completed"
  }
}

Regression Test

// File: ~/.claude/hooks/src/lib/__tests__/task-tracker.test.ts

import { saveTaskState, loadTaskState, calculateTaskCompletion } from '../task-tracker';

describe('Task state persistence', () => {
  it('saves task state to disk', () => {
    const sessionId = 'test-session-123';
    const tasks = [
      { id: '1', name: 'Task 1', status: 'in-progress' },
      { id: '2', name: 'Task 2', status: 'completed' }
    ];
    
    saveTaskState(sessionId, tasks);
    
    const loaded = loadTaskState(sessionId);
    expect(loaded.tasks).toEqual(tasks);
  });
  
  it('calculates task completion ratio', () => {
    const tasks = [
      { id: '1', status: 'completed' },
      { id: '2', status: 'completed' },
      { id: '3', status: 'in-progress' }
    ];
    
    const ratio = calculateTaskCompletion(tasks);
    expect(ratio).toBeCloseTo(0.67, 2);  // 2 of 3 completed
  });
  
  it('returns a sub-threshold ratio when tasks are left open', () => {
    // Illustrative scenario (not a reconstruction of session d1d142a6):
    // 3 of 5 tasks completed, below the 0.85 warning threshold
    const tasks = [
      { id: '1', status: 'completed' },
      { id: '2', status: 'completed' },
      { id: '3', status: 'completed' },
      { id: '4', status: 'in-progress' },
      { id: '5', status: 'in-progress' }
    ];
    
    const ratio = calculateTaskCompletion(tasks);
    expect(ratio).toBeCloseTo(0.60, 2);  // 3 completed / 5 total
    expect(ratio).toBeLessThan(0.85);
  });
});

Verification steps:

Run translation workflow with task state logging enabled
Trigger context compaction at 60% utilization
Verify task state persists post-compaction: ls ~/.claude/task-state/
Check task completion metric: grep "task_completion" ~/.claude/telemetry/evaluations-*.jsonl

3. Webscraping Agent Rate Limit Failure (P2)

Evidence: Agent terminated after 29 seconds, 4 tool uses (from post-mortem, line 128)

What

A background webscraping-research-analyst agent, launched to research ZoukMX growth strategy, hit an external API rate limit after 29 seconds and terminated without retry or fallback.

From post-mortem:

Rate limiting after 29 seconds indicates:

No rate limit handling or backoff logic
No fallback data sources
No graceful degradation

Where

Agent invocation: session d1d142a6, ~8:20-8:30 PM CT
Tool: Task (agent type: webscraping-research-analyst)
Failure mode: External API 429 response → immediate termination

Root Cause

Agent does not implement:

Rate limit detection (429 status code handling)
Exponential backoff retry logic
Fallback data sources
Error escalation to parent session

Fix

Step 1: Detect rate limits

// File: ~/.claude/hooks/dist/handlers/agent-error-handler.js (new file)

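export function isRateLimitError(error) {
  // Check for HTTP 429 or common rate limit messages (case-insensitive)
  const message = (error?.message ?? '').toLowerCase();
  return error?.statusCode === 429 ||
         error?.status === 429 ||
         message.includes('rate limit') ||
         message.includes('too many requests');
}

export function getRetryAfter(error) {
  // Retry-After may be delta-seconds ("5") or an HTTP-date; return milliseconds or null
  const value = error?.headers?.['retry-after'];
  if (value == null) return null;  // caller falls back to exponential backoff

  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);

  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - Date.now());
}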

Step 2: Implement exponential backoff

// File: ~/.claude/hooks/dist/lib/exponential-backoff.js (new file)

export async function retryWithBackoff(fn, options = {}) {
  const {
    maxRetries = 3,             // total attempts, including the first call
    initialDelay = 1000,        // 1 second
    maxDelay = 30000,           // 30 seconds
    factor = 2,
    getRetryDelay = () => null, // optional server-requested delay in ms (e.g. Retry-After)
    onRetry = null
  } = options;
  
  let lastError;
  
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      
      if (attempt === maxRetries - 1) {
        throw error;  // Final attempt failed
      }
      
      // Prefer the server's requested delay; otherwise back off exponentially (both capped)
      const backoff = initialDelay * Math.pow(factor, attempt);
      const delay = Math.min(getRetryDelay(error) ?? backoff, maxDelay);
      
      if (onRetry) {
        onRetry(attempt + 1, delay, error);
      }
      
      console.log(`[RETRY] Attempt ${attempt + 1}/${maxRetries} failed, retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  
  throw lastError;
}

Step 3: Integrate into agent tool calls

Note: callAgentTool stands for whatever code actually issues the agent's external requests. A Claude Code PreToolUse hook runs before a tool executes and cannot re-run the tool itself, so this wrapper only helps where those requests pass through code the team controls.

// File: ~/.claude/hooks/dist/handlers/pre-tool.js (agent section)

import { isRateLimitError, getRetryAfter } from './agent-error-handler.js';
import { retryWithBackoff } from '../lib/exponential-backoff.js';

// Wrap agent tool calls with retry logic:
async function executeAgentTool(toolName, params) {
  return await retryWithBackoff(
    () => callAgentTool(toolName, params),
    {
      maxRetries: 3,
      initialDelay: 1000,
      maxDelay: 30000,
      // Honor Retry-After when present; null falls back to exponential backoff
      getRetryDelay: getRetryAfter,
      onRetry: (attempt, delay, error) => {
        if (isRateLimitError(error)) {
          console.log(`[RATE-LIMIT] Attempt ${attempt} rate-limited, retrying after ${delay}ms`);
        }
      }
    }
  );
}

Step 4: Escalate failures to parent session

// In agent execution wrapper:
try {
  const result = await executeAgentTool(toolName, params);
  return result;
} catch (error) {
  if (isRateLimitError(error)) {
    // Surface to user
    console.error(`[AGENT-FAILURE] Rate limit exceeded after retries: ${error.message}`);
    // TODO: emit user notification via Claude Code API
  }
  throw error;
}
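With a deterministic schedule, concurrent agents that hit the same limit (as in the 10-concurrent-request verification step) all retry at the same instants. A common refinement is "full jitter": draw each delay uniformly between zero and the exponential cap. A minimal sketch, assuming it would replace the backoff computation inside `retryWithBackoff`; `computeJitteredDelay` and its injectable `random` parameter are hypothetical, the latter only to make the behavior testable:

```javascript
// Hypothetical "full jitter" delay: uniform in [0, cap), where
// cap = min(maxDelay, initialDelay * factor^attempt), matching retryWithBackoff's schedule.
// `random` defaults to Math.random and is injectable for deterministic tests.
function computeJitteredDelay(attempt, options = {}, random = Math.random) {
  const { initialDelay = 1000, maxDelay = 30000, factor = 2 } = options;
  const cap = Math.min(initialDelay * Math.pow(factor, attempt), maxDelay);
  return Math.floor(random() * cap);
}

// Usage: ten agents retrying their first failure spread out over [0, 1000) ms
const firstRetryDelays = Array.from({ length: 10 }, () => computeJitteredDelay(0));
console.log(firstRetryDelays);
```

When the server sends Retry-After, that value should still take precedence over a jittered delay.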

Regression Test

// File: ~/.claude/hooks/src/lib/__tests__/exponential-backoff.test.ts
// Import paths assume src/ mirrors the dist/ layout above
import { retryWithBackoff } from '../exponential-backoff';
import { getRetryAfter } from '../../handlers/agent-error-handler';

describe('Exponential backoff', () => {
  it('succeeds on first attempt', async () => {
    const fn = jest.fn().mockResolvedValue('success');
    const result = await retryWithBackoff(fn);
    
    expect(result).toBe('success');
    expect(fn).toHaveBeenCalledTimes(1);
  });
  
  it('retries on failure', async () => {
    const fn = jest.fn()
      .mockRejectedValueOnce(new Error('Fail 1'))
      .mockRejectedValueOnce(new Error('Fail 2'))
      .mockResolvedValue('success');
    
    // initialDelay: 1 keeps the test fast (default delays would add ~3 s)
    const result = await retryWithBackoff(fn, { maxRetries: 3, initialDelay: 1 });
    
    expect(result).toBe('success');
    expect(fn).toHaveBeenCalledTimes(3);
  });
  
  it('throws after max retries', async () => {
    const fn = jest.fn().mockRejectedValue(new Error('Always fails'));
    
    await expect(retryWithBackoff(fn, { maxRetries: 2, initialDelay: 1 }))
      .rejects.toThrow('Always fails');
    
    expect(fn).toHaveBeenCalledTimes(2);
  });
  
  it('respects Retry-After header for rate limits', () => {
    // Object.assign adds statusCode/headers, which the Error type does not declare
    const error = Object.assign(new Error('Rate limit'), {
      statusCode: 429,
      headers: { 'retry-after': '5' }  // 5 seconds
    });
    
    const retryAfter = getRetryAfter(error);
    expect(retryAfter).toBe(5000);  // Converted to ms
  });
  
  it('uses exponential backoff delays', async () => {
    const fn = jest.fn().mockRejectedValue(new Error('Fail'));
    const delays: number[] = [];
    
    await expect(retryWithBackoff(fn, {
      maxRetries: 3,
      initialDelay: 100,
      factor: 2,
      onRetry: (attempt, delay) => delays.push(delay)
    })).rejects.toThrow('Fail');
    
    // 3 attempts means 2 waits: no delay follows the final failed attempt
    expect(delays).toEqual([100, 200]);
  });
});

Verification steps:

Mock 429 response in agent test harness
Verify retry attempts: grep "RETRY" ~/.claude/telemetry/logs-*.jsonl
Confirm user notification on final failure
Test with real webscraping agent: launch 10 concurrent requests to trigger rate limit

4. Incomplete Instagram Scraping (P3)

Evidence: Only 2 visit_page calls for 3 Instagram accounts (from post-mortem, line 124)

What

Session mentioned three Instagram accounts (@edghar.e.nadyne, @dance.edghar, @nadyne.cruz) as voice reference material, but telemetry shows only 2 MCP visit_page tool invocations. The third account was either:

Skipped intentionally
Failed silently
Omitted from scraping plan

The Artist Profile translation had a higher hallucination score (0.05 vs. 0.02 for the Austin Market report), which may reflect incomplete voice reference data.

Where

Session: d1d142a6 (translation session, Feb 12)
Tool: MCP visit_page (instagram-mcp-server)
Expected calls: 3
Actual calls: 2

Root Cause (Hypothesis)

Silent failure: The third account scrape failed (rate limit, private account, network error) but no error was logged
Incomplete scraping plan: Agent only planned to scrape 2 accounts
Truncated results: Context window pressure caused early termination
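The three hypotheses leave different traces, so comparing the expected accounts against what was planned, attempted, and returned narrows them down. A minimal sketch; `diagnoseScrapeGap` and its four input lists are hypothetical and would have to be assembled from the session transcript and the MCP tool-call telemetry:

```javascript
// Hypothetical classifier: which root-cause hypothesis fits each expected account?
// All inputs are arrays of account handles.
function diagnoseScrapeGap({ expected, planned, attempted, succeeded }) {
  return expected.map(account => {
    let finding;
    if (succeeded.includes(account)) {
      finding = 'ok';
    } else if (attempted.includes(account)) {
      finding = 'silent-failure';   // visit_page was called but returned no usable data
    } else if (planned.includes(account)) {
      finding = 'truncated';        // planned but never called (early termination)
    } else {
      finding = 'not-planned';      // never part of the scraping plan
    }
    return { account, finding };
  });
}

// Usage: two calls made, third account planned but never visited
const findings = diagnoseScrapeGap({
  expected: ['@account1', '@account2', '@account3'],
  planned: ['@account1', '@account2', '@account3'],
  attempted: ['@account1', '@account2'],
  succeeded: ['@account1', '@account2']
});
console.log(findings.filter(f => f.finding !== 'ok'));
```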

Fix

Step 1: Add scraping validation

// File: ~/.claude/hooks/dist/handlers/post-tool.js (MCP section)

const EXPECTED_INSTAGRAM_ACCOUNTS = [
  '@edghar.e.nadyne',
  '@dance.edghar',
  '@nadyne.cruz'
];

// A Set ignores repeat visits, so re-scraping one account cannot mask a missing one.
// If each hook invocation runs in a fresh process, this in-memory Set resets between
// calls and must be persisted to disk instead (e.g. next to the task-state files).
const scrapedAccounts = new Set();

// In MCP post-tool handler (match whatever name the Instagram MCP server is registered under):
if (mcpServer === 'instagram' && mcpTool === 'visit_page') {
  const account = extractAccountFromParams(toolParams);
  scrapedAccounts.add(account);
  
  console.log(`[INSTAGRAM] Scraped ${scrapedAccounts.size}/${EXPECTED_INSTAGRAM_ACCOUNTS.length}: ${account}`);
  
  // Check if all accounts scraped
  if (EXPECTED_INSTAGRAM_ACCOUNTS.every(a => scrapedAccounts.has(a))) {
    console.log('[INSTAGRAM] All accounts scraped successfully');
  }
}

// At session end (stop hook):
const missing = EXPECTED_INSTAGRAM_ACCOUNTS.filter(a => !scrapedAccounts.has(a));
if (missing.length > 0) {
  console.warn(`[INSTAGRAM] Incomplete scraping: missing ${missing.join(', ')}`);
  // TODO: emit telemetry warning
}

Step 2: Surface scraping errors

// In MCP error handling:
if (mcpServer === 'instagram' && !success) {
  const account = extractAccountFromParams(toolParams);
  console.error(`[INSTAGRAM] Failed to scrape ${account}: ${error?.message ?? 'unknown error'}`);
  
  // Attempt retry for transient errors
  if (isTransientError(error)) {
    console.log(`[INSTAGRAM] Retrying ${account}...`);
    // TODO: retry logic
  }
}
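`isTransientError` is not defined in this plan. One hypothetical classification, assuming the MCP error may expose a Node.js network `code`, an HTTP-style `statusCode`/`status`, or only a message (all assumptions about the Instagram MCP server's error shape), treats timeouts, resets, 429s, and 5xx responses as retryable and everything else, such as a private or missing account, as permanent:

```javascript
// Hypothetical transient-vs-permanent split for scrape errors.
const TRANSIENT_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'EAI_AGAIN']);

function isTransientError(error) {
  if (!error) return false;
  if (TRANSIENT_CODES.has(error.code)) return true;

  const status = error.statusCode ?? error.status;
  // 429 (rate limited) and 5xx (server-side) may succeed later;
  // other statuses (e.g. 403/404 for a private or missing account) will not
  if (status === 429 || (status >= 500 && status < 600)) return true;
  if (status !== undefined) return false;

  // No structured fields: fall back to message matching
  return /timeout|timed out|temporarily unavailable/i.test(error.message ?? '');
}

console.log(isTransientError(new Error('Network timeout'))); // true
```

A structured status code, when present, wins over message text, so a 403 whose message happens to mention a timeout is still treated as permanent.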

Step 3: Pre-scraping validation

Before translation starts, validate all accounts are accessible:

// File: ~/.claude/hooks/dist/handlers/pre-translation.js (new hook)

export async function validateVoiceReferences(accounts) {
  const results = [];
  
  for (const account of accounts) {
    try {
      const profile = await checkAccountAccessible(account);
      results.push({ account, accessible: true, profile });
    } catch (error) {
      results.push({ account, accessible: false, error: error.message });
    }
  }
  
  const inaccessible = results.filter(r => !r.accessible);
  if (inaccessible.length > 0) {
    console.warn(`[VOICE-REF] Inaccessible accounts: ${inaccessible.map(r => r.account).join(', ')}`);
  }
  
  return results;
}

Regression Test

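// File: ~/.claude/hooks/src/handlers/__tests__/instagram-scraping.test.ts
// Assumes the tracking logic is refactored into testable helpers (handleInstagramScrape,
// getScrapingStatus) that take the expected account list as configuration and reset between tests.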

describe('Instagram scraping validation', () => {
  it('tracks all scraped accounts', () => {
    const accounts = ['@account1', '@account2', '@account3'];
    
    // Simulate 3 successful scrapes
    accounts.forEach(account => {
      handleInstagramScrape(account, { success: true });
    });
    
    const status = getScrapingStatus();
    expect(status.scraped).toBe(3);
    expect(status.expected).toBe(3);
    expect(status.complete).toBe(true);
  });
  
  it('warns on incomplete scraping', () => {
    const accounts = ['@account1', '@account2', '@account3'];
    
    // Simulate only 2 successful scrapes (matches session d1d142a6)
    handleInstagramScrape(accounts[0], { success: true });
    handleInstagramScrape(accounts[1], { success: true });
    
    const status = getScrapingStatus();
    expect(status.scraped).toBe(2);
    expect(status.complete).toBe(false);
    expect(status.missing).toEqual(['@account3']);
  });
  
  it('records a transient failure followed by a successful retry', async () => {
    const account = '@flaky-account';
    
    // Simulate transient error → success
    const scrape1 = await handleInstagramScrape(account, { error: 'Network timeout' });
    expect(scrape1.success).toBe(false);
    
    const scrape2 = await handleInstagramScrape(account, { success: true });
    expect(scrape2.success).toBe(true);
  });
});

Verification steps:

Run translation workflow with 3 Instagram accounts
Check scraping logs: grep "INSTAGRAM" ~/.claude/telemetry/logs-*.jsonl
Verify all accounts scraped: expect 3 visit_page calls
Simulate account failure (private account) and verify warning

B. Robustness Improvements

1. Overflow-Safe Utilization Calculations

Implementation: See Bug Fix #1 (Context Overflow)

Additional hardening:

Add assert() statements checking that both bar segment counts are non-negative
Emit OpenTelemetry events for overflow detection
Set up dashboards to track overflow frequency
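For the assert() hardening, a minimal sketch: after clamping the invariants can never fail, which is the point; they turn a future regression (for example, someone removing the clamp) into an explicit error instead of a bare RangeError from String.repeat(). `renderUtilizationBar` is a hypothetical variant of getUtilizationBar, and `invariant` is a dependency-free stand-in for Node's assert():

```javascript
// Dependency-free stand-in for assert()
function invariant(condition, message) {
  if (!condition) throw new Error(`Invariant violated: ${message}`);
}

// Hypothetical hardened variant of getUtilizationBar()
function renderUtilizationBar(utilizationPercent, barWidth = 20) {
  invariant(Number.isInteger(barWidth) && barWidth > 0, `barWidth must be a positive integer, got ${barWidth}`);
  const safePercent = Number.isNaN(utilizationPercent) ? 0 : utilizationPercent;
  const clamped = Math.max(0, Math.min(100, safePercent));
  const filled = Math.floor(barWidth * (clamped / 100));
  const empty = barWidth - filled;
  // The invariant the original bug violated (empty was -14 at 172%)
  invariant(filled >= 0 && empty >= 0, `segment counts must be non-negative (filled=${filled}, empty=${empty})`);
  return '█'.repeat(filled) + '░'.repeat(empty);
}

console.log(renderUtilizationBar(172)); // 20 filled characters, no RangeError
```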

2. Graceful Degradation for Rate-Limited APIs

Implementation: See Bug Fix #3 (Webscraping Agent Rate Limit)

Additional hardening:

Implement circuit breaker pattern (open circuit after N consecutive failures)
Add fallback data sources (cached data, alternative APIs)
Degrade gracefully: proceed with partial data rather than full failure

Example circuit breaker:

// File: ~/.claude/hooks/dist/lib/circuit-breaker.js (already exists, enhance)

export class CircuitBreaker {
  constructor(options = {}) {
    // ?? falls back only when the option is missing, unlike || which also overrides 0
    this.failureThreshold = options.failureThreshold ?? 5;
    this.resetTimeout = options.resetTimeout ?? 60000;  // 60 seconds
    this.state = 'CLOSED';  // CLOSED, OPEN, HALF_OPEN
    this.failures = 0;
    this.nextAttempt = null;
  }
  
  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN, rejecting request');
      }
      this.state = 'HALF_OPEN';
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
  
  onFailure() {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.resetTimeout;
      console.error(`[CIRCUIT-BREAKER] Opened circuit after ${this.failures} failures, retry after ${this.resetTimeout}ms`);
    }
  }
}
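The fallback and graceful-degradation bullets can be combined into one helper: try each source in priority order and, if every source fails, return an explicit degraded result instead of throwing. A minimal sketch; `fetchWithFallback`, the `{ name, fetch }` source shape, and the `degraded` flag are hypothetical. Each source's `fetch` could itself be wrapped in a `CircuitBreaker.execute()` call, so an open circuit simply falls through to the next source:

```javascript
// Try each data source in priority order (e.g. live API, alternative API, local cache).
// Returns the first success; if every source fails, returns a degraded result with
// data: null so the caller can proceed with partial data instead of failing outright.
async function fetchWithFallback(sources, request) {
  const errors = [];
  for (const source of sources) {
    try {
      const data = await source.fetch(request);
      // Served by a fallback source => still flagged as degraded
      return { data, source: source.name, degraded: errors.length > 0, errors };
    } catch (error) {
      errors.push({ source: source.name, message: error.message });
    }
  }
  return { data: null, source: null, degraded: true, errors };
}

// Usage: the live API is rate-limited, so the cached copy is served and flagged
const sources = [
  { name: 'live-api', fetch: async () => { throw new Error('429 Too Many Requests'); } },
  { name: 'cache', fetch: async () => ({ snapshot: 'cached profile data' }) }
];
fetchWithFallback(sources, { account: '@account1' }).then(result => {
  console.log(result.source, result.degraded); // cache true
});
```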

3. Task State Persistence Across Context Compaction

Implementation: See Bug Fix #2 (Task Completion Tracking)

Additional hardening:

Serialize task state on every TaskCreate/TaskUpdate
Hook into pre-compaction event (if exposed) to force persistence
Add task state recovery on session resume
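A minimal sketch of the first and third bullets: a write-through store that hands a snapshot to a persistence callback after every create and update, plus a restore path seeded from the last snapshot. `createTaskStore` is hypothetical; in the hooks, `persist` would presumably call `saveTaskState(sessionId, tasks)` from the task tracker in Bug Fix #2, and the string below only stands in for that file:

```javascript
// Hypothetical write-through task store: every create/update is followed by persist(snapshot).
function createTaskStore(persist, initialTasks = []) {
  // Seeding from a saved snapshot is the recovery path on session resume
  const tasks = new Map(initialTasks.map(task => [task.id, { ...task }]));
  // Copies, so callers cannot mutate stored state without going through update()
  const snapshot = () => [...tasks.values()].map(task => ({ ...task }));

  return {
    create(id, name) {
      tasks.set(id, { id, name, status: 'pending' });
      persist(snapshot());  // persist on every TaskCreate
    },
    update(id, status) {
      const task = tasks.get(id);
      if (!task) throw new Error(`Unknown task: ${id}`);
      task.status = status;
      persist(snapshot());  // persist on every TaskUpdate
    },
    list: snapshot
  };
}

// Usage: a string stands in for the task-state file on disk
let saved = null;
const store = createTaskStore(tasks => { saved = JSON.stringify(tasks); });
store.create('1', 'Translate artist profile');
store.update('1', 'completed');

// Simulated resume after compaction: rebuild the store from the last persisted snapshot
const resumed = createTaskStore(() => {}, JSON.parse(saved));
console.log(resumed.list());  // the completed task survives the round trip
```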

4. Agent Failure Escalation to Parent Session

Implementation: See Bug Fix #3 (Webscraping Agent Rate Limit), Step 4

Additional hardening:

Emit agent failure events to OpenTelemetry
Send user notifications via Claude Code notification API
Track agent failure rate by type in telemetry dashboard
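The `agent.error_type = "rate_limit"` alert in section D needs that attribute to be derived somewhere, and the dashboard needs a per-type failure rate. A minimal sketch, assuming failures arrive as error objects like those handled by `isRateLimitError()` and runs as `{ agentType, status }` records; `classifyAgentError`, `failureRateByType`, and every category name except `rate_limit` are hypothetical:

```javascript
// Hypothetical mapping from an agent failure to a coarse error_type attribute.
function classifyAgentError(error) {
  const status = error?.statusCode ?? error?.status;
  const message = (error?.message ?? '').toLowerCase();
  if (status === 429 || message.includes('rate limit') || message.includes('too many requests')) {
    return 'rate_limit';
  }
  if (error?.code === 'ETIMEDOUT' || message.includes('timeout')) return 'timeout';
  if (status >= 500 && status < 600) return 'upstream_error';
  return 'other';
}

// Failure rate per agent type (failed runs / total runs) for a dashboard panel.
function failureRateByType(runs) {
  const stats = {};
  for (const run of runs) {
    if (!stats[run.agentType]) stats[run.agentType] = { total: 0, failed: 0 };
    stats[run.agentType].total += 1;
    if (run.status === 'failed') stats[run.agentType].failed += 1;
  }
  return Object.fromEntries(
    Object.entries(stats).map(([type, s]) => [type, s.failed / s.total])
  );
}

console.log(classifyAgentError({ statusCode: 429, message: 'Too Many Requests' })); // rate_limit
```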

C. Regression Test Suite

Unit Tests (18 tests, ~2 hours)

Test File	Tests	Purpose
`session-start.test.ts`	7	Context overflow, utilization bar edge cases
`task-tracker.test.ts`	3	Task state persistence, completion ratio
`exponential-backoff.test.ts`	5	Retry logic, rate limit handling
`instagram-scraping.test.ts`	3	Scraping validation, incomplete scraping

Run command:

cd ~/.claude/hooks
npm test

Integration Tests (4 tests, ~4 hours)

Test	Scenario	Validation
Context overflow E2E	Pre-load 250K tokens, trigger session-start	No RangeError, overflow logged
Task persistence through compaction	Create 5 tasks, trigger compaction, resume	All 5 tasks restored
Agent rate limit with retry	Mock 429 response, launch agent	3 attempts (2 retries) with exponential backoff
Instagram scraping validation	Scrape 3 accounts, fail 1	Warning logged, 2 of 3 succeed

Test harness:

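// File: ~/.claude/hooks/src/__tests__/integration/translation-pipeline.test.ts
// createTestSession, triggerHook, and mockApiResponse are harness helpers not yet defined in this plan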

describe('Translation pipeline regression', () => {
  it('handles context overflow gracefully', async () => {
    const session = await createTestSession();
    
    // Pre-load 250K tokens (125% of 200K window)
    await session.loadTranscript({ tokenCount: 250000 });
    
    // Trigger session-start hook
    const result = await triggerHook('session-start', { sessionId: session.id });
    
    expect(result.status).toBe('success');
    expect(result.logs).toContain('Context utilization exceeded 100%');
    expect(result.logs).not.toContain('RangeError');
  });
  
  it('persists task state through compaction', async () => {
    const session = await createTestSession();
    
    // Create 5 tasks
    for (let i = 1; i <= 5; i++) {
      await session.createTask(`Task ${i}`);
    }
    
    // Trigger context compaction
    await session.compactContext();
    
    // Resume session
    const resumed = await createTestSession({ resumeFrom: session.id });
    
    const tasks = await resumed.getTasks();
    expect(tasks).toHaveLength(5);
  });
  
  it('retries agent on rate limit', async () => {
    const session = await createTestSession();
    
    // Mock 429 response for 2 attempts, then success
    mockApiResponse('/api/scrape', [
      { status: 429, headers: { 'retry-after': '1' } },
      { status: 429, headers: { 'retry-after': '2' } },
      { status: 200, body: { data: 'success' } }
    ]);
    
    const result = await session.launchAgent('webscraping-research-analyst', {
      target: 'https://example.com'
    });
    
    expect(result.success).toBe(true);
    expect(result.retries).toBe(2);
  });
  
  it('validates Instagram scraping completeness', async () => {
    const session = await createTestSession();
    
    const accounts = ['@account1', '@account2', '@account3'];
    
    // Scrape only 2 accounts (simulate session d1d142a6)
    await session.scrapeInstagram(accounts[0]);
    await session.scrapeInstagram(accounts[1]);
    
    // End session
    await session.stop();
    
    const warnings = session.getWarnings();
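    // The stop hook prefixes the message with "[INSTAGRAM] "; assumes getWarnings() returns message strings
    expect(warnings).toEqual(
      expect.arrayContaining([expect.stringContaining('Incomplete scraping: missing @account3')])
    );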
  });
});

E2E Tests (1 test, ~1 day)

Full translation pipeline test:

// File: ~/.claude/hooks/src/__tests__/e2e/translation-workflow.test.ts

describe('Full translation workflow', () => {
  it('translates 3 reports with telemetry validation', async () => {
    // Launch translation session
    const session = await launchClaudeCode({
      cwd: '/Users/alyshialedlie/reports',
      model: 'claude-opus-4-6'
    });
    
    // Issue translation request
    await session.sendMessage(`
      Translate these 3 HTML reports to Brazilian Portuguese:
      - artist-profile.html
      - zouk-market-analysis.html
      - austin-market-analysis.html
      
      Use voice references from @edghar.e.nadyne, @dance.edghar, @nadyne.cruz
    `);
    
    // Wait for completion (max 30 min)
    await session.waitForIdle({ timeout: 1800000 });
    
    // Validate outputs
    const outputs = await session.getWrittenFiles();
    expect(outputs).toHaveLength(3);
    expect(outputs.map(f => f.name)).toContain('artist-profile-pt-br.html');
    
    // Validate telemetry
    const telemetry = await session.getTelemetry();
    
    // Quality metrics
    expect(telemetry.evaluations.relevance).toBeGreaterThan(0.90);
    expect(telemetry.evaluations.faithfulness).toBeGreaterThan(0.90);
    expect(telemetry.evaluations.coherence).toBeGreaterThan(0.90);
    expect(telemetry.evaluations.hallucination).toBeLessThan(0.10);
    expect(telemetry.evaluations.task_completion).toBeGreaterThanOrEqual(0.90);  // success criterion in section F
    
    // Instagram scraping
    expect(telemetry.tool_calls.filter(t => t.tool === 'visit_page')).toHaveLength(3);
    
    // Context utilization
    expect(telemetry.context.peak_utilization).toBeLessThan(100);
    
    // No errors
    expect(telemetry.errors).toHaveLength(0);
  });
});

D. Telemetry Alerts

Add these alert rules to observability-toolkit:

1. Context Utilization Alerts

// File: observability-toolkit/dashboard/alerts/context-alerts.ts

export const contextAlerts = [
  {
    name: 'context-utilization-warning',
    condition: 'context.utilization_percent > 95',
    severity: 'warning',
    message: 'Context utilization exceeded 95%, approaching compaction threshold',
    channels: ['console', 'telemetry']
  },
  {
    name: 'context-overflow',
    condition: 'context.utilization_percent > 100',
    severity: 'critical',
    message: 'Context overflow detected! Utilization > 100%',
    channels: ['console', 'telemetry', 'user-notification']
  }
];

Query to detect overflows:

obs_query_traces --attributes "context.utilization_percent > 100" --severity ERROR

2. Agent Failure Alerts

// File: observability-toolkit/dashboard/alerts/agent-alerts.ts

export const agentAlerts = [
  {
    name: 'agent-early-failure',
    condition: 'agent.duration < 60000 AND agent.status = "failed"',
    severity: 'warning',
    message: 'Agent failed within first 60 seconds, likely rate limit or config issue',
    channels: ['console', 'telemetry']
  },
  {
    name: 'agent-rate-limit',
    condition: 'agent.error_type = "rate_limit"',
    severity: 'warning',
    message: 'Agent hit rate limit, verify backoff logic triggered',
    channels: ['telemetry']
  }
];

Query to detect early failures:

obs_query_traces --attributes "agent.duration < 60000" --status ERROR

3. Task Completion Alerts

// File: observability-toolkit/dashboard/alerts/task-alerts.ts

export const taskAlerts = [
  {
    name: 'task-completion-low',
    condition: 'evaluations.task_completion < 0.85',
    severity: 'warning',
    message: 'Task completion ratio below 0.85, investigate incomplete work',
    channels: ['console', 'telemetry']
  }
];

Query to detect low completion:

obs_query_evaluations --evaluationName "task_completion" --scoreMax 0.85

4. Instagram Scraping Alerts

// File: observability-toolkit/dashboard/alerts/scraping-alerts.ts

export const scrapingAlerts = [
  {
    name: 'instagram-scrape-incomplete',
    condition: 'instagram.accounts_scraped < instagram.accounts_expected',
    severity: 'warning',
    message: 'Instagram scraping incomplete, check for failed accounts',
    channels: ['console', 'telemetry']
  }
];

Query to detect incomplete scraping:

obs_query_logs --severity WARN --message "Incomplete scraping"
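The `condition` strings above are presumably interpreted by observability-toolkit, whose evaluator is not shown in this plan. To make the intended semantics concrete, a minimal sketch encodes three of the rules as predicate functions over a flattened telemetry record; `evaluateAlerts`, the `when` field, and the record shape are hypothetical. One consequence it surfaces: a reading above 100% fires both the 95% warning and the critical overflow rule, so a notifier may want to report only the highest severity.

```javascript
// Hypothetical predicate versions of three alert rules from this section.
const RULES = [
  { name: 'context-utilization-warning', severity: 'warning',
    when: r => r['context.utilization_percent'] > 95 },
  { name: 'context-overflow', severity: 'critical',
    when: r => r['context.utilization_percent'] > 100 },
  { name: 'task-completion-low', severity: 'warning',
    when: r => r['evaluations.task_completion'] < 0.85 }
];

const SEVERITY_RANK = { warning: 1, critical: 2 };

// Missing fields compare as undefined, so their rules simply do not fire.
function evaluateAlerts(record, rules = RULES) {
  const fired = rules.filter(rule => rule.when(record));
  const highest = fired.reduce(
    (top, rule) => ((top === null || SEVERITY_RANK[rule.severity] > SEVERITY_RANK[top]) ? rule.severity : top),
    null
  );
  return { fired: fired.map(rule => rule.name), highest };
}

// Usage: the 172% utilization and 0.83 task-completion values reported in the post-mortem
console.log(evaluateAlerts({ 'context.utilization_percent': 172, 'evaluations.task_completion': 0.83 }));
```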

E. Implementation Timeline

Week	Tasks	Deliverables
Week 1	Bug fixes #1-2 (P0-P1)	Context overflow fix, task persistence
Week 2	Bug fixes #3-4 (P2-P3)	Rate limit retry, Instagram validation
Week 3	Unit tests, integration tests	22 tests passing (18 unit + 4 integration)
Week 4	E2E test, telemetry alerts	Full pipeline validated

Calendar time: 4 weeks (1 developer) for an estimated 10 developer-days of work

F. Success Criteria

Functional Requirements

Context overflow bug fixed: no RangeError for utilization > 100%
Task completion ratio ≥ 0.90 for translation workflows
Agent rate limit retry: 3 attempts with exponential backoff
Instagram scraping: 100% completeness or warning logged

Test Coverage

7 unit tests for context overflow (100% coverage of getUtilizationBar())
3 unit tests for task persistence (100% coverage of task tracker)
5 unit tests for exponential backoff (100% coverage of retry logic)
3 unit tests for Instagram scraping validation
4 integration tests for end-to-end scenarios
1 E2E test for full translation pipeline

Telemetry

Context overflow events logged to OpenTelemetry
Agent retry attempts tracked with span events
Task state persistence events logged
Instagram scraping completeness tracked

Alerts

Context utilization > 95% → warning
Context utilization > 100% → critical
Agent failure < 60s → warning
Task completion < 0.85 → warning
Instagram scraping incomplete → warning

G. Future Work (Out of Scope)

Voice-matching evaluation dimension (post-mortem recommendation #2)
- Add LLM-as-Judge metric for voice fidelity
- Requires prompt engineering and baseline validation
Dedicated translation agents (post-mortem recommendation #4)
- Launch background agents per document
- Requires agent orchestration framework
Session idle detection (post-mortem recommendation #6)
- Auto-hibernation after 10 minutes idle
- Requires session lifecycle API
Hallucination guardrails (post-mortem recommendation #5)
- Post-translation validation: extract statements, verify against source
- Requires integration with QAG evaluator

H. References

Source Documents

Post-mortem: /Users/alyshialedlie/code/PersonalSite/_reports/2026-02-13-edgar-nadyne-translation-session-telemetry.md
Telemetry data: ~/.claude/telemetry/traces-2026-02-12.jsonl (line 99: context overflow trace)
Session ID: d1d142a6-51f3-49d3-b283-c00093880453 (translation session, Feb 12, 2026)

Key Traces

Trace ID	Issue	Line
`eba9ce1a66679a232f44df4566c7d25f`	Context overflow (172% utilization)	99
(session d1d142a6)	Task completion 0.83	(evaluations file)
(session d1d142a6)	Webscraping agent failure (29s)	(logs file)
(session d1d142a6)	Instagram scraping (2 of 3)	(traces file)

Testing Commands

# Unit tests
cd ~/.claude/hooks
npm test

# Integration tests
npm run test:integration

# E2E tests
npm run test:e2e

# Telemetry queries
obs_query_traces --attributes "context.utilization_percent > 100"
obs_query_evaluations --evaluationName "task_completion" --scoreMax 0.85
obs_query_logs --severity WARN --message "Incomplete scraping"

Document Status: Draft
Author: Quality evaluation agent (Sonnet 4.5)
Last Updated: 2026-02-14