Claude Code Observability Framework Guide

A comprehensive guide to the production-grade observability system for Claude Code hooks using OpenTelemetry, Langtrace, and SigNoz.

Overview

This framework provides full visibility into Claude Code hook execution through:

Distributed tracing - Track operations across hook invocations
Metrics collection - Monitor performance and usage patterns
Structured logging - Debug issues with correlated logs
LLM instrumentation - Track token usage and costs

All telemetry is exported to both local files (for offline analysis) and SigNoz Cloud (for dashboards and alerting).

Architecture

Claude Code Hooks
       │
       ▼
 HookMonitor (otel-monitor.ts)
 • Initializes OpenTelemetry SDK
 • Creates root spans for each hook
 • Records metrics and logs
       │
       ▼
┌─────────────────────────────────────┐
│       Dual Export Pattern           │
├──────────────────┬──────────────────┤
│  Local Files     │  SigNoz Cloud    │
│  JSONL format    │  OTLP/gzip       │
│  ~/.claude/      │  Real-time       │
│  telemetry/      │  dashboards      │
└──────────────────┴──────────────────┘
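
The fan-out in the diagram can be illustrated with a small, dependency-free sketch. Every name here (`SpanRecord`, `Sink`, `JsonlSink`, `FailingRemoteSink`, `exportToAll`) is hypothetical and not part of otel-monitor.ts; the point is only that each destination is written independently, so a remote failure does not stop the local write.

```typescript
// Hypothetical record and sink types, for illustration only.
interface SpanRecord {
  traceId: string;
  name: string;
  durationMs: number;
}

interface Sink {
  name: string;
  write(batch: SpanRecord[]): void; // throws on failure
}

// Collects JSONL lines in memory; a real local sink would append them to a file.
class JsonlSink implements Sink {
  name = "local-jsonl";
  lines: string[] = [];
  write(batch: SpanRecord[]): void {
    for (const span of batch) this.lines.push(JSON.stringify(span));
  }
}

// Stand-in for the remote exporter; always throws to simulate an outage.
class FailingRemoteSink implements Sink {
  name = "remote";
  write(_batch: SpanRecord[]): void {
    throw new Error("remote unreachable");
  }
}

// Sends the batch to every sink and returns the names of the sinks that failed.
function exportToAll(sinks: Sink[], batch: SpanRecord[]): string[] {
  const failed: string[] = [];
  for (const sink of sinks) {
    try {
      sink.write(batch);
    } catch {
      failed.push(sink.name);
    }
  }
  return failed;
}
```

Returning the failed sink names, rather than throwing, leaves the caller free to log the failure or feed it into a circuit breaker.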

Quick Start

1. Environment Setup

Add to ~/.claude/.envrc:

# Enable telemetry
export OTEL_ENABLED="true"
export SIGNOZ_ENABLED="true"

# SigNoz Cloud endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT="https://ingest.us.signoz.cloud"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_COMPRESSION="gzip"

# Service identification
export OTEL_SERVICE_NAME="claude-code-hooks"
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=development,service.version=1.0.0"

# Authentication (from Doppler)
export SIGNOZ_INGESTION_KEY="<your-key>"
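
OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs. The OpenTelemetry SDK normally parses this variable itself, so the sketch below is only meant to show the format; `parseResourceAttributes` is a hypothetical helper, not a framework function.

```typescript
// Parses "k1=v1,k2=v2" into an object. Values are percent-decoded,
// which is how the OpenTelemetry specification says they are encoded.
function parseResourceAttributes(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim();
    if (key) attrs[key] = decodeURIComponent(value);
  }
  return attrs;
}
```

With the value from the setup above, this yields two attributes: deployment.environment and service.version.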

2. Basic Usage

import {
  initTelemetry,
  shutdown,
  withSpan,
  recordMetric,
  logger,
  getTraceUrl
} from './lib/otel';

// Initialize at startup
initTelemetry();

// Wrap operations in spans
await withSpan('my-operation', { 'attr.key': 'value' }, async (span) => {
  // Your code here
  logger.info('Processing item', { itemId: 123 });
});

// Record custom metrics
recordMetric('items.processed', 42, { queue: 'main' });

// Get trace link for debugging
console.log(`View trace: ${getTraceUrl()}`);

// Clean shutdown
await shutdown();

Core Components

OpenTelemetry Core (`hooks/lib/otel.ts`)

The foundation providing tracing, metrics, and logging.

Function	Purpose
`initTelemetry()`	Initialize SDK with exporters
`withSpan(name, attrs, fn)`	Wrap async function in a span
`recordMetric(name, value, attrs)`	Record counter metric
`recordGauge(name, value, attrs)`	Record gauge metric
`logger.info/warn/error()`	Structured logging
`getTraceUrl()`	Get SigNoz trace link
`getTraceId()`	Get current trace ID
`shutdown()`	Graceful cleanup
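
To make the contract of a span wrapper concrete, here is a dependency-free sketch of the pattern that a helper like `withSpan` typically follows: mark success or failure, re-throw errors, and always end the span. `SketchSpan`, `finishedSpans`, and `withSpanSketch` are hypothetical; the real otel.ts works with OpenTelemetry spans and may differ.

```typescript
// Hypothetical minimal span type, standing in for an OpenTelemetry span.
interface SketchSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
  status: "unset" | "ok" | "error";
  error?: string;
  ended: boolean;
}

// Finished spans are collected here instead of being exported.
const finishedSpans: SketchSpan[] = [];

async function withSpanSketch<T>(
  name: string,
  attrs: Record<string, string | number | boolean>,
  fn: (span: SketchSpan) => Promise<T>,
): Promise<T> {
  const span: SketchSpan = { name, attributes: { ...attrs }, status: "unset", ended: false };
  try {
    const result = await fn(span);
    span.status = "ok";
    return result;
  } catch (err) {
    span.status = "error";
    span.error = err instanceof Error ? err.message : String(err);
    throw err; // re-throw so callers still see the failure
  } finally {
    span.ended = true; // the span ends on success and on failure
    finishedSpans.push(span);
  }
}
```

Ending the span in `finally` is the important part: a span that never ends is usually never exported.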

Hook Monitor (`hooks/lib/otel-monitor.ts`)

Simplified instrumentation for hooks via instrumentHook().

import { instrumentHook } from './lib/otel-monitor';

instrumentHook('my-hook', async (ctx) => {
  ctx.addAttribute('custom.attr', 'value');
  ctx.recordEvent('processing_started');
  ctx.logger.info('Hook executing');

  // Create child span for sub-operations
  await ctx.startChildSpan('fetch-data', {}, async () => {
    // Nested operation
  });
});

Langtrace (`hooks/lib/langtrace.ts`)

Automatic LLM provider instrumentation with PII protection.

// createPIIProcessor is assumed to be exported from the same module
import { initLangtrace, createPIIProcessor, recordLLMEvent } from './lib/langtrace';

// Initialize with PII redaction (default)
initLangtrace();

// Or with custom patterns
initLangtrace({
  contentProcessor: createPIIProcessor([
    { pattern: /ACME-\d{6}/g, replacement: '[ACME_ID]' }
  ])
});

Auto-redacted patterns:

Email addresses, phone numbers, SSNs
Credit card numbers, API keys, AWS keys
JWT tokens, Bearer tokens, passwords
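
Pattern-based redaction of this kind can be sketched as a list of regex rules applied in order. The rules and the names `RedactionRule`, `sketchDefaultRules`, and `createPIIProcessorSketch` below are illustrative assumptions; the built-in patterns in langtrace.ts are not shown in this guide and are likely more thorough.

```typescript
interface RedactionRule {
  pattern: RegExp; // needs the global flag so every match is replaced
  replacement: string;
}

// Illustrative defaults only, covering three of the categories listed above.
const sketchDefaultRules: RedactionRule[] = [
  { pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, replacement: "[EMAIL]" },
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "[SSN]" },
  { pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, replacement: "Bearer [TOKEN]" },
];

// Returns a function that applies the default rules, then any extra rules.
function createPIIProcessorSketch(extraRules: RedactionRule[] = []): (text: string) => string {
  const rules = [...sketchDefaultRules, ...extraRules];
  return (text) =>
    rules.reduce((out, rule) => out.replace(rule.pattern, rule.replacement), text);
}
```

Custom rules such as the ACME-\d{6} pattern above are passed as `extraRules` and run after the defaults.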

Instrumented Hooks

Hook	Event	Description
`session-start-otel.ts`	SessionStart	Session initialization
`mcp-pre-tool-otel.ts`	PreToolUse	MCP tool invocation start
`mcp-post-tool-otel.ts`	PostToolUse	MCP tool completion
`agent-pre-tool-otel.ts`	PreToolUse	Subagent spawn
`agent-post-tool-otel.ts`	PostToolUse	Subagent completion
`skill-activation-prompt-otel.ts`	UserPromptSubmit	Skill detection
`tsc-check-otel.ts`	PostToolUse	TypeScript checks
`stop-build-check-otel.ts`	Stop	Build verification
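
The MCP and agent hooks share the PreToolUse and PostToolUse events, so each hook has to decide whether a given tool call is its concern. The sketch below assumes the hook payload carries a `tool_name` field, that MCP tools are named `mcp__<server>__<tool>`, and that subagents are spawned through a tool named `Task`; `HookPayload`, `ToolKind`, and `classifyTool` are hypothetical names.

```typescript
// Only the payload fields this sketch needs.
interface HookPayload {
  hook_event_name: string;
  tool_name?: string;
}

type ToolKind =
  | { kind: "mcp"; server: string; tool: string }
  | { kind: "agent" }
  | { kind: "other"; tool: string };

function classifyTool(payload: HookPayload): ToolKind | undefined {
  const name = payload.tool_name;
  if (!name) return undefined; // events such as SessionStart carry no tool
  if (name === "Task") return { kind: "agent" };
  const match = /^mcp__(.+?)__(.+)$/.exec(name);
  if (match) return { kind: "mcp", server: match[1], tool: match[2] };
  return { kind: "other", tool: name };
}
```

An MCP hook would record `server` and `tool` as span attributes and exit early for every other kind.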

Metrics Reference

Hook Metrics

Metric	Type	Description
`hook.duration`	Histogram	Execution time distribution
`hook.executions`	Counter	Total invocations
`mcp.invocations`	Counter	MCP tool calls
`agent.invocations`	Counter	Subagent spawns

LLM Metrics

Metric	Type	Description
`gen_ai.client.token.usage`	Counter	Tokens consumed
`gen_ai.client.cost`	Counter	Cost in USD
`gen_ai.client.operation.duration`	Histogram	LLM call latency
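
A cost figure has to be derived from token counts and per-model rates. The sketch below shows the arithmetic only; `TokenUsage`, `ModelRates`, and `estimateCostUsd` are hypothetical names, and the rates must come from the caller because this guide does not state any.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Prices in USD per one million tokens, supplied by the caller.
interface ModelRates {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// cost = (input tokens * input rate + output tokens * output rate) / 1,000,000
function estimateCostUsd(usage: TokenUsage, rates: ModelRates): number {
  const weighted =
    usage.inputTokens * rates.inputPerMillionUsd +
    usage.outputTokens * rates.outputPerMillionUsd;
  return weighted / 1_000_000;
}
```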

Build Metrics

Metric	Type	Description
`build.check.duration`	Histogram	Type check time
`build.errors`	Gauge	Error count

Output Locations

Local Files

~/.claude/telemetry/
├── traces-YYYY-MM-DD.jsonl    # Span data
├── logs-YYYY-MM-DD.jsonl      # Log records
└── llm-events-YYYY-MM-DD.jsonl # LLM calls
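
For offline analysis, the daily file names and the JSONL format can be handled with two small helpers. This is a sketch: `dailyFileName` and `parseJsonl` are hypothetical, and using the UTC date for the file name is an assumption (the framework may use local time).

```typescript
type TelemetryKind = "traces" | "logs" | "llm-events";

// Builds a name such as "traces-2024-03-05.jsonl" from the UTC date.
function dailyFileName(kind: TelemetryKind, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `${kind}-${day}.jsonl`;
}

// Parses JSONL text, skipping blank lines and lines that are not valid JSON
// (for example, a line that was only partially written).
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      // ignore malformed lines
    }
  }
  return records;
}
```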

Remote Dashboard

URL: https://tight-ladybird.us.signoz.cloud/

Available Dashboards

Dashboard	Purpose
Claude Code Hooks Observability	Core hook performance
Token Usage & Cost Efficiency	LLM consumption tracking
Tool & MCP Usage Analytics	MCP server/tool usage
Error & Anomaly Detection	Error monitoring
Subagent Analytics	Agent invocations
Session Health Overview	Session activity

Configuration Options

Trace Sampling

Control telemetry volume with sampling:

# Development - capture everything
export OTEL_TRACES_SAMPLER="always_on"

# Production - 10% sampling
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
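
The parentbased_traceidratio sampler makes a ratio-based decision for root spans and lets child spans inherit their parent's decision, so a trace is kept or dropped as a whole. The sketch below illustrates that logic in simplified form; it is not the SDK's exact algorithm, and `ParentContext`, `ratioSample`, and `shouldSample` are hypothetical names.

```typescript
interface ParentContext {
  sampled: boolean;
}

// Simplified ratio check: map the first 8 hex characters of the trace ID
// to a number in [0, 1) and keep the trace if it falls below the ratio.
function ratioSample(traceId: string, ratio: number): boolean {
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
  return bucket < ratio;
}

function shouldSample(traceId: string, ratio: number, parent?: ParentContext): boolean {
  if (parent) return parent.sampled; // child spans follow the parent's decision
  return ratioSample(traceId, ratio); // root spans use the trace ID ratio
}
```

Because the decision is a pure function of the trace ID, every process that sees the same trace reaches the same answer without coordination.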

Debug Mode

Enable verbose SDK logging:

export OTEL_LOG_LEVEL="debug"

Circuit Breaker

Built-in protection when SigNoz is unreachable:

Trips after 3 consecutive failures
Resets after 60 seconds
Fails fast while tripped, so hooks are not blocked waiting on export requests
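
The three rules above can be sketched as a small state machine. `CircuitBreakerSketch` is a hypothetical class, not the framework's implementation; the clock is injected so the behavior can be checked without waiting 60 seconds.

```typescript
class CircuitBreakerSketch {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 3,
    resetAfterMs = 60_000,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // True when an export attempt is allowed.
  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.resetAfterMs) {
      // Reset window elapsed: close the breaker and allow exports again.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false; // still tripped: fail fast without calling the remote
  }

  // Any success clears the consecutive-failure count.
  recordSuccess(): void {
    this.failures = 0;
  }

  // Trips the breaker once the consecutive-failure threshold is reached.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
}
```

An exporter would call `canAttempt()` before each remote export, skip the request when it returns false, and report the outcome with `recordSuccess()` or `recordFailure()`.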

Troubleshooting

No data in SigNoz

Check OTEL_ENABLED="true" and SIGNOZ_ENABLED="true"
Verify SIGNOZ_INGESTION_KEY is set
Enable debug: OTEL_LOG_LEVEL="debug"
Check local files exist in ~/.claude/telemetry/
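
The environment checks in this list can be automated with a short function. `diagnoseTelemetryEnv` is a hypothetical helper, and comparing the flags to the exact string "true" is an assumption about how the framework reads them.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the
// variables from the Quick Start are all present.
function diagnoseTelemetryEnv(env: Env): string[] {
  const problems: string[] = [];
  if (env["OTEL_ENABLED"] !== "true") problems.push('OTEL_ENABLED is not "true"');
  if (env["SIGNOZ_ENABLED"] !== "true") problems.push('SIGNOZ_ENABLED is not "true"');
  if (!env["SIGNOZ_INGESTION_KEY"]) problems.push("SIGNOZ_INGESTION_KEY is not set");
  if (!env["OTEL_EXPORTER_OTLP_ENDPOINT"]) problems.push("OTEL_EXPORTER_OTLP_ENDPOINT is not set");
  return problems;
}
```

In a Node script this would be called as `diagnoseTelemetryEnv(process.env)`.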

High latency on exports

Ensure compression: OTEL_EXPORTER_OTLP_COMPRESSION="gzip"
Lower the trace sampling ratio in production (for example, OTEL_TRACES_SAMPLER_ARG="0.1")
Check for circuit breaker trips in logs

Missing LLM traces

Verify initLangtrace() is called at startup
Check LANGTRACE_WRITE_TO_FILE="true" for local debugging
Confirm provider is supported (OpenAI, Anthropic, Bedrock, etc.)

Related MCP tools for inspecting SigNoz data: signoz_list_services, signoz_search_metric_by_text