⌘K
All
API & Infra
Model Behavior
Context
Security
Integration
Performance
Severity Scale
S1 Critical
Production stops
Complete outage or data loss possible. Users cannot complete core workflows. Requires immediate action, no workaround exists.
S2 Major
Degraded functionality
Core feature impaired but app still runs. Users experience failures or wrong outputs. Fix required within hours, workaround may exist.
S3 Minor
Edge case or cosmetic
Rare failure with low user impact. Workaround is straightforward. Can be addressed in the next release cycle.
All Categories
12 patterns
API & Infrastructure
Rate limits, timeouts, auth failures, quota exceeded, connection errors
9 patterns
Model Behavior
Hallucinations, refusals, instruction drift, inconsistency, wrong language
8 patterns
Context Failures
Window overflow, truncation, context poisoning, memory degradation
7 patterns
Security Failures
Prompt injection, jailbreaks, system prompt leakage, data exfiltration
10 patterns
Integration Failures
JSON parse errors, tool call failures, streaming drops, format violations
6 patterns
Performance Failures
Latency spikes, cost explosions, token inefficiency, cold start delays
Featured Pattern
Context Failures
Context Window Overflow
📦 Context Failures
S2 — Major
HTTP 400
When total token count (system prompt + conversation history + user input + expected output) exceeds the model's maximum context window. Results in hard API errors or silent truncation of early context — causing the model to "forget" its original instructions mid-conversation.
Error 400: "This model's maximum context length is 128,000 tokens.
However, your messages resulted in 131,847 tokens. Please reduce the length of the messages."
However, your messages resulted in 131,847 tokens. Please reduce the length of the messages."
How Patterns Are Defined
Pattern Schema
7 fields per pattern · consistent structure across all 10 entries
field 1
Symptom
What the developer observes. e.g.
Error 400: context length exceeded or "model forgets earlier instructions"field 2
Root Cause
What actually failed in the stack. e.g. total token count (system + history + input) exceeded the model's context window limit
field 3
Blast Radius
Who is affected and how severely. e.g. all users in long sessions, silent truncation may go undetected for hours
field 4
Severity Rationale
Why this pattern is rated S1, S2, or S3 based on the scale above. Not subjective — tied to user impact and recoverability
field 5
Immediate Mitigation
What to do right now to stop the bleeding. e.g.
max_tokens guard, truncate oldest messages, switch to sliding windowfield 6
Prevention
What to build so this never happens again. e.g. token budget tracking, proactive summarisation at 80% window capacity
field 7
Eval Coverage
What automated test catches this. e.g.
adversarial prompt eval, format compliance eval, token budget evalAll Patterns — click to expand
Rate Limit Exceeded
🔴 API & Infra
S2
HTTP 429
Symptom
HTTP 429 errors after burst of requests, app stops functioning mid-session
Root Cause
Provider RPM/TPM quota exhausted. API throttles all requests until the rate window resets (~60s)
Blast Radius
All users during the rate window. Auto-recovers but causes a visible outage for the duration
Severity Rationale
S2 — app degrades but recovers automatically, no data loss, workaround exists via backoff
Immediate Mitigation
Add exponential backoff: 1s, 2s, 4s. Set
max_retries=3 in your client config right nowPrevention
Client-side request queue with RPM counter, buffer requests before the limit, upgrade tier if traffic demands it
Eval Coverage
Rate resilience eval — send 70 req/min, assert less than 5% fail with backoff implemented
Prompt Injection Attack
🔒 Security
S1
Silent
Symptom
Model follows instructions embedded in user input, ignores or overrides the system prompt
Root Cause
No separation between trusted instructions and untrusted user data in prompt construction
Blast Radius
All users if system-level compromise. Potential data exfiltration. No error thrown — completely silent
Severity Rationale
S1 — silent failure, no error signal, security boundary violated, data leakage possible
Immediate Mitigation
Add instruction boundary markers in your prompt. Sanitise all user input before inserting into context
Prevention
Structured message roles (system/user/assistant), strict input validation, adversarial test suite in CI
Eval Coverage
Adversarial eval — send 50 known injection payloads, assert 0 succeed in hijacking system instructions
JSON Parsing Failure
⚙️ Integration
S2
Parse Error
Symptom
App crashes with
JSON.parse error or TypeError on model responseRoot Cause
Model wraps JSON in markdown fences, truncates output, or adds prose before/after the JSON object
Blast Radius
All requests to that endpoint. Intermittent — typically 10–30% failure rate without output guardrails
Severity Rationale
S2 — app breaks but retry usually succeeds, no data loss, fix is well-understood
Immediate Mitigation
Strip markdown fences before parsing, add try/catch with a single retry on parse failure
Prevention
Use structured output API mode (
response_format: json_object), validate schema on every responseEval Coverage
Format compliance eval — 100 structured output requests, assert 100% return parseable valid JSON
Hallucination Cascade
🧠 Model Behavior
S1
Silent
Symptom
Model confidently states wrong facts. Downstream logic built on incorrect output amplifies the error
Root Cause
Model generates plausible-sounding content not grounded in provided context or verifiable reality
Blast Radius
Any user relying on factual output. Cascades badly when one hallucination feeds the next prompt in a chain
Severity Rationale
S1 — completely silent, no error signal, user may take real-world action based on fabricated information
Immediate Mitigation
Add to system prompt: "Only state what you can verify from the provided context. Say you don't know otherwise."
Prevention
RAG grounding for factual queries, factual consistency eval in CI, human review gate for high-stakes outputs
Eval Coverage
Factual accuracy eval — 50 questions with known ground truth answers, assert more than 95% correct
Silent Context Truncation
📦 Context
S2
Silent
Symptom
Model ignores earlier instructions or user data mid-conversation with no error thrown
Root Cause
Context exceeds the window but the API silently drops oldest messages rather than returning HTTP 400
Blast Radius
Long sessions where early system instructions or critical user context gets silently discarded
Severity Rationale
S2 — no crash but produces subtly wrong outputs, very hard to detect without active token monitoring
Immediate Mitigation
Add token counter per request. Implement a sliding window that always reserves the system prompt slot
Prevention
Reserve a fixed token budget for the system prompt. Summarise conversation history when approaching 80% window
Eval Coverage
Memory eval — assert model recalls a specific instruction from turn 1 after 50+ subsequent turns
Streaming Interruption
⚙️ Integration
S2
Network
Symptom
Stream cuts mid-response, UI shows partial text, app hangs waiting for the completion signal
Root Cause
Network timeout or provider-side interruption during a streaming response with no recovery mechanism
Blast Radius
Users with long responses or unstable connections. UX breaks mid-sentence, requires manual retry
Severity Rationale
S2 — user sees broken output and must retry, no data loss, but degrades trust in the product
Immediate Mitigation
Add stream timeout handler (30s), buffer partial response, surface a retry button to the user
Prevention
Stream health check with reconnect logic, automatic retry with exponential backoff on disconnect
Eval Coverage
Resilience eval — simulate 20% packet drop during streaming, assert all responses complete or retry gracefully
Model Deprecation Breaking Change
🔴 API & Infra
S3
HTTP 404
Symptom
Suddenly receiving model-not-found or 404 errors after the integration worked fine for months
Root Cause
Provider deprecated the specific model version ID in use. No automatic migration happens
Blast Radius
100% of requests to that model ID. Affects all users until the model ID is updated in config
Severity Rationale
S3 — trivial fix (update model ID string) but silent until it breaks in production with no warning
Immediate Mitigation
Update model ID to current supported version in your config or environment variable
Prevention
Never hardcode model IDs. Use env vars. Subscribe to provider deprecation notices. Pin to stable aliases where available
Eval Coverage
Smoke eval — daily ping to assert current model ID resolves without 404, alert on first failure
Tool Call Hallucination
🧠 Model Behavior
S1
Silent
Symptom
Agent invokes tool names or passes parameters that do not exist in the defined tool schema
Root Cause
Model generates plausible-looking tool calls not grounded in the available tool definitions provided
Blast Radius
Agent workflows break silently or crash at runtime. Potential for destructive or irreversible operations
Severity Rationale
S1 — agent takes wrong action silently, no error from the model itself, real-world consequences possible
Immediate Mitigation
Validate all tool calls against the schema before execution. Reject and log any unknown tool names
Prevention
Strict tool schema validation layer, tool allow-list, human approval step for all irreversible tool actions
Eval Coverage
Tool grounding eval — assert 100% of generated tool calls use valid names and parameters from the schema
Cost Explosion
💸 Performance
S1
Silent
Symptom
API bill is 10–100x expected. Traced to a specific endpoint or runaway loop that ran unchecked
Root Cause
Missing
max_tokens guard, accidental prompt duplication in a loop, or unbounded recursive agent callsBlast Radius
Immediate financial impact. In a loop, can exhaust the entire monthly quota within minutes
Severity Rationale
S1 — financial damage, may require emergency shutdown, silent until the invoice or budget alert fires
Immediate Mitigation
Set
max_tokens on every request right now. Kill the runaway process. Check for loops in agent chainsPrevention
Token budget per request and per user. Cost anomaly alert at 2x baseline spend. Hard monthly spend cap
Eval Coverage
Cost governance eval — assert every request includes
max_tokens, assert cost per request within 2x p95 baseline
Permission Scope Mismatch
⚙️ Integration
S3
Silent
Symptom
Claude can perform an action in one context (personal chat) but silently fails in another (work/team account)
Root Cause
Workspace admin policy or OAuth connection restricts which tool permissions Claude has in the team context
Blast Radius
All team members hit the same invisible capability wall. Workflows built on personal-account behaviour break silently
Severity Rationale
S3 — workaround exists (use personal account or ask admin), but the silent failure causes wasted debugging time
Immediate Mitigation
Check workspace admin settings for tool permissions. Use personal Claude for the blocked action as a temporary workaround
Prevention
Before building team workflows, audit which tool permissions are active. Do not assume feature parity with personal Claude
Eval Coverage
Capability parity eval — run the same action in both contexts, assert consistent results or surface a clear permission error
Classify Your Failure
powered by Claude
Analysis Complete
94% confidence
Rate Limit Exceeded
🔴 API & Infrastructure
S2 — Major
HTTP 429
🔍 What's happening
Your application is hitting the provider's RPM limit. After ~60 requests the API returns HTTP 429 and throttles your account. Auto-recovery happens when the rate window resets every 60 seconds.
⚡ Immediate fix
Add exponential backoff: wait 1s → 2s → 4s between retries. Set max_retries=3 in your client config right now — most SDKs support this natively.
🛡️ Prevention
Implement a request queue with client-side rate limiting. Track RPM usage and buffer before the limit. Upgrade tier if traffic demands it.