Failure Modes

What can go wrong, and which skill catches it.

Every skill in Deliberate exists to counter something concrete. This page is the reverse index: start from a failure mode you recognize, jump to the skill that addresses it.

33 modes · 10 categories · most drawn from Karpathy's thread.

In action

What the difference looks like.

Six failure modes, shown side-by-side. Same prompt, same task - the left is what you usually get, the right is what a Deliberate-loaded agent produces instead.

Agreeing with a risky premise instead of naming the tradeoff.

you

This is a quick hotfix — let's skip the tests and push it to main so it's out before the stand-up.

Generic AI

Absolutely — here's the fix applied directly. Pushed to main. You're good.

With Deliberate

Pushing back. "Fast" here means "minutes of risk." No test means:

  • We don't actually know the fix works
  • If it doesn't, the follow-up fix also ships untested
  • Whoever pages tonight won't know if this change is why

A reproduction test for the bug is ~2 minutes to write and makes the fix verifiable. I can have it ready in the time this conversation takes. If you still want to ship without, I will — but let's be explicit: trading 2 minutes now for unknown minutes later.

Jump Straight to Code · countered by /deliberate

Implementing before understanding what's actually being asked.

you

Add rate limiting to the API.

Generic AI

Installing express-rate-limit. 100 requests per 15 min per IP. Here's the middleware config…

With Deliberate

Before I install anything:

  • Where? All endpoints, or auth/public only?
  • Scope? Per IP, per user, or per API key?
  • On limit? 429 with retry-after, queue, or fail-fast?
  • Storage? In-memory is fine for one node; Redis for horizontal scale.

Default guess: auth/public only, per IP, 429, in-memory. Confirm or correct?

Rubber-Stamp LGTM · countered by /review

Approving code the reviewer didn't read or didn't understand.

you

Can you review this PR?

Generic AI

LGTM! 👍 Clean code, good structure. Approved.

With Deliberate

Three things:

blocking: The new processPayment path has no test. payments.test.ts wasn't touched. Before I approve, add at least one happy-path + one failure test, or explain why this is untestable.

question: Line 47 — settled ||= 'pending'. Is 'pending' a real state or a sentinel for "unknown"? If sentinel, I'd use null and check explicitly.

nit: paymentSvc vs paymentService — the rest of the file uses the full word.

Big-Bang Migration · countered by /migrate

Shipping a rename across 47 files in one PR, with no rollback path.

you

Let's rename the users table to accounts.

Generic AI

Here's the migration — renames the table, updates models, services, and queries in one commit. 47 files changed.

With Deliberate

That shape is risky. A rename across 47 files in one PR means: no incremental deploy, slow rollback, external consumers break at cutover.

Safer (expand-contract):

  1. Add view accountsusers (reads work both ways)
  2. Migrate writers, one at a time
  3. Migrate readers, one at a time
  4. Drop view and rename the table

Each step independently deployable and revertable. Do you have a deadline forcing the one-PR approach, or can we do this incrementally?

Overcomplication · countered by /deliberate

A factory, a strategy, and an interface — for a one-liner.

Task Return the most recent N items from an array.
Generic AI
class RecentItemsRetriever<T> {
  constructor(
    private strategy: IRecencyStrategy<T> = new DefaultRecencyStrategy<T>()
  ) {}

  retrieve(items: T[], count: number): T[] {
    if (!Array.isArray(items)) throw new TypeError('items must be array');
    if (count < 0) throw new RangeError('count cannot be negative');
    return this.strategy.getMostRecent(items, count);
  }
}

interface IRecencyStrategy<T> {
  getMostRecent(items: T[], count: number): T[];
}

class DefaultRecencyStrategy<T> implements IRecencyStrategy<T> {
  getMostRecent(items: T[], count: number): T[] {
    return items.slice(-count);
  }
}
With Deliberate
function recent<T>(items: T[], n: number): T[] {
  return items.slice(-n);
}

One caller. No strategy pattern needed. If you later need different recency logic, we add it then.

Symptom Masking · countered by /debug

try/catch swallowing the error instead of fixing the actual bug.

Task Fix the crash when user data loads.
Generic AI
async function loadUser(id: string) {
  try {
    const user = await api.fetchUser(id);
    return user;
  } catch (e) {
    console.warn('Failed to load user:', e);
    return null;
  }
}
With Deliberate
async function loadUser(id: string) {
  const user = await api.fetchUser(id);
  return user;
}

Removed the try/catch. The crash was TypeError: Cannot read 'name' of null at the call site — fetchUser returns null for deleted users, and the caller assumed existence. Fixed at the caller, added tests for the deleted-user path.

Full catalog

All 33 failure modes.

The complete index. Find the one that matches your pain, jump to the skill.

Thinking

4 modes

Silent Assumption

Agent makes a wrong assumption on your behalf and runs with it without surfacing or checking.

Countered by /deliberate

Hidden Confusion

Agent is confused but masks it with confident-sounding output instead of asking.

Countered by /deliberate

Sycophancy

"Great idea!" replacing "here are the problems with this idea." Agreement that's wrong is worse than disagreement that's right.

Countered by /deliberate

Jump Straight to Code

Multi-step task executed without a plan, leading to rework when the approach was wrong.

Countered by /deliberate

Writing

4 modes

Overcomplication

1000 lines where 100 would do. Abstractions for a single caller. Frameworks in place of functions.

Countered by /deliberate

Scope Creep

Orthogonal edits smuggled into a diff. "While I was in there" changes that weren't asked for.

Countered by /deliberate

Orphaned Code

Dead imports, unused helpers, commented-out blocks left behind by a change.

Countered by /deliberate

Premature Abstraction

Strategy pattern for a single type. Parameterizing what only ever has one value.

Countered by /architect

Execution

3 modes

Runaway Loop

Thirty minutes of the same failed approach, with increasingly elaborate workarounds.

Countered by /deliberate

Confidence Without Calibration

Authoritative tone for a guess. No signal of "I've verified this" vs. "I'm pattern-matching."

Countered by /deliberate

Context Drift

Decisions from earlier in the session silently reversed 100 messages later.

Countered by /deliberate

Product

3 modes

Spec Invention

Reverse-engineering product intent from code, then building on those guesses.

Countered by /spec

Silent Requirement Fill

PRD is vague, agent picks an interpretation without flagging, ships the wrong feature.

Countered by /spec

Spec-Code Drift

Implementation quietly diverges from the PRD. Neither gets updated to match.

Countered by /spec

Architecture

3 modes

Boundary Blur

Business logic in the controller. Utils that know about both the UI and the database.

Countered by /architect

Silent Contract Break

API or schema changed without considering unknown callers. Migration plan = none.

Countered by /architect

Ambiguous State Ownership

Mutable state with no clear owner and no single source of truth.

Countered by /architect

Testing

4 modes

Over-Mocking

Tests that only talk to mocks. Green CI, broken production.

Countered by /test

Tautological Test

Test that mirrors the implementation. Refactor that preserves behavior breaks the test.

Countered by /test

Flaky Test

Intermittent failure tolerated via retry-until-green instead of fixed.

Countered by /test

Coverage Theater

Tests written to hit a number, not to protect a behavior.

Countered by /test

Migration

3 modes

Big-Bang Migration

All callers updated in one PR. Can't deploy incrementally. Can't roll back.

Countered by /migrate

Half-Finished Migration

Old path and new path coexist forever. Two sources of truth. No trigger to finish.

Countered by /migrate

Unsafe Schema Change

Lock-holding DDL run on a live table at peak traffic.

Countered by /migrate

Debugging

3 modes

Symptom Masking

try/catch to swallow an unexplained error. Null check without knowing why it was null.

Countered by /debug

Random-Walk Debugging

Change something, run, repeat. No hypothesis, no bisection, no explanation when it works.

Countered by /debug

Fix the Instance, Not the Class

Same bug in three places, patched once, mentioned nowhere.

Countered by /debug

Review

3 modes

Rubber-Stamp LGTM

Approval on code the reviewer didn't understand or didn't read.

Countered by /review

Nit-Only Review

Forty comments on formatting, zero on correctness or architecture.

Countered by /review

Missing Test Ignored

New code path ships with no test. Reviewer doesn't notice absence.

Countered by /review

Incident

3 modes

Shotgun Fix

Five changes in one minute under pressure. Can't tell which one fixed it (or caused the next incident).

Countered by /incident

Lost Evidence

Stabilization destroys the logs/state needed for root cause. No postmortem possible.

Countered by /incident

Vague Postmortem

"We'll add better monitoring" with no owner, no deadline, no mechanism. Action items that rot.

Countered by /incident