
Toby Incidents — How this works

Hand-authored · Last edited May 13 by external edit

This folder is the warroom for Toby bug reports and technical incidents. It's maintained by a team of five agents working as a workflow.

The team

| Agent | Role |
| --- | --- |
| toby-incident-coordinator | Warroom commander. Picks the work unit (inbox / labeled queue / discernment sweep of the open backlog), dispatches specialists, synthesises the incident doc, decides which transitions to record. |
| toby-frontend-doctor | UI specialist. Reproduces via Playwright, reads apps/extension / apps/landing / apps/mobile. |
| toby-backend-doctor | Go API specialist. Pulls GCP logs, queries Toby prod DB read-only, reads apps/api. |
| toby-incident-validator | Quality gate. Re-checks evidence, applies triple-check (correctness / quality / security), returns a binding verdict + confidence before the incident closes. |
| toby-incident-fix-shipper | Last-mile patcher. Only runs when validator returned validated + high confidence. Creates a fresh git worktree of axiomzen/toby-mono-repo under /tmp/, applies the proposed fix from the incident doc, runs the verify plan, pushes a warroom/... branch, and opens a PR. The user's primary checkout is never touched. Skipped automatically on conditional / rejected / non-high-confidence verdicts; humans review those first. |
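
The fix-shipper's run condition in the table above can be read as a single predicate. The following is a minimal sketch, not the workflow's actual code: the type names and the function name are hypothetical, and only the "high" confidence level is named in this README (the other levels are assumed for illustration).

```typescript
// Verdict names as they appear in the table above.
type Verdict = "validated" | "conditional" | "rejected";
// Assumption: only "high" is named in this README; "medium" and "low" are placeholders.
type Confidence = "high" | "medium" | "low";

// Hypothetical helper: true only for the one combination that lets the fix-shipper run.
function shouldRunFixShipper(verdict: Verdict, confidence: Confidence): boolean {
  return verdict === "validated" && confidence === "high";
}
```

Every other combination falls through to human review, which matches the "skipped automatically" rule.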

After the sub-agents finish, the coordinator posts a single consolidated report to Slack #C0B3FN70MEE (Toby AZ Slack) summarising verdict, root cause, doctors used, ticket outcome, and PR URL (or decline reason). The Slack report is skipped on no-op runs, that is, when Wave 0 finds nothing to investigate.
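
As an illustration of the report's contents, here is a sketch of a formatter that covers the listed fields and the no-op skip. The interface, field names, and message layout are all assumptions made for this example; the coordinator's real message format is not documented here.

```typescript
// Hypothetical report shape; field names are illustrative, not the coordinator's real schema.
interface WarroomReport {
  verdict: string;
  rootCause: string;
  doctorsUsed: string[];
  ticketOutcome: string;
  prUrl?: string;         // set when the fix-shipper opened a PR
  declineReason?: string; // set when it did not
}

// Returns the message text, or null for a no-op run (nothing investigated, nothing posted).
function formatSlackReport(report: WarroomReport | null): string | null {
  if (report === null) return null;
  const pr = report.prUrl ?? `declined: ${report.declineReason ?? "no reason recorded"}`;
  return [
    `Verdict: ${report.verdict}`,
    `Root cause: ${report.rootCause}`,
    `Doctors used: ${report.doctorsUsed.join(", ")}`,
    `Ticket outcome: ${report.ticketOutcome}`,
    `PR: ${pr}`,
  ].join("\n");
}
```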

The workflow is recorded in AIOS as Toby Incident Response (id 9b78790f-2aea-4f65-876f-53d1a114c3ae).

How to report an incident

The only way into the warroom is a ticket labeled needs-warroom. Manual file drops into _inbox/ are deprecated — that path got us into trouble (no provenance, no priority, no assignee, no audit trail when a complaint moved between mediums).

The flow:

  1. File a ticket in the Toby project (via UI, MCP, or an agent like toby-state-of-business). For a user-facing bug, set kind: "bug" and add labels: ["needs-warroom"]. Body should describe symptom / reproduce / when / anything-noticed — same shape as the old inbox files.
  2. The ticket→warroom bridge (lib/tickets.ts, bridgeWarroomIfNeeded) sees the new label, writes an inbox file at toby/incidents/_inbox/YYYY-MM-DD-<ticket-id>-<slug>.md containing the ticket body + provenance, and stamps warroom-bridged on the ticket so it isn't double-written.
  3. The warroom workflow (Toby Incident Response, daily cron at 09:00 UTC) picks up the inbox file at its next tick, runs the 4-wave investigation, writes the canonical incident doc, and uses aios_tickets_record_attempt to transition the source ticket to done / in_review / blocked based on the validator's verdict.
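
Step 2 of the flow above can be sketched as a pure planning function. This is not the real bridgeWarroomIfNeeded: the Ticket shape, the slug rule, and the helper names are assumptions for illustration. Only the two label names and the inbox path pattern come from this README.

```typescript
// Hypothetical ticket shape; only `kind` and `labels` are named in this README.
interface Ticket {
  id: string; // e.g. "toby-14"
  title: string;
  kind: string; // "bug" for user-facing bugs
  labels: string[];
  body: string;
}

// Assumed slug rule: lowercase, runs of non-alphanumerics become single hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds toby/incidents/_inbox/YYYY-MM-DD-<ticket-id>-<slug>.md using the UTC date.
function inboxPath(ticket: Ticket, now: Date): string {
  const day = now.toISOString().slice(0, 10);
  return `toby/incidents/_inbox/${day}-${ticket.id}-${slugify(ticket.title)}.md`;
}

// Returns the file to write plus the stamped labels, or null when there is nothing to bridge.
function planBridge(ticket: Ticket, now: Date): { path: string; labels: string[] } | null {
  const needsWarroom = ticket.labels.includes("needs-warroom");
  const alreadyBridged = ticket.labels.includes("warroom-bridged");
  if (!needsWarroom || alreadyBridged) return null;
  return { path: inboxPath(ticket, now), labels: [...ticket.labels, "warroom-bridged"] };
}
```

Checking for warroom-bridged before writing is what keeps the bridge idempotent: a second pass over the same ticket returns null instead of writing a duplicate inbox file.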

End-to-end latency is therefore "next 09:00 UTC tick", not "4–8 min". If a complaint genuinely can't wait until tomorrow, an operator can still tick the workflow manually from the Workflows app — but the standard path is the bridge.
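
To make the latency concrete, the next pickup time for a freshly bridged ticket can be computed as below. This is a sketch with a hypothetical function name; it assumes a ticket bridged exactly at 09:00 UTC waits for the following day's tick.

```typescript
// Returns the next 09:00 UTC instant strictly after `now`.
function nextWarroomTick(now: Date): Date {
  const tick = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 9, 0, 0, 0),
  );
  if (tick.getTime() <= now.getTime()) {
    // Today's tick has passed (or is happening now); roll over to tomorrow.
    tick.setUTCDate(tick.getUTCDate() + 1);
  }
  return tick;
}
```

A ticket labeled at 08:30 UTC is picked up 30 minutes later, while one labeled at 09:01 UTC waits almost a full day.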

What the warroom produces

Each closed incident becomes a single canonical doc:

toby/incidents/2026-05-11-blank-extension-page.md

With sections: symptom, reproduction, root cause, production impact, proposed fix (as a diff), verify plan, validator verdict + confidence, open questions, timeline of who-did-what-when, and links to the doctors' run artifacts for archeology.
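
A completeness check over that section list might look like the sketch below. The section names are taken from the sentence above, but the exact heading text in real incident docs is an assumption, as are the function and constant names.

```typescript
// Section names from this README; the literal heading text in real docs may differ.
const REQUIRED_SECTIONS = [
  "symptom",
  "reproduction",
  "root cause",
  "production impact",
  "proposed fix",
  "verify plan",
  "validator verdict",
  "open questions",
  "timeline",
  "run artifacts",
] as const;

// Returns the required sections that are absent from a doc's list of headings.
function missingSections(headings: string[]): string[] {
  const present = new Set(headings.map((h) => h.trim().toLowerCase()));
  return REQUIRED_SECTIONS.filter((section) => !present.has(section));
}
```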

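The agents never apply patches to your checkout of the Toby codebase. The incident doc carries a fix proposal that you decide whether to ship; when the validator returns validated with high confidence, toby-incident-fix-shipper also opens a PR from a separate worktree for you to review.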

Triggering manually (operator-only escape hatch)

If a complaint cannot wait until the next cron tick, an operator can run the workflow directly from the Workflows app: tick Toby Incident Response. The coordinator's Wave 0 pulls from the inbox first, then from the ticket queue. Prefer the labeled-ticket path: manual ticks bypass the audit trail and should be reserved for genuine emergencies.
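
The inbox-first ordering can be sketched as follows. The types and function name are hypothetical, and taking the first element of each list is an assumption about ordering. The coordinator's real selection logic is not shown in this README.

```typescript
type Source = "inbox" | "ticket-queue";

interface WorkUnit {
  source: Source;
  item: string; // inbox filename or ticket id
}

// Picks the next work unit: inbox first, then the labeled ticket queue.
// Returns null when both are empty, which corresponds to a no-op run.
function pickWorkUnit(inbox: string[], ticketQueue: string[]): WorkUnit | null {
  const fromInbox = inbox[0];
  if (fromInbox !== undefined) return { source: "inbox", item: fromInbox };
  const fromQueue = ticketQueue[0];
  if (fromQueue !== undefined) return { source: "ticket-queue", item: fromQueue };
  return null;
}
```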

Verdicts

The validator's verdict puts the incident in one of three states, which the incident doc reflects:

  • closed → validator confirmed the fix would work; ready to ship after operator review.
  • open → twice-rejected by the validator and the doctors couldn't satisfy its objections. Operator decides what to do next.
  • conditional → fix is good IF a specific question is resolved first (e.g. "is this migration safe under live write traffic?"). Surfaced for operator decision.
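
Read together with the agent descriptions, the three states line up with the validator's verdict names roughly as in the sketch below. The mapping is inferred from this README rather than taken from the workflow's code, and the type and constant names are hypothetical.

```typescript
// Verdict names used in the fix-shipper description; states are the ones listed above.
type Verdict = "validated" | "conditional" | "rejected";
type IncidentState = "closed" | "conditional" | "open";

// Assumed mapping: validated fixes close the incident, rejected ones leave it open.
const STATE_FOR_VERDICT: Record<Verdict, IncidentState> = {
  validated: "closed",
  conditional: "conditional",
  rejected: "open",
};

function incidentState(verdict: Verdict): IncidentState {
  return STATE_FOR_VERDICT[verdict];
}
```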

Folder structure

toby/incidents/
├── README.md                            ← this file
├── _inbox/                              ← bridge-written; one file per warroom-bound ticket
│   └── 2026-05-12-toby-14-<slug>.md
├── 2026-05-11-blank-extension-page.md   ← canonical closed incident doc
└── ...

The agents never apply patches to your checkout; the only code changes they produce are PRs opened by toby-incident-fix-shipper from a separate worktree. The ticket→warroom bridge writes _inbox/ files; the workflow consumes them and then leaves them alone (the operator decides when to archive). Humans should not hand-drop files into _inbox/; use the labeled-ticket path so provenance lives in one system.