# Ingestion — `toby/incidents/2026-05-11-blank-extension-page.md`

_Run: 7a4c4afb-12dc-419f-808f-0e9a014417cd · Ingested by: toby-pm · 2026-05-11_

## What landed

The incident warroom (workflow `Toby Incident Response`, id `9b78790f-2aea-4f65-876f-53d1a114c3ae`) shipped its **first canonical incident doc** at `toby/incidents/2026-05-11-blank-extension-page.md`. Status: **closed**. Validator verdict: **validated, high confidence**. Wave timing per the doc's timeline: dispatch 17:08 → frontend finding 17:14 → backend finding 17:30 → validator 17:36 → published 17:38 (30 minutes end-to-end — within the README's advertised 4–8 min envelope per agent, exceeded for end-to-end due to two doctor passes + validation).

This is the same bug the dashboard's "Live blank-page reliability incident" open question has been tracking against compass anchor #1 (reliability) and playbook O1 KR1 (hard deadline 2026-05-24).

## Root cause (one paragraph, faithful to the doc)

The Toby new-tab page renders the static preload skeleton forever because `AuthWrapper` at `apps/extension/app/containers/Toby.tsx:304` returns `null` while `isUserHydrated` is false. The `isUserHydrated` boolean is bound 1:1 to `getUser()` at `apps/extension/app/state/accessors/user.tsx:45-50`, which wraps a `chrome.storage.local.get` callback with **no timeout, no `chrome.runtime.lastError` check, no `.catch`**. Chrome's "extension context invalidated" state (entered on every auto-update of a `chrome_url_override` new-tab extension, and on manual disable/reload, and on rarer SW crashes) drops that callback — turning a previously-tolerable platform quirk into a reliable user-visible hang. The trigger was commit `d68726b29` (2026-04-09), which widened the gate by adding `!isUserHydrated` without bounding the new dependency. The same defect class exists at `apps/extension/app/utils/chromeapi.ts:248-259`, so `isDraftReady` shares the same silent-hang surface.

## What was refuted

The earlier hypothesis from `toby-product-strategist` (artifact `388c1db4-59b7-49e9-8ec3-ecfba972c95f`) that this was an **MV3 service-worker boot regression** is now **refuted**. Backend evidence: prod-api SHA `4b0107858…` hasn't changed since 2026-02-02 (well before the post-2026-04-09 complaint window — the 2026-04-01 deploys are config-only); 0 5xx in last 24h; worst day this week 19/1.18M = 0.0016%; SW boot path is structurally clean (every `chrome.*.addListener` registers synchronously at module top level); `getUser()` makes no network call so the hang is pre-HTTP and an API regression cannot structurally cause it.

**Memory update**: do not carry the SW-boot-regression hypothesis forward. The prior strategist artifact should be treated as historical context only.

## Fix proposal (frontend-only, defence-in-depth)

Three layers, all in `apps/extension`:

1. **Layer 1 — bound hydration with 5s timeout, fail open.** Patch `apps/extension/app/state/accessors/user.tsx` (around line 71) so `getUser()` is wrapped in a `setTimeout(5000)` that flips `setIsUserHydrated(true)` on expiry; chain `.then/.catch/.finally` correctly. Apply same shape to `apps/extension/app/hooks/useOnboarding2Draft.ts:12-30` for `isDraftReady`. Validator confirmed race-safety, non-destructiveness, and that `d68726b29`'s original intent (no Onboarding2 flash for returning users) is preserved.
2. **Layer 2 — visible recovery screen at 8s.** Replace `return null` in `Toby.tsx:304` with a `StuckRecoveryScreen` mounted after 8s with the **already pre-approved** copy "Your tabs are safe. Tap to recover." (the same line carried in the dashboard's next-step under O1 KR1).
3. **Layer 3 — `NewTabHangShown` telemetry beacon** at the recovery-screen site. First-ever signal between CWS-review complaint and Sentry/Amplitude funnels.

Operator decisions surfaced in the doc:

- Should `NewTabHangShown` (and the optional Layer-1 `NewTabHydrationTimeout` beacon) be feature-flag gated? Default: **on**. Validator concurs.
- Should `prod-api` be redeployed as part of this incident? Both backend doctor and validator: **no**.

Non-blocker follow-ups (not gating the incident close):

1. Apply Layer-1 shape to `isInitializing` (the `useIsRestoring()` IDB-backed path in `Toby.tsx:168-275`).
2. SW hardening: `.catch()` on `persistQueryClientRestore` at `background.ts:14`; `AbortController` + 10s timeout on the `contextMenus.ts:145-191` fetch; build a unified `chromeStorageGet<T>(keys, { timeoutMs })` helper.
3. Layer-1 telemetry beacon (`NewTabHydrationTimeout`) so the common 5s recovery path is visible in Amplitude, not just the 8s tail.

## Dashboard impact

- **`reliability-blank-page-fix` bet (ICE 576, O1 KR1, deadline 2026-05-24)** — diagnosis is now **complete and validated**. The dashboard's next-step under this bet is no longer "reproduce and triage" — it is "operator decides whether to ship the 3-layer diff documented in the incident doc". Bets queue is owned by the strategist agent; the dashboard updates only the next-step shape and crosslinks the canonical doc.
- **Operations § (warroom)** — first canonical incident published; advertised flow worked end-to-end. Update the Operations § to reference the first closed incident as proof-of-life, not just a feature claim.
- **Recent Shipments** — add 2026-05-11 entry for the incident close.
- **Open Questions** — the existing "Live blank-page reliability incident" question gains a pointer to the canonical doc and the **operator-decision** framing (no longer "unknown root cause"). A new question is *not* needed; the existing one mutates.
- **Doc Index** — add the canonical incident doc (`toby/incidents/2026-05-11-blank-extension-page.md`).
- **Key decisions** — capture the "no prod-api redeploy" call and the default feature-flag-on call as decisions made inside the warroom.

## Scope reconciliation (why this derivative lives here, not next to source)

Operator orders said "save next to source so it's discoverable". My hard scope rules forbid writing into `toby/incidents/` — that's owned by the incident-coordinator team and hand-edits would collide with the warroom's writes. Resolution: the derivative goes in the toby-pm workspace artifact dir; discoverability comes from the dashboard linking both the canonical incident doc and this ingestion artifact. Same pattern used for prior X / strategy ingestions.

## Citations

- Canonical incident doc: `toby/incidents/2026-05-11-blank-extension-page.md` (status: closed, verdict: validated)
- Frontend finding: `artifacts/toby-frontend-doctor/6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/finding.md`
- Backend finding: `artifacts/toby-backend-doctor/083ec6d2-63e9-4c3e-b55e-a95301a4aa72/finding.md`
- Validation: `artifacts/toby-incident-validator/a28a3690-38d7-4ce9-a9c2-c6d436da1793/validation.md`
- Synthesis draft (preserved): `artifacts/toby-incident-coordinator/df069a93-28df-4439-8838-cfd953c4c974/synthesis-draft.md`
- Proximate code sites: `apps/extension/app/containers/Toby.tsx:304`, `apps/extension/app/state/accessors/user.tsx:45-50,66-99`
- Class-of-bug code site: `apps/extension/app/utils/chromeapi.ts:248-259`
- Proximate commit: `d68726b29` (Jad Haidar, 2026-04-09)
- Refuted prior hypothesis: artifact `388c1db4-59b7-49e9-8ec3-ecfba972c95f`
