
Ingestion — `toby/incidents/2026-05-11-blank-extension-page.md`

Hand-authored · 5 min read · 7 sections · Last edited May 13 by initial import
TL;DR

_Run: 7a4c4afb-12dc-419f-808f-0e9a014417cd · Ingested by: toby-pm · 2026-05-11_

What landed

The incident warroom (workflow Toby Incident Response, id 9b78790f-2aea-4f65-876f-53d1a114c3ae) shipped its first canonical incident doc at toby/incidents/2026-05-11-blank-extension-page.md. Status: closed. Validator verdict: validated, high confidence. Wave timing per the doc's timeline: dispatch 17:08 → frontend finding 17:14 → backend finding 17:30 → validator 17:36 → published 17:38. That is 30 minutes end-to-end: within the README's advertised 4–8 min envelope per agent, but over it end-to-end because of two doctor passes plus validation.

This is the same bug the dashboard's "Live blank-page reliability incident" open question has been tracking against compass anchor #1 (reliability) and playbook O1 KR1 (hard deadline 2026-05-24).

Root cause (one paragraph, faithful to the doc)

The Toby new-tab page renders the static preload skeleton forever because AuthWrapper at apps/extension/app/containers/Toby.tsx:304 returns null while isUserHydrated is false. The isUserHydrated boolean is bound 1:1 to getUser() at apps/extension/app/state/accessors/user.tsx:45-50, which wraps a chrome.storage.local.get callback with no timeout, no chrome.runtime.lastError check, no .catch. Chrome's "extension context invalidated" state (entered on every auto-update of a chrome_url_override new-tab extension, and on manual disable/reload, and on rarer SW crashes) drops that callback — turning a previously-tolerable platform quirk into a reliable user-visible hang. The trigger was commit d68726b29 (2026-04-09), which widened the gate by adding !isUserHydrated without bounding the new dependency. The same defect class exists at apps/extension/app/utils/chromeapi.ts:248-259, so isDraftReady shares the same silent-hang surface.
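
The defect class described above can be reproduced in isolation. The sketch below is illustrative only and is not the code in user.tsx: `StorageGet`, `getUserUnbounded`, `healthyGet`, `droppedCallbackGet`, and `settlesWithin` are hypothetical names, and the dropped callback is modeled with a stub rather than a real invalidated extension context.

```typescript
// Hypothetical stand-in for a callback-style storage read; not the real accessor.
type StorageGet = (
  key: string,
  callback: (items: Record<string, unknown>) => void,
) => void;

// Unbounded shape: the promise settles only if the callback fires.
// There is no timeout, no error check, and no rejection path.
function getUserUnbounded(get: StorageGet): Promise<unknown> {
  return new Promise((resolve) => {
    get("user", (items) => resolve(items["user"]));
  });
}

// Healthy storage: the callback fires with the stored value.
const healthyGet: StorageGet = (_key, callback) =>
  callback({ user: { id: "u1" } });

// Models the invalidated-context case from the text: the callback is dropped.
const droppedCallbackGet: StorageGet = () => {};

// Reports whether a promise settles (resolve or reject) within `ms` milliseconds.
function settlesWithin(p: Promise<unknown>, ms: number): Promise<boolean> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    const done = (): void => {
      clearTimeout(timer);
      resolve(true);
    };
    p.then(done, done);
  });
}
```

With these stubs, `await settlesWithin(getUserUnbounded(healthyGet), 50)` yields `true`, while the same call with `droppedCallbackGet` yields `false`: any boolean bound 1:1 to that promise stays false indefinitely.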

What was refuted

The earlier hypothesis from toby-product-strategist (artifact 388c1db4-59b7-49e9-8ec3-ecfba972c95f) that this was an MV3 service-worker boot regression is now refuted. Backend evidence:

  • prod-api SHA 4b0107858… has not changed since 2026-02-02, well before the post-2026-04-09 complaint window (the 2026-04-01 deploys are config-only).
  • Zero 5xx responses in the last 24h; the worst day this week was 19/1.18M = 0.0016%.
  • The SW boot path is structurally clean: every chrome.*.addListener registers synchronously at module top level.
  • getUser() makes no network call, so the hang is pre-HTTP and an API regression cannot structurally cause it.

Memory update: do not carry the SW-boot-regression hypothesis forward. The prior strategist artifact should be treated as historical context only.

Fix proposal (frontend-only, defence-in-depth)

Three layers, all in apps/extension:

  1. Layer 1: bound hydration with a 5s timeout, fail open. Patch apps/extension/app/state/accessors/user.tsx (around line 71) so getUser() is guarded by a setTimeout(5000) that calls setIsUserHydrated(true) on expiry; chain .then/.catch/.finally correctly. Apply the same shape to apps/extension/app/hooks/useOnboarding2Draft.ts:12-30 for isDraftReady. Validator confirmed race-safety, non-destructiveness, and that d68726b29's original intent (no Onboarding2 flash for returning users) is preserved.
  2. Layer 2: visible recovery screen at 8s. Replace return null in Toby.tsx:304 with a StuckRecoveryScreen mounted after 8s with the pre-approved copy "Your tabs are safe. Tap to recover." (the same line carried in the dashboard's next-step under O1 KR1).
  3. Layer 3: NewTabHangShown telemetry beacon at the recovery-screen site. This would be the first signal that sits between CWS-review complaints and the Sentry/Amplitude funnels.
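
A minimal sketch of the Layer 1 shape, assuming the accessor can be expressed as a promise-returning `getUser` plus two setters. This is not the validated diff from the incident doc: `HydrationDeps`, `hydrateUserBounded`, `setUser`, and `onTimeout` are hypothetical names, and the choice to still apply a result that arrives after the timeout is an assumption made here for illustration.

```typescript
type User = { id: string } | null;

interface HydrationDeps {
  getUser: () => Promise<User>; // may never settle
  setUser: (user: User) => void;
  setIsUserHydrated: (value: boolean) => void;
  onTimeout?: () => void; // hypothetical hook, e.g. for a timeout beacon
}

// Starts hydration and guarantees the hydrated flag flips within `timeoutMs`,
// whether getUser() resolves, rejects, or never settles.
// Returns a cancel function suitable for an unmount cleanup.
function hydrateUserBounded(deps: HydrationDeps, timeoutMs = 5000): () => void {
  let flagged = false;
  let cancelled = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Flips the flag at most once and stops the timer.
  const markHydrated = (): void => {
    if (flagged || cancelled) return;
    flagged = true;
    if (timer !== undefined) clearTimeout(timer);
    deps.setIsUserHydrated(true);
  };

  timer = setTimeout(() => {
    if (flagged || cancelled) return;
    if (deps.onTimeout) deps.onTimeout();
    markHydrated(); // fail open: render without waiting any longer
  }, timeoutMs);

  deps
    .getUser()
    .then((user) => {
      // Assumption: a result that arrives after the timeout is still applied.
      if (!cancelled) deps.setUser(user);
    })
    .catch(() => {
      // Fail open: the error is swallowed in this sketch so the gate still opens.
    })
    .finally(markHydrated);

  return () => {
    cancelled = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

The single `markHydrated` function is what keeps the timer path and the promise path from racing: whichever runs first flips the flag, and the other becomes a no-op.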

Operator decisions surfaced in the doc:

  • Should NewTabHangShown (and the optional Layer-1 NewTabHydrationTimeout beacon) be feature-flag gated? Default: on. Validator concurs.
  • Should prod-api be redeployed as part of this incident? Both backend doctor and validator: no.
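
The 8s recovery screen and the flag-gated beacon could be wired together roughly as follows. This is a sketch under stated assumptions, not the proposed diff: `RecoveryDeps`, `armStuckRecovery`, `isGateOpen`, `showRecoveryScreen`, and `sendBeacon` are hypothetical names, and only the event name NewTabHangShown and the 8s delay come from the incident doc.

```typescript
interface RecoveryDeps {
  isGateOpen: () => boolean; // true once the page has rendered past the gate
  showRecoveryScreen: () => void; // would mount the recovery screen
  sendBeacon: (eventName: string) => void; // hypothetical telemetry sink
  beaconEnabled: boolean; // feature flag; the doc's default is on
}

// Arms a one-shot timer. If the gate is still closed when it fires,
// the recovery screen is shown and, if the flag allows, a beacon is sent.
// Returns a cancel function for unmount.
function armStuckRecovery(deps: RecoveryDeps, delayMs = 8000): () => void {
  const timer = setTimeout(() => {
    if (deps.isGateOpen()) return; // page recovered on its own
    deps.showRecoveryScreen();
    if (deps.beaconEnabled) deps.sendBeacon("NewTabHangShown");
  }, delayMs);
  return () => clearTimeout(timer);
}
```

Keeping the flag check next to the beacon call, rather than around the whole timer, means turning the flag off would silence telemetry without removing the user-visible recovery path.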

Non-blocker follow-ups (not gating the incident close):

  1. Apply Layer-1 shape to isInitializing (the useIsRestoring() IDB-backed path in Toby.tsx:168-275).
  2. SW hardening: .catch() on persistQueryClientRestore at background.ts:14; AbortController + 10s timeout on the contextMenus.ts:145-191 fetch; build a unified chromeStorageGet<T>(keys, { timeoutMs }) helper.
  3. Layer-1 telemetry beacon (NewTabHydrationTimeout) so the common 5s recovery path is visible in Amplitude, not just the 8s tail.
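
One possible shape for the unified helper named in follow-up 2 is sketched below. The doc only gives the signature chromeStorageGet<T>(keys, { timeoutMs }); the extra `area` parameter, the `getLastError` hook, and the "StorageTimeoutError" name are assumptions added here so the sketch can run outside an extension. `StorageAreaLike` is modeled on the callback form of chrome.storage.local.get.

```typescript
// Minimal structural type, modeled on the callback form of chrome.storage.local.get.
interface StorageAreaLike {
  get(
    keys: string | string[],
    callback: (items: Record<string, unknown>) => void,
  ): void;
}

interface StorageGetOptions {
  timeoutMs: number;
  // Hypothetical hook; in an extension it could return chrome.runtime.lastError.
  getLastError?: () => { message?: string } | undefined;
}

// Reads from storage, rejecting if the callback reports an error,
// the call throws synchronously, or no callback arrives within timeoutMs.
function chromeStorageGet<T extends Record<string, unknown>>(
  area: StorageAreaLike,
  keys: string | string[],
  opts: StorageGetOptions,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    // Runs `action` only for the first caller and stops the timer.
    const settle = (action: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      action();
    };

    const timer = setTimeout(() => {
      settle(() => {
        const err = new Error(`storage get timed out after ${opts.timeoutMs} ms`);
        err.name = "StorageTimeoutError";
        reject(err);
      });
    }, opts.timeoutMs);

    try {
      area.get(keys, (items) => {
        const lastError = opts.getLastError ? opts.getLastError() : undefined;
        settle(() => {
          if (lastError) reject(new Error(lastError.message || "storage get failed"));
          else resolve(items as T);
        });
      });
    } catch (e) {
      settle(() => reject(e));
    }
  });
}
```

Callers then choose their own fail-open policy, for example catching the timeout error and rendering with defaults, instead of each call site re-implementing the timer.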

Dashboard impact

  • reliability-blank-page-fix bet (ICE 576, O1 KR1, deadline 2026-05-24) — diagnosis is now complete and validated. The dashboard's next-step under this bet is no longer "reproduce and triage" — it is "operator decides whether to ship the 3-layer diff documented in the incident doc". Bets queue is owned by the strategist agent; the dashboard updates only the next-step shape and crosslinks the canonical doc.
  • Operations § (warroom) — first canonical incident published; advertised flow worked end-to-end. Update the Operations § to reference the first closed incident as proof-of-life, not just a feature claim.
  • Recent Shipments — add 2026-05-11 entry for the incident close.
  • Open Questions — the existing "Live blank-page reliability incident" question gains a pointer to the canonical doc and the operator-decision framing (no longer "unknown root cause"). A new question is not needed; the existing one mutates.
  • Doc Index — add the canonical incident doc (toby/incidents/2026-05-11-blank-extension-page.md).
  • Key decisions — capture the "no prod-api redeploy" call and the default feature-flag-on call as decisions made inside the warroom.

Scope reconciliation (why this derivative lives here, not next to source)

Operator orders said "save next to source so it's discoverable". My hard scope rules forbid writing into toby/incidents/ — that's owned by the incident-coordinator team and hand-edits would collide with the warroom's writes. Resolution: the derivative goes in the toby-pm workspace artifact dir; discoverability comes from the dashboard linking both the canonical incident doc and this ingestion artifact. Same pattern used for prior X / strategy ingestions.

Citations

  • Canonical incident doc: toby/incidents/2026-05-11-blank-extension-page.md (status: closed, verdict: validated)
  • Frontend finding: artifacts/toby-frontend-doctor/6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/finding.md
  • Backend finding: artifacts/toby-backend-doctor/083ec6d2-63e9-4c3e-b55e-a95301a4aa72/finding.md
  • Validation: artifacts/toby-incident-validator/a28a3690-38d7-4ce9-a9c2-c6d436da1793/validation.md
  • Synthesis draft (preserved): artifacts/toby-incident-coordinator/df069a93-28df-4439-8838-cfd953c4c974/synthesis-draft.md
  • Proximate code sites: apps/extension/app/containers/Toby.tsx:304, apps/extension/app/state/accessors/user.tsx:45-50,66-99
  • Class-of-bug code site: apps/extension/app/utils/chromeapi.ts:248-259
  • Proximate commit: d68726b29 (Jad Haidar, 2026-04-09)
  • Refuted prior hypothesis: artifact 388c1db4-59b7-49e9-8ec3-ecfba972c95f