Blank extension page — infinite-load hang on new tab
TL;DR
The Toby new-tab page renders the static preload skeleton and never transitions to the real UI. AuthWrapper at apps/extension/app/containers/Toby.tsx:304 returns null forever because isUserHydrated (added 2026-04-09 in commit d68726b29) never flips. The proximate cause is that getUser() at apps/extension/app/state/accessors/user.tsx:45-50 has no timeout, no chrome.runtime.lastError check and no .catch(). When Chrome's "extension context invalidated" state drops the chrome.storage.local.get callback (a state an open new tab enters on every extension auto-update), the promise hangs forever.
Backend (Go API) is innocent: the prod-api SHA hasn't changed since 2026-02-02, there were zero 5xx responses in the last 24h, and the SW boot path is structurally clean. The earlier toby-product-strategist MV3-SW-boot-regression hypothesis (388c1db4) is refuted.
Fix (frontend only, defence-in-depth): bound getUser() and getOnboarding2Draft with 5s timeouts that fail open; replace return null with a visible recovery screen after 8s; instrument with a NewTabHangShown beacon.
Status: closed → shipped (PR https://github.com/axiomzen/toby-mono-repo/pull/12, 2026-05-13).
Symptom
- New-tab page loads and shows the static .preloadedBg skeleton (light-grey sidebar columns, circle avatar, center card-grid background).
- Never transitions. #root exists in the DOM but has zero children.
- Accessibility tree is empty; users see only decorative CSS.
- Console is silent. No JS error, no failed network call. That's why users describe it as "infinite loading", not "crash".
- Recurring across Chrome / Vivaldi 7.0 / Orion based on CWS reviews and forum reports.
Root cause
Proximate (frontend)
apps/extension/app/containers/Toby.tsx:304:
```tsx
if (isInitializing || !isDraftReady || !isUserHydrated) return null;
```
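It returns null for as long as any gate condition holds: if isInitializing never clears, or if isDraftReady or isUserHydrated never becomes true, it returns null forever. <App> and <Onboarding2> are both inside this wrapper, so a null return kills the entire visible UI.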
The third boolean — isUserHydrated — was added to this gate in commit d68726b29 (Jad Haidar, 2026-04-09 19:07 +03:00, "fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events"). That fix solves a real bug (returning users briefly seeing the onboarding flow), but it widened the failure surface without bounding the new dependency.
isUserHydrated is bound 1:1 to a single unbounded promise at apps/extension/app/state/accessors/user.tsx:45-50:
```tsx
export const getUser = () =>
  new Promise<LoginResponse | null>((resolve) => {
    chrome.storage.local.get('user', ({ user }) => {
      resolve(user ?? null);
    });
  });
```
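No timeout. No chrome.runtime.lastError check. No .catch(). Of the three, only a timeout helps when the callback is dropped outright: the promise never settles, so there is nothing for a lastError check or a .catch() to observe. The same defect exists in apps/extension/app/utils/chromeapi.ts:248-259 (which useOnboarding2Draft is built on), so isDraftReady is exposed to the same class of bug. isInitializing waits on useIsRestoring() from the IDB-backed react-query persistor, which has a parallel silent-hang surface of its own.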
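To make the three missing guards concrete, here is a minimal sketch of a bounded read in the spirit of the chromeStorageGet<T>(keys, { timeoutMs }) helper proposed under Follow-ups. It is not the shipped code: the structural ChromeLike type, the injectable api option and the StorageReadTimeoutError error name are assumptions made so the sketch compiles and runs outside an extension.

```typescript
// Hypothetical sketch, not the shipped code. Minimal structural types let it
// compile and run without @types/chrome; in the extension, omit `api` and the
// real global `chrome` is used.
interface ChromeLike {
  runtime: { id?: string; lastError?: { message?: string } };
  storage: {
    local: {
      get(keys: string | string[], callback: (items: Record<string, unknown>) => void): void;
    };
  };
}

export function chromeStorageGet<T extends Record<string, unknown> = Record<string, unknown>>(
  keys: string | string[],
  { timeoutMs = 5000, api }: { timeoutMs?: number; api?: ChromeLike } = {},
): Promise<T> {
  const c = api ?? (globalThis as unknown as { chrome?: ChromeLike }).chrome;

  // Guard 1: an undefined runtime.id means the extension context is invalidated.
  if (!c?.runtime?.id) {
    return Promise.reject(new Error('extension context invalidated (chrome.runtime.id is undefined)'));
  }
  const { runtime, storage } = c;

  return new Promise<T>((resolve, reject) => {
    let settled = false;

    // Guard 2: a dropped callback can no longer hang the caller forever.
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      const err = new Error(`chrome.storage.local.get exceeded ${timeoutMs}ms`);
      err.name = 'StorageReadTimeoutError';
      reject(err);
    }, timeoutMs);

    try {
      storage.local.get(keys, (items) => {
        if (settled) return; // the timeout already won; ignore the late callback
        settled = true;
        clearTimeout(timer);
        // Guard 3: lastError is only meaningful inside the callback.
        const lastError = runtime.lastError;
        if (lastError) reject(new Error(lastError.message ?? 'chrome.storage.local.get failed'));
        else resolve(items as T);
      });
    } catch (e) {
      // The call itself may also throw synchronously in a broken context.
      if (!settled) {
        settled = true;
        clearTimeout(timer);
        reject(e);
      }
    }
  });
}
```

A caller such as getUser() would then choose its own fail-open policy, for example chromeStorageGet<{ user?: LoginResponse }>('user').then(({ user }) => user ?? null, () => null). Unlike the Layer 1 useEffect, this sketch discards a callback that arrives after the timeout; a caller that wants the non-destructive behaviour has to keep listening for the late value instead of rejecting.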
Distal (Chrome MV3 platform)
Chrome's "extension context invalidated" state (renderer-side) drops chrome.storage.local.get callbacks. This state is entered when:
- Chrome auto-updates the extension while a chrome_url_overrides new-tab page is open. This happens to every Toby user on every release, because Toby owns the new-tab page.
- The user manually disables/re-enables or reloads the extension in chrome://extensions.
- The SW crashes during a critical handshake phase (rarer).
This is a Chromium MV3 platform behaviour, not a Toby code regression.
Why now (post-2026-04-09)
The underlying chrome.storage callback drop is evergreen. The extension used to accidentally tolerate it because the AuthWrapper gate only depended on isInitializing || !isDraftReady. Commit d68726b29 added !isUserHydrated, binding the rendered UI 1:1 to that one unbounded callback. The widened gate is what turned a tolerable platform quirk into a reliable user-visible hang.
What this is NOT
The earlier toby-product-strategist hypothesis (artifact 388c1db4-59b7-49e9-8ec3-ecfba972c95f) that this was an MV3 service-worker boot regression is refuted by independent backend evidence (validator re-checked):
| Probe | Evidence |
|---|---|
| Prod-api SHA stability | 4b0107858e706c904e6cf2841fbcbf81a1e2f94f has been the active SHA on three consecutive Cloud Run revisions (00425, 00426, 00427) since 2026-02-02 — well before the post-2026-04-09 user-complaint window. The 2026-04-01 deploys are config-only redeploys. |
| 5xx volume | 0 in last 24h. Worst day this week: 19 / 1.18M = 0.0016%. |
| ERROR severity logs | 23 entries in last 7 days; 22 are expected 401s on stale-session endpoints, 1 is a downstream toby-ai-api 500 not on the auth path. No panics. No fatals. |
| DB health | 41,578 DAU, 720 new signups / 7d, healthy diurnal curve, peak 3,352 active-this-hour. |
| SW boot path | Every chrome.*.addListener registers synchronously at module top level. No listener-after-await MV3 boot bug. |
| Network in hang path | getUser() does not make a network call — the hang is pre-HTTP. An API regression structurally cannot cause this. |
Fix
Layer 1 — bound the hydration promises with a 5s timeout (fail open)
apps/extension/app/state/accessors/user.tsx, around line 71 (the useEffect that calls getUser):
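```tsx
useEffect(() => {
  let cancelled = false;

  // Fail open: if chrome.storage never calls back, unblock the AuthWrapper gate
  // after 5s with a null user instead of rendering nothing forever.
  const timeout = setTimeout(() => {
    if (!cancelled) {
      console.warn('[toby] getUser() exceeded 5s; falling back to null user.');
      setIsUserHydrated(true);
    }
  }, 5000);

  getUser()
    .then((user) => {
      if (cancelled) return;
      // Applied even if the 5s fallback already fired, so a slow read still lands.
      if (user) setUser(user);
      setIsUserHydrated(true);
    })
    .catch((err) => {
      console.error('[toby] getUser() failed:', err);
      if (!cancelled) setIsUserHydrated(true);
    })
    // Happy path: the read settled first, so the fallback timer is not needed.
    .finally(() => clearTimeout(timeout));

  // Unmount: ignore late results and drop the pending timer.
  return () => {
    cancelled = true;
    clearTimeout(timeout);
  };
}, []);
```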
Apply the same shape to apps/extension/app/hooks/useOnboarding2Draft.ts:12-30 for isDraftReady.
Validator confirmed:
- Race-safe. On the happy path, .finally(clearTimeout) runs in the microtask flush that follows the storage callback, before the 5s timer macrotask can fire.
- Non-destructive. When the storage callback arrives slowly (>5s), .then still applies the user record once it eventually resolves, so the in-memory user is not clobbered.
- Preserves d68726b29 intent. Returning users with healthy storage never see an Onboarding2 flash.
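The validator's three claims can be exercised outside React with a framework-free model of the Layer 1 effect. In this sketch hydrateWithFallback is a hypothetical name: load stands in for getUser(), onValue for setUser, onHydrated for setIsUserHydrated(true), and the returned function plays the role of the effect cleanup.

```typescript
// Framework-free model of the Layer 1 effect (hypothetical helper, not shipped code).
// load ~ getUser(), onValue ~ setUser, onHydrated ~ setIsUserHydrated(true).
export function hydrateWithFallback<T>(
  load: () => Promise<T | null>,
  onValue: (value: T) => void,
  onHydrated: () => void,
  timeoutMs = 5000,
): () => void {
  let cancelled = false;

  // Fail open: unblock the gate without a value if load() has not settled in time.
  const timeout = setTimeout(() => {
    if (!cancelled) onHydrated();
  }, timeoutMs);

  load()
    .then((value) => {
      if (cancelled) return;
      if (value) onValue(value); // a late value is still applied (non-destructive)
      onHydrated();
    })
    .catch(() => {
      if (!cancelled) onHydrated();
    })
    .finally(() => clearTimeout(timeout)); // happy path: the fallback never fires

  // Mirrors the useEffect cleanup: ignore late results, drop the timer.
  return () => {
    cancelled = true;
    clearTimeout(timeout);
  };
}
```

In the slow path onHydrated runs twice (once from the fallback, once after the late value lands). That mirrors calling setIsUserHydrated(true) twice, and since the flag is already true the second call changes nothing.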
Layer 2 — visible recovery screen after 8s
apps/extension/app/containers/Toby.tsx:304:
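```tsx
// Hooks stay above the early return so they run on every render.
const [showStuckEscapeHatch, setShowStuckEscapeHatch] = useState(false);

useEffect(() => {
  // Gate already open: nothing to watch.
  if (!isInitializing && isDraftReady && isUserHydrated) return;
  // Gate still closed: show the recovery screen if it stays closed for 8s.
  // The timer restarts whenever one of the three flags changes.
  const t = setTimeout(() => setShowStuckEscapeHatch(true), 8000);
  return () => clearTimeout(t);
}, [isInitializing, isDraftReady, isUserHydrated]);

if (isInitializing || !isDraftReady || !isUserHydrated) {
  if (showStuckEscapeHatch) {
    return <StuckRecoveryScreen onRetry={() => window.location.reload()} />;
  }
  return null;
}
```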
Copy: "Your tabs are safe. Tap to recover." — pre-approved per toby/00-state-of-the-project.md:50 and toby/strategy/playbook.md O1 KR1.
Layer 3 — telemetry beacon
At the setShowStuckEscapeHatch(true) site:
```tsx
trackEvent('NewTabHangShown', {
  isInitializing,
  isDraftReady,
  isUserHydrated,
  browser,
  version,
});
```
Establishes the first signal we have between "user complains in CWS review" and the existing Sentry / Amplitude funnels.
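Because the beacon carries the three raw gate booleans, every downstream query has to re-derive which condition kept AuthWrapper closed. A small typing of the payload plus a helper that names the stuck gate(s) could keep that logic in one place. NewTabHangShownProps, HydrationGate and stuckGates are illustrative names, and browser and version are assumed to be plain strings already in scope at the call site.

```typescript
// Hypothetical typing of the Layer 3 payload; names are illustrative.
// browser and version are assumed to be strings already in scope at the call site.
export interface NewTabHangShownProps {
  isInitializing: boolean;
  isDraftReady: boolean;
  isUserHydrated: boolean;
  browser: string;
  version: string;
}

export type HydrationGate = 'initializing' | 'draft' | 'user';

// Names each condition still holding the AuthWrapper gate closed.
// An empty array means the gate is open and the beacon should not have fired.
export function stuckGates(p: NewTabHangShownProps): HydrationGate[] {
  const gates: HydrationGate[] = [];
  if (p.isInitializing) gates.push('initializing');
  if (!p.isDraftReady) gates.push('draft');
  if (!p.isUserHydrated) gates.push('user');
  return gates;
}
```

A dashboard could then group NewTabHangShown events by stuckGates(props).join('+'). With Layer 1 in place, getUser() and the draft read both fall back at 5s, so hangs that still reach the 8s mark would be expected to report 'initializing' most of the time.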
Verify plan
- Manual repro (canonical scenario):
  - cd apps/extension && pnpm install && pnpm dev
  - Load unpacked at apps/extension/.output/chrome-mv3 via chrome://extensions.
  - Open the new tab; confirm the happy path renders.
  - Toggle the extension off and back on in chrome://extensions. The open tab now has chrome.runtime.id === undefined (the canonical context-invalidated state).
  - Reload the new tab. Pre-fix: blank skeleton forever. Post-fix: Onboarding2 (or App) renders after 5s with [toby] getUser() exceeded 5s in the console.
- Recovery-screen repro (Layer 2): make chrome.storage.local.get a no-op (chrome.storage.local.get = () => {}) and reload. Pre-Layer-2: blank. Post-Layer-2: StuckRecoveryScreen renders at 8s with the "tap to recover" CTA. Two caveats: an assignment typed into the DevTools console does not survive the reload, so the stub must be in place before the app bundle runs; and with Layer 1 applied, both storage reads fall back at 5s, so this path is only reachable with the Layer-1 timers disabled or with isInitializing held open.
- Regression check (d68726b29 must remain fixed): when isUserHydrated legitimately resolves with a returning user before 5s, AuthWrapper must behave exactly as today, with no flash of <Onboarding2>.
- Telemetry sanity: confirm NewTabHangShown flows into Amplitude and establish a baseline frequency in the first 7 days. If volume is non-trivial without a correlated prod-api 5xx spike, that supports the platform-side chrome.storage-drop hypothesis in prod.
- Backend monitoring (no action expected): watch prod-api 5xx; expect it to stay at ~0. If a 5xx spike correlates with a NewTabHangShown spike, re-open the backend investigation. With the current cadence (one prod-api code change in 4 months) this is unlikely.
Operator decisions to surface
- Should NewTabHangShown (and the optional Layer-1 hydration-timeout beacon below) be gated behind a feature flag? Default proposal is on. Validator concurs.
- Should we redeploy prod-api as part of this incident? Both backend doctor and validator: no. The API code is innocent; a redeploy is needless blast radius.
Follow-ups (NOT blockers for closing this incident)
- Apply the Layer-1 shape to isInitializing. Specifically the useIsRestoring() IDB-backed path inside useHandleRedirectFromQueryParams at Toby.tsx:168-275. Today Layer 2 catches this case at 8s with a recovery screen; the ideal is a 5s Layer-1-style local timer that lets the page self-heal without a tap.
- SW hardening (three items flagged by the backend doctor):
  - apps/extension/entrypoints/background.ts:14: chain .catch(err => console.error('[toby-sw] persistQueryClientRestore failed', err)) on the persist-restore call. It is currently fire-and-forget, so IDB failures are silently swallowed.
  - apps/extension/app/background/contextMenus.ts:145-191: wrap the SW fetch calls with an AbortController + 10s timeout. Currently a stuck TCP socket can keep the SW alive past its idle window.
  - Build a unified chromeStorageGet<T>(keys, { timeoutMs }) helper that wraps chrome.runtime.lastError + chrome.runtime.id validity + a timeout. Replace every raw chrome.storage.local.get(key, cb) call site with it. Layer 1 only patches the getUser and getOnboarding2Draft sites; this helper would fix the class.
- Layer-1 telemetry beacon. Fire a second, lower-stakes event (e.g. NewTabHydrationTimeout) at the Layer-1 5s fallback site. Without it, the common post-fix recovery path is invisible in Amplitude; we'd only see the 8s worst-case tail.
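The contextMenus.ts hardening item in the list above could look roughly like the following sketch. fetchWithTimeout and its injectable fetchImpl parameter are hypothetical names, not existing Toby code.

```typescript
// Hypothetical helper for the contextMenus.ts follow-up, not existing Toby code.
// Aborts a request that has not settled within timeoutMs. This sketch replaces
// any caller-supplied init.signal rather than combining the two.
export async function fetchWithTimeout(
  input: string,
  init: RequestInit = {},
  timeoutMs = 10_000,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // `return await` (not a bare return) keeps finally from clearing the
    // timer before the request has actually settled.
    return await fetchImpl(input, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // release the timer on success and on failure
  }
}
```

Where it is available, passing signal: AbortSignal.timeout(10_000) directly to fetch is a shorter equivalent; it aborts with a TimeoutError rather than an AbortError, which matters if callers branch on the error name.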
Open questions
None blocking. Operator decisions above are explicit choices, not unknowns.
Citations
- Frontend finding: artifacts/toby-frontend-doctor/6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/finding.md (run id 6e2b3eb9-36bf-42d3-8de3-5afa48f4b167; Playwright screenshot + synthetic HTML alongside).
- Backend finding: artifacts/toby-backend-doctor/083ec6d2-63e9-4c3e-b55e-a95301a4aa72/finding.md (run id 083ec6d2-63e9-4c3e-b55e-a95301a4aa72).
- Validation: artifacts/toby-incident-validator/a28a3690-38d7-4ce9-a9c2-c6d436da1793/validation.md (run id a28a3690-38d7-4ce9-a9c2-c6d436da1793).
- Synthesis draft (preserved): artifacts/toby-incident-coordinator/df069a93-28df-4439-8838-cfd953c4c974/synthesis-draft.md.
- Ship result: artifacts/toby-incident-fix-shipper/b3400d87-0830-4f89-bb70-4c3907c085f1/ship-result.md (run id b3400d87-0830-4f89-bb70-4c3907c085f1).
- Proximate code site: apps/extension/app/containers/Toby.tsx:304 and apps/extension/app/state/accessors/user.tsx:45-50,66-99.
- Class-of-bug code site: apps/extension/app/utils/chromeapi.ts:248-259 (same missing-lastError/timeout pattern in getChromeStorage).
- Proximate commit: d68726b29 (Jad Haidar, 2026-04-09, +5/-1 to Toby.tsx).
- Prior strategist hypothesis (REFUTED): artifact 388c1db4-59b7-49e9-8ec3-ecfba972c95f.
Timeline
| Time (UTC) | Event |
|---|---|
| 2026-04-09 16:07 | Commit d68726b29 lands. AuthWrapper gate widened to depend on !isUserHydrated. |
| post-2026-04-09 | User complaints about "blank page on infinite load" begin (CWS reviews, Vivaldi 7.0 / Orion forums). |
| 2026-05-11 17:08 | Incident dispatched to warroom. |
| 2026-05-11 17:14 | Frontend doctor reports proximate cause + defers to backend. |
| 2026-05-11 17:30 | Backend doctor refutes SW-boot-regression hypothesis with prod-api / DB / SW-boot evidence. |
| 2026-05-11 17:36 | Validator confirms synthesis with validated / high confidence. |
| 2026-05-11 17:38 | Incident doc published. Status: closed, fix queued for implementation by the operator. |
| 2026-05-13 04:59 | Bridge files TOBY-14 ("Ship the blank-extension-page reliability hotfix") into the warroom inbox. |
| 2026-05-13 05:08 | Fix-shipper opens PR https://github.com/axiomzen/toby-mono-repo/pull/12. Status: shipped. |
PR shipped
- PR URL: https://github.com/axiomzen/toby-mono-repo/pull/12
- Branch: warroom/2026-05-11-blank-extension-page-toby-14
- Commit: 06baf0f8a (base 75a09e34d on origin/main)
- Source ticket: TOBY-14
- Shipper run: b3400d87-0830-4f89-bb70-4c3907c085f1 (artifact: artifacts/toby-incident-fix-shipper/b3400d87-0830-4f89-bb70-4c3907c085f1/ship-result.md)
- Files touched:
  - apps/extension/app/state/accessors/user.tsx (Layer 1: 5s timeout fail-open on getUser())
  - apps/extension/app/hooks/useOnboarding2Draft.ts (Layer 1: same shape on isReady)
  - apps/extension/app/containers/Toby.tsx (Layer 2 + Layer 3: StuckRecoveryScreen at 8s + NewTabHangShown beacon)
  - apps/extension/app/components/StuckRecoveryScreen.tsx (new component, copy "Your tabs are safe. Tap to recover.")
- What was deliberately NOT included (per the "Follow-ups" section): the Layer-1 shape for isInitializing, the SW-hardening trio, and the Layer-1 telemetry beacon. Those remain queued as separate work.
- Verify plan: the doc's "Verify plan" section above is the canonical checklist for CI + reviewer manual repro. No local typecheck/lint ran in the ephemeral worktree (no node_modules); the diff is minimal and additive, relying on CI for the full check.