# Draft synthesis — blank-extension-page (2026-05-11)

## Summary

Toby's new-tab extension page hangs on the static preload skeleton, and no React content ever renders over it. The symptom is recurring: every Toby user is exposed each time we ship an extension update (or whenever Chrome auto-updates the extension while the new tab is open). Most never notice because they don't reopen the new tab; those who do report a "blank page on infinite load".

## Root cause

**Proximate (frontend):** `apps/extension/app/containers/Toby.tsx:304` —

```tsx
if (isInitializing || !isDraftReady || !isUserHydrated) return null;
```

returns null forever when `isUserHydrated` (added in commit **`d68726b29`**, 2026-04-09: *"fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events"*) never flips true. `isUserHydrated` is bound to a single unbounded promise in `apps/extension/app/state/accessors/user.tsx:45-50`:

```tsx
export const getUser = () =>
  new Promise<LoginResponse | null>((resolve) => {
    chrome.storage.local.get('user', ({ user }) => {
      resolve(user ?? null);
    });
  });
```

No timeout. No `chrome.runtime.lastError` check. No `.catch()`. If the `chrome.storage.local.get` callback never fires, the promise hangs forever and AuthWrapper returns null forever.

**Distal (platform):** Chrome's renderer-side **"extension context invalidated"** state drops `chrome.storage.local.get` callbacks. A page enters this state when Chrome auto-updates the extension while a `chrome_url_override` new tab is open (which applies to every Toby user on every release), when the user manually disables and re-enables the extension, or when the service worker (SW) crashes during a critical handshake phase. This is a **Chromium MV3 platform behaviour**, not a Toby code regression.

**Why now (post-2026-04-09):** the underlying chrome.storage drop is an evergreen Chrome MV3 phenomenon — the extension used to *accidentally* tolerate it because the AuthWrapper gate only depended on `isInitializing || !isDraftReady`. Commit `d68726b29` added `!isUserHydrated` to the gate, binding the rendered UI 1:1 to that unbounded callback. **The widened gate is what turned a tolerable platform quirk into a reliable user-visible hang.**
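The effect of the widened gate can be illustrated with the two predicates side by side. This is a minimal sketch: the function names and the flag values are illustrative assumptions, and only the boolean expressions are taken from the gate described above.

```typescript
// Flags read by the gate in Toby.tsx. The values used below are illustrative.
interface GateFlags {
  isInitializing: boolean;
  isDraftReady: boolean;
  isUserHydrated: boolean;
}

// Gate before d68726b29: user hydration is not consulted.
const blockedBefore = (f: GateFlags): boolean =>
  f.isInitializing || !f.isDraftReady;

// Gate after d68726b29: the same check plus the hydration flag.
const blockedAfter = (f: GateFlags): boolean =>
  f.isInitializing || !f.isDraftReady || !f.isUserHydrated;

// Assumed state when the storage callback for 'user' is dropped:
// everything else is ready, but hydration never completes.
const droppedCallback: GateFlags = {
  isInitializing: false,
  isDraftReady: true,
  isUserHydrated: false,
};

console.log(blockedBefore(droppedCallback)); // false: the UI renders
console.log(blockedAfter(droppedCallback)); // true: the component keeps returning null
```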

## What this is NOT

The earlier toby-product-strategist hypothesis (artifact `388c1db4`) that this was an **MV3 service-worker boot regression** is **refuted**:

- Prod-api SHA hasn't changed since 2026-02-02 (`commit-sha=4b0107858`, three consecutive Cloud Run revisions on the same SHA; the 2026-04-01 deploys are config-only redeploys).
- 5xx volume on prod-api last 24 h: **0**. Worst day this week: 19 / 1.18M = 0.0016%.
- 23 ERROR-severity log entries in last 7 days, all expected 401s on stale-session endpoints. No panics. No fatals.
- DB healthy: 41,578 DAU, 720 new signups / 7 d, healthy diurnal curve.
- SW boot path is structurally clean: every `chrome.*.addListener` registers synchronously at module top level. No listener-after-await MV3 boot bug.
- `getUser()` does NOT hit the network — the hang is pre-HTTP, so an API regression cannot be the cause.

## Proposed fix (frontend, defence-in-depth)

### Layer 1 — bound the hydration promises with a 5s timeout that fails open

`apps/extension/app/state/accessors/user.tsx` around line 71 (the `useEffect` that calls `getUser()`):

```tsx
useEffect(() => {
  let cancelled = false;
  const timeout = setTimeout(() => {
    if (!cancelled) {
      console.warn('[toby] getUser() exceeded 5s; falling back to null user.');
      setIsUserHydrated(true);
    }
  }, 5000);

  getUser()
    .then((user) => {
      if (cancelled) return;
      if (user) setUser(user);
      setIsUserHydrated(true);
    })
    .catch((err) => {
      console.error('[toby] getUser() failed:', err);
      if (!cancelled) setIsUserHydrated(true);
    })
    .finally(() => clearTimeout(timeout));

  return () => {
    cancelled = true;
    clearTimeout(timeout);
  };
}, []);
```

Apply the same shape to `apps/extension/app/hooks/useOnboarding2Draft.ts:12-30` for `isDraftReady`.
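Since both hydration sites need the same bounded, fail-open behaviour, the timeout logic could be factored into one promise helper rather than duplicated per hook. The following is a sketch only: `settleWithin` and its result shape are hypothetical names, not existing Toby code.

```typescript
// Hypothetical helper (name and shape are assumptions, not existing Toby code).
// Resolves with the source value if it arrives within `ms`; otherwise fails
// open with `fallback`. It never rejects, so a gate bound to it cannot hang.
interface Settled<T> {
  status: 'resolved' | 'timeout' | 'rejected';
  value: T;
}

function settleWithin<T>(
  source: Promise<T>,
  ms: number,
  fallback: T,
): Promise<Settled<T>> {
  return new Promise<Settled<T>>((resolve) => {
    // Fail open if the source has not settled by the deadline.
    const timer = setTimeout(
      () => resolve({ status: 'timeout', value: fallback }),
      ms,
    );
    source.then(
      (value) => {
        clearTimeout(timer);
        // If the timeout already fired, this second resolve is a no-op.
        resolve({ status: 'resolved', value });
      },
      () => {
        clearTimeout(timer);
        resolve({ status: 'rejected', value: fallback });
      },
    );
  });
}
```

Inside the `useEffect`, a call such as `settleWithin(getUser(), 5000, null)` would then yield a `status` that can be logged and a `value` that is either the stored user or `null`. One trade-off of this shape is that a value arriving after the deadline is discarded, whereas the inline version above still applies a late `setUser`.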

### Layer 2 — replace `return null` with a visible escape hatch after 8s

`apps/extension/app/containers/Toby.tsx:304`:

```tsx
const [showStuckEscapeHatch, setShowStuckEscapeHatch] = useState(false);

useEffect(() => {
  if (!isInitializing && isDraftReady && isUserHydrated) return;
  const t = setTimeout(() => setShowStuckEscapeHatch(true), 8000);
  return () => clearTimeout(t);
}, [isInitializing, isDraftReady, isUserHydrated]);

if (isInitializing || !isDraftReady || !isUserHydrated) {
  if (showStuckEscapeHatch) {
    return <StuckRecoveryScreen onRetry={() => window.location.reload()} />;
  }
  return null;
}
```

Copy: *"Your tabs are safe. Tap to recover."* — already pre-approved per `toby/00-state-of-the-project.md:50` and `toby/strategy/playbook.md` O1 KR1.

### Layer 3 — telemetry beacon

At the `setShowStuckEscapeHatch(true)` site, fire `trackEvent('NewTabHangShown', { isInitializing, isDraftReady, isUserHydrated, browser, version })`. This finally gives us a direct signal for the hang, filling the gap between user complaints in CWS reviews and our existing Sentry/Amplitude data.
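A sketch of how the beacon could be assembled follows. It assumes `trackEvent` accepts `(name, props)`, and that `browser` and `version` are supplied by the caller, since their source is not specified here. The once-per-page-load guard is an added assumption, meant to keep a re-armed timer from double-counting; `createHangReporter` is a hypothetical name.

```typescript
interface GateFlags {
  isInitializing: boolean;
  isDraftReady: boolean;
  isUserHydrated: boolean;
}

// Assumed signature for the existing trackEvent function.
type TrackEvent = (name: string, props: Record<string, unknown>) => void;

// Hypothetical factory: returns a reporter that sends NewTabHangShown at most
// once, carrying the gate flags plus caller-supplied environment fields.
function createHangReporter(
  track: TrackEvent,
  env: { browser: string; version: string },
): (flags: GateFlags) => boolean {
  let sent = false;
  return (flags: GateFlags): boolean => {
    if (sent) return false; // already reported for this page load
    sent = true;
    track('NewTabHangShown', { ...flags, ...env });
    return true;
  };
}
```

The returned reporter would be called right next to `setShowStuckEscapeHatch(true)` with the current values of the three flags.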

## Backend hardening (follow-up, NOT required to close incident)

The Go API itself does not need a change. However, the backend review flagged three fragility issues in the extension service worker. They don't *cause* this bug, but they do make its underlying platform conditions more frequent. File them as follow-ups and ship them outside this incident:

1. **Catch the persist-restore rejection** in `apps/extension/entrypoints/background.ts:14`. Currently fire-and-forget; an IDB failure is silently swallowed.
2. **AbortController on SW `fetch`s** in `apps/extension/app/background/contextMenus.ts:145-191` (10s timeout). Currently a stuck TCP socket can keep the SW alive past its idle window.
3. **Unified `chromeStorageGet<T>(keys, { timeoutMs })` helper** that wraps `chrome.runtime.lastError` checks + `chrome.runtime.id` validity + a timeout. Replace every raw `chrome.storage.local.get(key, cb)` callsite with this. The FE Layer 1 fix only patches the one `getUser` site; this helper would fix the class.
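The helper in item 3 could look roughly like the sketch below. It is not the agreed design: the `api` option and the `ChromeLike` type are assumptions added so the helper can be exercised without the real extension APIs, and the error messages are illustrative.

```typescript
// Minimal structural type for the parts of `chrome` this sketch touches.
interface ChromeLike {
  runtime: { id?: string; lastError?: { message?: string } };
  storage: {
    local: {
      get(
        keys: string | string[],
        cb: (items: Record<string, unknown>) => void,
      ): void;
    };
  };
}

// Hypothetical helper matching the proposed signature. `api` is an added
// option (an assumption) that defaults to the global `chrome` object.
function chromeStorageGet<T>(
  keys: string | string[],
  {
    timeoutMs,
    api = (globalThis as any).chrome as ChromeLike,
  }: { timeoutMs: number; api?: ChromeLike },
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // A missing runtime id is treated as an invalidated context: fail fast.
    if (!api?.runtime?.id) {
      reject(new Error('extension context invalidated'));
      return;
    }
    // Bound the wait in case the callback is dropped.
    const timer = setTimeout(
      () => reject(new Error(`chrome.storage.local.get timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    try {
      api.storage.local.get(keys, (items) => {
        clearTimeout(timer);
        const err = api.runtime.lastError;
        if (err) reject(new Error(err.message ?? 'chrome.runtime.lastError set'));
        else resolve(items as T);
      });
    } catch (e) {
      // The call itself may throw synchronously in a broken context.
      clearTimeout(timer);
      reject(e);
    }
  });
}
```

With this shape, `getUser` would reduce to something like `chromeStorageGet<{ user?: LoginResponse }>('user', { timeoutMs: 5000 })` followed by a `.catch` that falls back to `null`. Unlike the raw callback API, every failure mode surfaces as a rejection that callers must handle.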

## Verify plan

1. **Manual repro (canonical scenario for chrome.storage drop):**
   1. `cd apps/extension && pnpm install && pnpm dev`
   2. Load unpacked at `apps/extension/.output/chrome-mv3` via `chrome://extensions`.
   3. Open the new tab; confirm happy path renders.
   4. Toggle the extension off and back on in `chrome://extensions` (this puts the open tab into the "context invalidated" state — `chrome.runtime.id === undefined`).
   5. Reload the new tab. **Pre-fix:** blank skeleton forever. **Post-fix:** Onboarding2 (or App) renders after 5s, with the `[toby] getUser() exceeded 5s` console warning.

2. **Recovery-screen repro (Layer 2):**
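   - With Layer 1 temporarily disabled (otherwise its 5s timeout may fail open before the 8s hatch can appear), turn `chrome.storage.local.get` into a no-op (`chrome.storage.local.get = () => {}`). A patch typed into the DevTools console does not survive a reload, so it has to run on the reloaded page before the app bundle does, for example by pausing on the first script statement. Pre-Layer-2: blank. Post-Layer-2: StuckRecoveryScreen renders after 8s with the "tap to recover" CTA.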

3. **Regression check (the duplicate-onboarding bug fixed by `d68726b29` must stay fixed):**
   When `isUserHydrated` legitimately resolves with a pre-existing user **before** the 5s timeout, AuthWrapper must behave exactly as today — no flash of `<Onboarding2>` for returning users.

4. **Telemetry sanity (Layer 3):**
   Confirm `NewTabHangShown` events flow into Amplitude. Establish baseline frequency in the first 7 days. If volume is non-trivial *without* a correlated prod-api 5xx spike, that supports the platform-side chrome.storage drop hypothesis.

5. **Backend monitoring (no action expected):**
   Continue watching `prod-api` 5xx; expect to stay at ~0. If a 5xx spike correlates with a new `NewTabHangShown` spike, re-open backend investigation. With current cadence (one prod-api code change in 4 months) this is unlikely.

## Open questions / operator decisions

- **Defer SW hardening?** The three backend-flagged hardening items are not required to close this incident. Ship FE Layer 1+2+3 first; queue SW hardening as a separate piece of work.
- **Do we want the `NewTabHangShown` beacon gated on a feature flag?** Default is to ship it on.

## Citations

- **Frontend finding:** `artifacts/toby-frontend-doctor/6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/finding.md` (run id `6e2b3eb9-36bf-42d3-8de3-5afa48f4b167`).
- **Backend finding:** `artifacts/toby-backend-doctor/083ec6d2-63e9-4c3e-b55e-a95301a4aa72/finding.md` (run id `083ec6d2-63e9-4c3e-b55e-a95301a4aa72`).
- **Frontend Playwright repro screenshot:** same folder, `repro-blank-page.png`.
- **Proximate code site:** `apps/extension/app/containers/Toby.tsx:304`.
- **Proximate hydration site:** `apps/extension/app/state/accessors/user.tsx:45-50, 66-99`.
- **Proximate commit:** `d68726b29` (2026-04-09).
- **Prior strategist hypothesis (REFUTED):** artifact `388c1db4-59b7-49e9-8ec3-ecfba972c95f`.
