# Frontend finding — blank-extension-page (2026-05-11)

## Reproduced?

partial — I reproduced **the user-visible failure mode** (the static
preload skeleton shown alone with no React content above it) in
Playwright using a synthetic page that exactly replicates
`apps/extension/entrypoints/toby/index.html` and never injects any
content into `#root`. I could **not** load the actual unpacked
extension in Playwright — there is no built `.output/` and
`node_modules` is uninstalled (`pnpm install` + `pnpm build` would be
multi-minute work, and Playwright extension loading needs headed
Chromium with `--load-extension`). See **Verify plan** below for the
follow-up repro that builds the real extension.

Screenshot of what the user sees in this state:
`./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-blank-page.png`. It is
identical in shape to the symptom reported in CWS reviews and on the
Vivaldi 7.0 / Orion forums per
`toby/strategy/compass.md` anchor #1 and
`toby/00-state-of-the-project.md`.

## Symptoms observed

- Page loads, renders the static HTML preload skeleton (light-grey
  sidebar columns + circle avatar + center "card grid" bg image), and
  never transitions to the real UI.
- `#root` exists in the DOM but has **zero children** — React has
  either not mounted yet, or has mounted but is rendering `null`.
- Playwright accessibility snapshot of the synthetic repro is empty —
  the entire visible UI is purely decorative CSS, exactly matching
  the "blank / infinite-loading" complaint.
- Console: the symptom is *silent*. No JS error is thrown, no failed
  network call is logged at the React layer — the page just sits.
  That's why users describe it as "infinite loading", not "crash".

## Root cause (best hypothesis)

**Confidence: high** — the AuthWrapper render-null gate at
`apps/extension/app/containers/Toby.tsx:304` is the proximate frontend
failure mode:

```tsx
if (isInitializing || !isDraftReady || !isUserHydrated) return null;
```

If **any** of those three booleans never resolves, `AuthWrapper`
returns `null` *forever*, so neither `<App>` nor `<Onboarding2>` ever
mounts. The HTML preload skeleton from
`apps/extension/entrypoints/toby/index.html:62-69` stays on screen
indefinitely — which is *exactly* the "blank / infinite-loading"
state users report.

The third boolean — `isUserHydrated` — was added to this gate in
**commit `d68726b29`** (Apr 9, 2026, "fix: gate AuthWrapper on user
hydration to prevent duplicate onboarding events"). That fix solves a
real bug (returning users briefly seeing the onboarding flow) but it
introduces a *new* indefinite-hang surface, because
`isUserHydrated` only flips to `true` inside the `getUser()` →
`chrome.storage.local.get('user', cb)` callback at
`apps/extension/app/state/accessors/user.tsx:71-76`:

```tsx
useEffect(() => {
  getUser().then((user) => {
    if (user) setUser(user);
    setIsUserHydrated(true);
  });
}, []);
```

…and **`getUser` at `user.tsx:45-50` has no timeout, no
`chrome.runtime.lastError` check, and no `.catch()`**:

```tsx
export const getUser = () =>
  new Promise<LoginResponse | null>((resolve) => {
    chrome.storage.local.get('user', ({ user }) => {
      resolve(user ?? null);
    });
  });
```

If the MV3 service worker is dead or the extension context is
invalidated (e.g. after an extension update on a still-open new-tab,
or the documented Chrome SW lifecycle race), `chrome.storage.local.get`
can return without invoking its callback. `getUser`'s promise then
never resolves, `setIsUserHydrated(true)` never runs, and the gate at
`Toby.tsx:304` holds null forever.

This also bridges the `toby-product-strategist`
(`388c1db4-59b7-49e9-8ec3-ecfba972c95f`) hypothesis that the blank-page
issue is a **Manifest V3 service-worker boot regression** — the SW
failure *upstream* surfaces *downstream* as this frontend hang
because the hydration step trusts the storage callback to always
fire. Same story applies in symmetry to `isDraftReady`
(`useOnboarding2Draft.ts:12-30` → `getChromeStorage` at
`utils/chromeapi.ts:248-259`, no `lastError` check) and to
`isInitializing` waiting on `useIsRestoring()` from the
react-query persistor (IDB-backed; can hang silently if IndexedDB
is unavailable).

## Evidence

### Console (synthetic repro)

```
[WARNING] [toby-repro] AuthWrapper render-null condition is active. \
  isUserHydrated=false, isDraftReady=false, isInitializing=true. \
  Page is stuck on preload skeleton.
```

(The warning is mine — it labels what `AuthWrapper` looks like when
all three gates are stuck. The real extension is *silent* in this
state — see Symptoms.)

### Network

No requests to log — the failure happens **before** any auth or data
fetch runs. The hang is at the chrome.storage rehydration step,
which is a synchronous-feeling API local to the extension process.
If this were a backend hang, we'd see a pending HTTP call to
api.gettoby.com; we do not.

### Code

- `apps/extension/app/containers/Toby.tsx:304` — the render-null
  gate. Three booleans, any one of which hanging false produces the
  blank screen.
- `apps/extension/app/containers/Toby.tsx:277-313` — full
  `AuthWrapper`. No fallback UI, no error boundary, no timeout.
- `apps/extension/app/state/accessors/user.tsx:45-50` — `getUser`
  with no `chrome.runtime.lastError` check, no `.catch`, no timeout.
- `apps/extension/app/state/accessors/user.tsx:66-99` — `UserProvider`,
  where `setIsUserHydrated(true)` is gated on the unbounded promise.
- `apps/extension/app/utils/chromeapi.ts:248-259` — `getChromeStorage`
  helper, used by `useOnboarding2Draft`, same missing-lastError
  pattern.
- `apps/extension/app/hooks/useOnboarding2Draft.ts:12-30` —
  `isReady` waits on the same chrome.storage helper.
- `apps/extension/entrypoints/toby/index.html:62-69` — the static
  `.preloadedBg` skeleton that's all the user sees when React
  renders nothing.
- `apps/extension/entrypoints/toby/main.tsx:33-60` — React tree root.
  Notice `<App key='app' />` and `<Onboarding2 key='onboarding' />`
  are both *inside* `AuthWrapper`, so any null return from
  AuthWrapper kills the entire visible UI.

### Recent commits (extension folder, since 2026-04-01)

```
0f3aa38d2  feat: add 4h Session Start heartbeat event for intra-day retention analysis  (2026-04-14)
9d6e8e4f3  chore: update extension version to 1.13.0
d68726b29  fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events  (2026-04-09) ◄ PROXIMATE CAUSE
cde22c935  fix: eliminate experimentEntityId race condition by deriving it from the draft
bc5e45305  refactor: remove onboarding-signup-position A/B experiment, ship "end" variant
```

Commit `d68726b29` is the proximate frontend regression for this
symptom: it widened the gate without adding a timeout or fallback for
the new dependency. Before it, the same hang could only be triggered
by `isInitializing || !isDraftReady` (still possible — same
underlying class of bug, just narrower attack surface).

### Synthetic Playwright reproduction

- Repro page: `./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-index.html`
  (copy of `entrypoints/toby/index.html` with no `main.tsx` ⇒
  `#root` stays empty ⇒ same DOM as AuthWrapper returning null).
- Screenshot: `./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-blank-page.png`.
- Snapshot result: accessibility tree empty; user sees only the
  decorative skeleton. **Matches the support-ticket symptom 1:1.**

## Proposed fix

Two-layer defence — fix the proximate hang AND prevent its class:

### Layer 1 — bound the hydration promise (file `apps/extension/app/state/accessors/user.tsx`, around line 71)

```tsx
useEffect(() => {
  let cancelled = false;
  const timeout = setTimeout(() => {
    if (!cancelled) {
      // chrome.storage callback never fired — fail open to "no user",
      // let the rest of the app proceed instead of hanging on null.
      console.warn('[toby] getUser() exceeded 5s; falling back to null user.');
      setIsUserHydrated(true);
    }
  }, 5000);

  getUser()
    .then((user) => {
      if (cancelled) return;
      if (user) setUser(user);
      setIsUserHydrated(true);
    })
    .catch((err) => {
      console.error('[toby] getUser() failed:', err);
      if (!cancelled) setIsUserHydrated(true);
    })
    .finally(() => clearTimeout(timeout));

  return () => {
    cancelled = true;
    clearTimeout(timeout);
  };
}, []);
```

Apply the same shape to `useOnboarding2Draft.ts:12-30` (5s timeout →
`setIsReady(true)` with `draft=null`).

### Layer 2 — replace `return null` with a guarded error/loading state at `Toby.tsx:304`

After ~8 seconds of "still gating", show a visible escape hatch
instead of silent null:

```tsx
const [showStuckEscapeHatch, setShowStuckEscapeHatch] = useState(false);

useEffect(() => {
  if (!isInitializing && isDraftReady && isUserHydrated) return;
  const t = setTimeout(() => setShowStuckEscapeHatch(true), 8000);
  return () => clearTimeout(t);
}, [isInitializing, isDraftReady, isUserHydrated]);

if (isInitializing || !isDraftReady || !isUserHydrated) {
  if (showStuckEscapeHatch) {
    return <StuckRecoveryScreen
      // "Your tabs are safe. Tap to recover." — copy already
      // pre-approved per toby/strategy/playbook.md O1 KR1.
      onRetry={() => window.location.reload()}
    />;
  }
  return null;
}
```

This is the "**your tabs are safe; tap to recover**" UI that
`toby/00-state-of-the-project.md` line 50 already calls for as part
of the O1 KR1 reliability work.

### Layer 3 — instrumentation (so we measure this in prod)

Wire a beacon at the `setShowStuckEscapeHatch(true)` site:
`trackEvent('NewTabHangShown', { isInitializing, isDraftReady,
isUserHydrated, browser, version })`. Then we can finally measure
how often this fires and confirm the SW boot regression hypothesis
from `toby-product-strategist` (artifact `388c1db4`).

## Verify plan

1. **Manual repro of the real extension**:
   1. `cd apps/extension && pnpm install && pnpm dev`
   2. In Chrome → `chrome://extensions` → Developer mode → Load
      unpacked → `apps/extension/.output/chrome-mv3`.
   3. Open the new tab — sanity-check that the happy path renders.
   4. Force the hang: in DevTools console, monkey-patch
      `chrome.storage.local.get = () => {}` BEFORE reload, then
      reload the new-tab page. Expected: blank skeleton, no UI.
   5. Apply Layer 1 fix. Expected: page transitions to "no user" → Onboarding2 mounts after the 5s timeout.

2. **Service-worker repro (closer to the wild bug)**:
   1. Same as above but force the SW to fail at boot:
      `chrome://extensions` → Service worker → click "Inspect" → in
      the SW devtools, throw an error in the SW context, or close it
      and prevent restart.
   2. Open a new tab. Pre-fix: blank skeleton forever. Post-fix:
      Onboarding2 (or App, depending on local user state) renders
      after 5s + console warning is logged.

3. **Playwright e2e (headed) once a build exists**:
   1. Launch Chromium with `--disable-extensions-except=<path>
      --load-extension=<path>`.
   2. Navigate to `chrome-extension://<id>/toby.html`.
   3. Assert `#root` has children within 10s.
   4. Add a variant with `chrome.storage.local.get` stubbed to a
      never-resolving callback — assert that the **recovery screen**
      (Layer 2) appears instead of an empty DOM.

4. **Regression check for the bug Layer-1 doesn't reintroduce**:
   The whole reason `d68726b29` was added is to prevent returning
   users from briefly seeing the onboarding flow. After the fix,
   verify: when `isUserHydrated` legitimately resolves with a
   pre-existing user, **before** the 5s timeout, the gate behaves
   exactly as today — no flash of `<Onboarding2>` for returning
   users. The fix preserves the gate; it only adds a timeout escape
   for the *hang* case.

## Defer to

**backend (partial)** — the proximate frontend regression is fully
characterised above and a frontend-only fix solves the user-visible
symptom. But the *root* of the hang (chrome.storage callback not
firing) lives in the extension service worker / MV3 lifecycle, which
sits between Chrome and our Go API. Recommend the coordinator
**also** dispatch `toby-backend-doctor` to investigate:

- service worker boot path in
  `apps/extension/entrypoints/background.ts`, especially anything
  that races with `chrome.storage` on cold start;
- whether the Go API or extension background script is producing
  `chrome.runtime.id`-invalidating errors (e.g. CORS/CSP mismatches
  during a deploy that briefly invalidate live extension contexts);
- any sentry/log evidence of MV3 SW boot failures spiking in the
  window matching the user complaints (`toby-product-strategist`
  artifact `388c1db4` hypothesised mid-2024 → today; my reading
  narrows the *frontend* sensitivity to **post-2026-04-09**, which is
  a useful constraint for log filtering).

If backend confirms an SW boot regression, ship Layer 1 + Layer 2 of
my proposed fix *regardless* — the frontend should never be able to
hang silently on a chrome.storage timeout, even after the SW issue is
resolved. Defence-in-depth is the right posture here per the compass
anchor #1 ("save and restore tabs reliably… reducing anxiety, not
adding it").
