
Frontend finding — blank-extension-page (2026-05-11)

Hand-authored · 8 min read · 15 sections · Last edited May 13 by initial import

Reproduced?

partial: I reproduced the user-visible failure mode (the static preload skeleton shown alone, with no React content above it) in Playwright using a synthetic page that copies apps/extension/entrypoints/toby/index.html and never injects any content into #root. I could not load the actual unpacked extension in Playwright: there is no built .output/ directory and dependencies are not installed (pnpm install + pnpm build would be multi-minute work, and Playwright extension loading needs headed Chromium with --load-extension). See Verify plan below for the follow-up repro that builds the real extension.

Screenshot of what the user sees in this state: ./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-blank-page.png. It is identical in shape to the symptom reported in CWS reviews and on the Vivaldi 7.0 / Orion forums per toby/strategy/compass.md anchor #1 and toby/00-state-of-the-project.md.

Symptoms observed

  • Page loads, renders the static HTML preload skeleton (light-grey sidebar columns + circle avatar + center "card grid" bg image), and never transitions to the real UI.
  • #root exists in the DOM but has zero children — React has either not mounted yet, or has mounted but is rendering null.
  • Playwright accessibility snapshot of the synthetic repro is empty — the entire visible UI is purely decorative CSS, exactly matching the "blank / infinite-loading" complaint.
  • Console: the symptom is silent. No JS error is thrown, no failed network call is logged at the React layer — the page just sits. That's why users describe it as "infinite loading", not "crash".

Root cause (best hypothesis)

Confidence: high — the AuthWrapper render-null gate at apps/extension/app/containers/Toby.tsx:304 is the proximate frontend failure mode:

if (isInitializing || !isDraftReady || !isUserHydrated) return null;

If any of those three booleans never resolves, AuthWrapper returns null forever, so neither <App> nor <Onboarding2> ever mounts. The HTML preload skeleton from apps/extension/entrypoints/toby/index.html:62-69 stays on screen indefinitely — which is exactly the "blank / infinite-loading" state users report.

The third boolean, isUserHydrated, was added to this gate in commit d68726b29 (Apr 9, 2026, "fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events"). That fix solves a real bug (returning users briefly seeing the onboarding flow) but it introduces a new indefinite-hang surface, because isUserHydrated only flips to true inside the getUser() → chrome.storage.local.get('user', cb) callback at apps/extension/app/state/accessors/user.tsx:71-76:

useEffect(() => {
  getUser().then((user) => {
    if (user) setUser(user);
    setIsUserHydrated(true);
  });
}, []);

…and getUser at user.tsx:45-50 has no timeout, no chrome.runtime.lastError check, and no .catch():

export const getUser = () =>
  new Promise<LoginResponse | null>((resolve) => {
    chrome.storage.local.get('user', ({ user }) => {
      resolve(user ?? null);
    });
  });

If the MV3 service worker is dead or the extension context is invalidated (e.g. after an extension update on a still-open new-tab, or the documented Chrome SW lifecycle race), chrome.storage.local.get can return without invoking its callback. getUser's promise then never resolves, setIsUserHydrated(true) never runs, and the gate at Toby.tsx:304 holds null forever.

This also connects to the toby-product-strategist (388c1db4-59b7-49e9-8ec3-ecfba972c95f) hypothesis that the blank-page issue is a Manifest V3 service-worker boot regression: the SW failure upstream surfaces downstream as this frontend hang because the hydration step trusts the storage callback to always fire. The same applies to isDraftReady (useOnboarding2Draft.ts:12-30 → getChromeStorage at utils/chromeapi.ts:248-259, no lastError check) and to isInitializing, which waits on useIsRestoring() from the react-query persistor (IDB-backed; can hang silently if IndexedDB is unavailable).
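The missing lastError check described above could be closed roughly as follows. This is a sketch under stated assumptions, not the project's code: readStorageKey, StorageAreaLike, and RuntimeLike are hypothetical names, and the storage area and runtime are passed in as parameters (instead of being read from the chrome global) only so the snippet runs outside an extension. It covers the case where the callback fires with an error; a callback that never fires still needs a timeout.

```typescript
// Minimal structural types so the sketch runs outside an extension.
// In the extension these would be chrome.storage.local and chrome.runtime.
interface StorageAreaLike {
  get(key: string, callback: (items: Record<string, unknown>) => void): void;
}
interface RuntimeLike {
  lastError?: { message?: string };
}

// Hypothetical helper: reads one key and rejects when the runtime reports
// an error, instead of resolving as if the key were simply absent.
function readStorageKey<T>(
  storage: StorageAreaLike,
  runtime: RuntimeLike,
  key: string,
): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    storage.get(key, (items) => {
      const err = runtime.lastError;
      if (err) {
        reject(new Error(err.message ?? 'chrome.storage read failed'));
        return;
      }
      const value = items?.[key];
      resolve((value ?? null) as T | null);
    });
  });
}
```

With a helper of this shape, the .catch() branch in a hydration effect has something to catch; as written in the source, getUser never rejects.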

Evidence

Console (synthetic repro)

[WARNING] [toby-repro] AuthWrapper render-null condition is active. \
  isUserHydrated=false, isDraftReady=false, isInitializing=true. \
  Page is stuck on preload skeleton.

(The warning is mine — it labels what AuthWrapper looks like when all three gates are stuck. The real extension is silent in this state — see Symptoms.)

Network

No requests to log: the failure happens before any auth or data fetch runs. The hang is at the chrome.storage rehydration step, an asynchronous but local extension API that issues no network requests. If this were a backend hang, we'd see a pending HTTP call to api.gettoby.com; we do not.

Code

  • apps/extension/app/containers/Toby.tsx:304: the render-null gate. Three booleans; if any one of them stays unresolved, the result is the blank screen.
  • apps/extension/app/containers/Toby.tsx:277-313: full AuthWrapper. No fallback UI, no error boundary, no timeout.
  • apps/extension/app/state/accessors/user.tsx:45-50: getUser with no chrome.runtime.lastError check, no .catch, no timeout.
  • apps/extension/app/state/accessors/user.tsx:66-99: UserProvider, where setIsUserHydrated(true) is gated on the unbounded promise.
  • apps/extension/app/utils/chromeapi.ts:248-259: getChromeStorage helper, used by useOnboarding2Draft, same missing-lastError pattern.
  • apps/extension/app/hooks/useOnboarding2Draft.ts:12-30: isReady waits on the same chrome.storage helper.
  • apps/extension/entrypoints/toby/index.html:62-69: the static .preloadedBg skeleton that's all the user sees when React renders nothing.
  • apps/extension/entrypoints/toby/main.tsx:33-60: React tree root. Notice <App key='app' /> and <Onboarding2 key='onboarding' /> are both inside AuthWrapper, so any null return from AuthWrapper kills the entire visible UI.

Recent commits (extension folder, since 2026-04-01)

0f3aa38d2  feat: add 4h Session Start heartbeat event for intra-day retention analysis  (2026-04-14)
9d6e8e4f3  chore: update extension version to 1.13.0
d68726b29  fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events  (2026-04-09) ◄ PROXIMATE CAUSE
cde22c935  fix: eliminate experimentEntityId race condition by deriving it from the draft
bc5e45305  refactor: remove onboarding-signup-position A/B experiment, ship "end" variant

Commit d68726b29 is the proximate frontend regression for this symptom: it widened the gate without adding a timeout or fallback for the new dependency. Before it, the same hang could only be triggered by isInitializing || !isDraftReady (still possible — same underlying class of bug, just narrower attack surface).

Synthetic Playwright reproduction

  • Repro page: ./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-index.html (copy of entrypoints/toby/index.html with no main.tsx → #root stays empty ⇒ same DOM as AuthWrapper returning null).
  • Screenshot: ./6e2b3eb9-36bf-42d3-8de3-5afa48f4b167/repro-blank-page.png.
  • Snapshot result: accessibility tree empty; user sees only the decorative skeleton. Matches the support-ticket symptom 1:1.

Proposed fix

Two defensive layers plus an instrumentation layer: fix the proximate hang, prevent its class, and measure how often it occurs.

Layer 1 — bound the hydration promise (file apps/extension/app/state/accessors/user.tsx, around line 71)

useEffect(() => {
  let cancelled = false;
  const timeout = setTimeout(() => {
    if (!cancelled) {
      // chrome.storage callback never fired — fail open to "no user",
      // let the rest of the app proceed instead of hanging on null.
      console.warn('[toby] getUser() exceeded 5s; falling back to null user.');
      setIsUserHydrated(true);
    }
  }, 5000);

  getUser()
    .then((user) => {
      if (cancelled) return;
      if (user) setUser(user);
      setIsUserHydrated(true);
    })
    .catch((err) => {
      console.error('[toby] getUser() failed:', err);
      if (!cancelled) setIsUserHydrated(true);
    })
    .finally(() => clearTimeout(timeout));

  return () => {
    cancelled = true;
    clearTimeout(timeout);
  };
}, []);

Apply the same shape to useOnboarding2Draft.ts:12-30 (5s timeout → setIsReady(true) with draft=null).
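Since the same bound is needed at both call sites, the timeout logic could be factored into one helper. The following is a minimal sketch, assuming a hypothetical withTimeout name and a fail-open fallback value; it is not taken from the codebase.

```typescript
// Hypothetical helper: settles with `fallback` if `promise` has not settled
// within `ms` milliseconds. Rejections from `promise` are passed through.
function withTimeout<T>(promise: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Possible use in the hydration effect (getUser is from the source):
//   withTimeout(getUser(), 5000, null).then((user) => {
//     if (user) setUser(user);
//     setIsUserHydrated(true);
//   });
```

One trade-off of this shape is that the caller cannot tell a timed-out read from a genuinely empty one, so the console warning from Layer 1 would need to move into the helper or be signalled with a sentinel fallback.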

Layer 2 — replace return null with a guarded error/loading state at Toby.tsx:304

After ~8 seconds of "still gating", show a visible escape hatch instead of silent null:

const [showStuckEscapeHatch, setShowStuckEscapeHatch] = useState(false);

useEffect(() => {
  if (!isInitializing && isDraftReady && isUserHydrated) return;
  const t = setTimeout(() => setShowStuckEscapeHatch(true), 8000);
  return () => clearTimeout(t);
}, [isInitializing, isDraftReady, isUserHydrated]);

if (isInitializing || !isDraftReady || !isUserHydrated) {
  if (showStuckEscapeHatch) {
    return <StuckRecoveryScreen
      // "Your tabs are safe. Tap to recover." — copy already
      // pre-approved per toby/strategy/playbook.md O1 KR1.
      onRetry={() => window.location.reload()}
    />;
  }
  return null;
}

This is the "your tabs are safe; tap to recover" UI that toby/00-state-of-the-project.md line 50 already calls for as part of the O1 KR1 reliability work.
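The branching in Layer 2 can also be expressed as a pure function, which makes the three outcomes easy to unit-test without rendering React. This is an illustrative sketch; resolveGateView and the view names are hypothetical and do not exist in the source.

```typescript
type GateFlags = {
  isInitializing: boolean;
  isDraftReady: boolean;
  isUserHydrated: boolean;
};

// 'skeleton' corresponds to `return null` (only the static preload HTML shows).
type GateView = 'content' | 'recovery' | 'skeleton';

// Hypothetical pure mirror of the Layer 2 gate: same condition as Toby.tsx:304,
// plus the escape hatch once the stuck timer has fired.
function resolveGateView(flags: GateFlags, stuckTimerFired: boolean): GateView {
  const gated = flags.isInitializing || !flags.isDraftReady || !flags.isUserHydrated;
  if (!gated) return 'content';
  return stuckTimerFired ? 'recovery' : 'skeleton';
}
```

Note that a fired timer never hides real content: once all three flags resolve, the function returns 'content' regardless of stuckTimerFired, matching the component code above.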

Layer 3 — instrumentation (so we measure this in prod)

Wire a beacon at the setShowStuckEscapeHatch(true) site: trackEvent('NewTabHangShown', { isInitializing, isDraftReady, isUserHydrated, browser, version }). Then we can finally measure how often this fires and confirm the SW boot regression hypothesis from toby-product-strategist (artifact 388c1db4).
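A small wrapper keeps the beacon payload consistent wherever it is emitted. The sketch below assumes trackEvent accepts an event name and a plain object, as the call above suggests; reportHang, HangPayload, and TrackFn are hypothetical names, and the tracker is injected so the snippet runs standalone. In the extension, version could plausibly come from chrome.runtime.getManifest().version.

```typescript
type GateFlags = {
  isInitializing: boolean;
  isDraftReady: boolean;
  isUserHydrated: boolean;
};
type HangPayload = GateFlags & { browser: string; version: string };
type TrackFn = (name: string, payload: HangPayload) => void;

// Hypothetical wrapper: snapshots the gate flags at the moment the escape
// hatch is shown and forwards them under the event name from the proposal.
function reportHang(
  track: TrackFn,
  flags: GateFlags,
  browser: string,
  version: string,
): HangPayload {
  const payload: HangPayload = { ...flags, browser, version };
  track('NewTabHangShown', payload);
  return payload;
}
```

Recording all three flags, not just a "hung" boolean, is what lets the logs distinguish a chrome.storage hang (isUserHydrated or isDraftReady false) from an IndexedDB hang (isInitializing true).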

Verify plan

  1. Manual repro of the real extension:

    1. cd apps/extension && pnpm install && pnpm dev
    2. In Chrome → chrome://extensions → Developer mode → Load unpacked → apps/extension/.output/chrome-mv3.
    3. Open the new tab — sanity-check that the happy path renders.
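    4. Force the hang: a console monkey-patch does not survive a reload (the page gets a fresh JS context), so temporarily add chrome.storage.local.get = () => {} as the first statement of the dev build's apps/extension/entrypoints/toby/main.tsx, then reload the new-tab page. Expected: blank skeleton, no UI.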
    5. Apply Layer 1 fix. Expected: page transitions to "no user" → Onboarding2 mounts after the 5s timeout.
  2. Service-worker repro (closer to the wild bug):

    1. Same as above but force the SW to fail at boot: chrome://extensions → Service worker → click "Inspect" → in the SW devtools, throw an error in the SW context, or close it and prevent restart.
    2. Open a new tab. Pre-fix: blank skeleton forever. Post-fix: Onboarding2 (or App, depending on local user state) renders after 5s + console warning is logged.
  3. Playwright e2e (headed) once a build exists:

    1. Launch Chromium with --disable-extensions-except=<path> --load-extension=<path>.
    2. Navigate to chrome-extension://<id>/toby.html.
    3. Assert #root has children within 10s.
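    4. Add a variant with chrome.storage.local.get stubbed so its callback never fires. With only Layer 2 applied, assert that the recovery screen appears after ~8s instead of an empty DOM; with Layer 1 also applied, assert that #root gains children after the 5s fallback.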
  4. Regression check that Layer 1 does not reintroduce the bug d68726b29 fixed: that commit exists to prevent returning users from briefly seeing the onboarding flow. After the fix, verify: when isUserHydrated legitimately resolves with a pre-existing user before the 5s timeout, the gate behaves exactly as today, with no flash of <Onboarding2> for returning users. The fix preserves the gate; it only adds a timeout escape for the hang case.
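The "callback never fires" condition used in the manual and e2e steps can also be reproduced at unit level, without a browser. The sketch below is an assumption-laden illustration: StorageAreaLike, stubNeverFiringGet, getUserFrom, and settledWithin are hypothetical names, and getUserFrom copies the shape of getUser from the source with the storage area injected.

```typescript
interface StorageAreaLike {
  get(key: string, callback: (items: Record<string, unknown>) => void): void;
}

// Replaces `get` with a version that never calls back; returns a restore function.
function stubNeverFiringGet(storage: StorageAreaLike): () => void {
  const original = storage.get;
  storage.get = () => {
    /* intentionally never invokes the callback */
  };
  return () => {
    storage.get = original;
  };
}

// Same shape as getUser in the source, but with the storage area injected.
function getUserFrom(storage: StorageAreaLike): Promise<unknown> {
  return new Promise((resolve) => {
    storage.get('user', ({ user }) => resolve(user ?? null));
  });
}

// Reports whether `promise` settled within `ms` milliseconds.
function settledWithin(
  promise: Promise<unknown>,
  ms: number,
): Promise<'settled' | 'pending'> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('pending'), ms);
    const done = () => {
      clearTimeout(timer);
      resolve('settled');
    };
    promise.then(done, done);
  });
}
```

With the stub installed, settledWithin(getUserFrom(storage), 50) reports 'pending', which is the unbounded hang; wrapping the read in a timeout should turn the same check into 'settled'.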

Defer to

backend (partial) — the proximate frontend regression is fully characterised above and a frontend-only fix solves the user-visible symptom. But the root of the hang (chrome.storage callback not firing) lives in the extension service worker / MV3 lifecycle, which sits between Chrome and our Go API. Recommend the coordinator also dispatch toby-backend-doctor to investigate:

  • service worker boot path in apps/extension/entrypoints/background.ts, especially anything that races with chrome.storage on cold start;
  • whether the Go API or extension background script is producing chrome.runtime.id-invalidating errors (e.g. CORS/CSP mismatches during a deploy that briefly invalidate live extension contexts);
  • any sentry/log evidence of MV3 SW boot failures spiking in the window matching the user complaints (toby-product-strategist artifact 388c1db4 hypothesised mid-2024 → today; my reading narrows the frontend sensitivity to post-2026-04-09, which is a useful constraint for log filtering).

Even if backend confirms an SW boot regression, ship Layer 1 + Layer 2 of my proposed fix regardless: the frontend should never be able to hang silently on a chrome.storage read that never completes, even after the SW issue is resolved. Defence-in-depth is the right posture here per the compass anchor #1 ("save and restore tabs reliably… reducing anxiety, not adding it").