Toby — State of the Project

Hand-authored · 64 min read · 15 sections · Last edited May 13 by agent (MCP)

_Last updated: 2026-05-13_

TL;DR

Toby's monorepo flatten shipped in March 2026, and the Phase 1 onboarding cleanup shipped in extension v1.13.0 on 2026-04-14, including the new 4-hour Session Start heartbeat for retention analysis.

2026-05-12: the blank-page reliability hotfix has shipped to PR. The first canonical warroom incident (toby/incidents/2026-05-11-blank-extension-page.md, status now closed → shipped) closed its loop: TOBY-14 bridged into _inbox/ at 04:59 UTC, toby-incident-fix-shipper opened PR #12 at 05:08 UTC (~9 minutes inbox→PR), and TOBY-14 transitioned to done at 05:10 UTC. The 3-layer frontend-only fix (5s fail-open timeouts on getUser() and getOnboarding2Draft; StuckRecoveryScreen at 8s with pre-approved copy; NewTabHangShown telemetry beacon) is now in-tree on branch warroom/2026-05-11-blank-extension-page-toby-14 (commit 06baf0f8a), including a brand-new apps/extension/app/components/StuckRecoveryScreen.tsx. This is the first time the Wave 4 auto-ship path has executed end-to-end, completing the proof of both warroom rails (high-confidence ships, medium-confidence asks for a human review pass). The O1 KR1 deadline (2026-05-24) now has comfortable margin: the operator decision becomes review / merge / deploy / monitor, not implementation. The diagnosis bears repeating: the root cause was an unbounded chrome.storage.local.get callback in getUser(), made user-visible by commit d68726b29 (2026-04-09); the backend is innocent (prod-api SHA stable since 2026-02-02; 0 5xx in last 24h); the earlier toby-product-strategist MV3-SW-boot-regression hypothesis is refuted.

The warroom dispatch path is now proven twice: the Toby Incident Response workflow (id 9b78790f-2aea-4f65-876f-53d1a114c3ae) closed its second canonical incident on 2026-05-12 (toby/incidents/2026-05-12-retention-offers-silent.md, TOBY-6) with a validated + medium verdict; the validator caught compile defects in the draft diff, returned the corrected version, and Wave 4 correctly skipped auto-ship. That same incident resolved the long-standing CancelSubscription.tsx open question (the frontend integration shipped; the funnel is structurally leaky upstream) and surfaced a new strategic finding: ~82% of cancels bypass the in-app retention modal entirely (120 cancels → 22 reasons → 1 accept in 30 days).

2026-05-13: a second new agent-owned surface is online. toby-code-reviewer (daily 07:00 UTC cron) shipped its first review at toby/code-reviews/2026-05-13-10commits.md, back-filling the 10 most recent commits to main with 3 medium + 4 low findings (test coverage gaps on the shipped Session Start heartbeat hook, security concerns on the new .sandcastle/ agentic-workflow secret scanner + gate-skip audit trail). Three tickets are being filed by that run. The code-reviewer also independently confirms d68726b29's fix is correct: the issue was scope, not correctness.

As of 2026-05-10 the toby/strategy/ sub-folder holds the full strategic spine: the Compass (identity / axioms / anchors), the Bets rolling queue (in-flight / proposed / killed with ICE + falsifying signals), and the Q2 2026 Growth Playbook (toby/strategy/playbook.md) that bridges them. In parallel, the toby-x-strategist agent has finished its X execution surface, and the toby-blog-seo agent has shipped two blog drafts and a third pricing-inconsistency signal (TheTab claims Toby is "$9/mo" in its public comparison table, which strengthens the urgency of the pricing-reality-reconcile deadline 2026-05-13). The X channel goes operational this week: first scheduled post Tue 2026-05-12, contingent on @TobyForTabs account creds being in the operator's hands.

Repo is on main with uncommitted edits to CLAUDE.md and untracked docs/ai-onboarding-ideas-analysis.md, product/ideas/, and research-docs/. A brand-new top-level codebase subsystem also landed via commit 75a09e34d: .sandcastle/, host-side orchestration for unattended agentic slice execution (a separate rail from the warroom fix-shipper).

Strategic anchor (Compass + Playbook + Bets)

The toby/strategy/ folder is now the authoritative strategic surface. Three docs work together: read them in order Compass → Playbook → Bets. This dashboard defers to them; conflicts are bugs in this dashboard, not the strategy docs.

  • Identity layer — Compass (2026-05-10). Vision, axioms (visual tangibility / ambient-not-invocational / persistence-is-product / instant clarity), anchors (reliability [at risk] / cloud-backed account / Chrome-extension ambient new-tab / free-tier viable / invisible performance cost), brand promise ("It's okay, Toby has it now.") and anti-promises (no count-lowering claim, no AI auto-organize pre-Q4 2026, no enterprise team parity, not the only way to manage tabs).
  • Quarter layer — Q2 2026 Growth Playbook (2026-05-10). Growth thesis: the under-pulled lever this quarter is reliability + activation, not acquisition — Toby's loss curve isn't a top-of-funnel problem (CWS conversion on residual traffic is ~30%, strong) but a value-realization problem (New Adopter stickiness 58.4% vs. 83-94% for every other segment). Diagnosis of current growth loop: primary CWS-organic loop is declining (page views collapsed Oct 8 2025, -83%, never recovered); secondary word-of-mouth is real but not engineerable; latent curator loop (1,848 Free-Tier Archivists, 18.6% public-share rate, 14,306 active card-share links) is the only under-pulled compounding motion this quarter. Falsification clause: if Phase 2 welcome A/B fails AND blank-page hotfix lands AND CWS install-conversion still falls 2 consecutive weeks post-rewrite, the bottleneck is structurally upstream (Chrome 133 absorbed our wedge).
  • Action layer: Bets (2026-05-10). Rolling ICE-scored queue with in-flight / proposed / validated / killed sections. Every bet declares a falsifying signal. Top 5 bets surfaced by the playbook (in ICE order): (1) pricing-reality-reconcile (600); (2) reliability-blank-page-fix (576; diagnosis closed 2026-05-11, PR #12 opened 2026-05-12 via warroom Wave 4, awaits review/merge/CWS deploy); (3) cws-narrative-repair (392); (4) public-collection-pride-loop (336, the under-pulled lever); (5) phase-2-welcome-ab (192). 2026-05-12, new strategic surface from warroom run #2: a proposed retention-funnel-bypass-fix bet is implied by the 2026-05-12 retention_offers incident (~82% of cancels bypass the in-app retention modal; product-shaped, FE-orchestration problem, upstream of monetization bets). Surfaced here for operator triage into the bets queue; not yet formally scored.
  • Channel-execution layer: X Strategy + Content Pipeline + Engagement Targets (2026-05-10). Operator-driven X channel; 26 ICP targets ranked across 6 Tier-A / 11 Tier-B / 9 Tier-C; new bot-filter rule applied (followers ≥ 100 AND account < 2025-06); hard engagement rules (no link in first reply, max 1 reply/target/day, no DM cold-pitch). Team-buyer bucket = 0 targets, surfacing back into the playbook open question on team-buyer signal; drop-decision deadline 2026-05-17.
  • Channel-execution layer: Blog/SEO (Run #2 · 2026-05-12). Blog & SEO Pipeline + two shipped drafts: "Why You Have 80 Tabs Open (And Why That's Actually Fine)" (P1 top-funnel, 2026-05-09) and "OneTab Alternative: Save Tabs as Workspaces, Not URL Lists" (P5 mid-funnel, 2026-05-12, "URL list → workspace" reframe anchored by OneTab's own December-2025 data-loss warning). Structured 2-week-cadence motion; 5 content pillars mirror the X strategy (P1 tab anxiety / P2 save-session ritual / P3 power-user shortcuts 🔒 rel-gated / P4 better-than-bookmarks / P5 competitor alternatives); pillar mix over 8 posts is ~3 P1 / ~2 P5 / ~1 P2 / ~1 P4 / ~1 P3-once-unblocked. Queue (post Run #2): rank-1 Bookmarks vs Tab Manager, rank-2 How to save Chrome tabs, rank-3 "I'll deal with it tomorrow", rank-4 Session Buddy alternative, rank-5 Arc browser switchers, rank-6 Public collection of the week (curator loop · O3 KR3), rank-7 Chrome 133 vs. Toby. Pipeline disowns the old breathless landing-page voice in favor of a calm, specific, generous voice matching the X strategy. Drafts only: the operator hand-copies to apps/landing/src/content/post/ to publish; the first-publish flow is the immediate open question. Pipeline recommends holding the OneTab draft until the reliability hotfix ships (it's feature-promotional and would route readers into the live bug); the foundational tab-hoarder draft can publish earlier.
  • Audience anchors — Solo Pro Power Saver (9.5% / 6,656 users / 94% weekly stickiness, revenue anchor) and Tenured Free Organizer (62.4% / 43,722 users / 86.6% weekly stickiness, durability + social proof). (see: toby/01-personas.md, toby/strategy/compass.md)
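
The bot-filter rule in the X channel-execution bullet above is mechanical enough to sketch. This is a minimal illustration, not the pipeline's actual code: the `XAccount` shape and the `passesBotFilter` name are hypothetical, and it assumes "account < 2025-06" means the account was created before June 2025.

```typescript
// Hypothetical shape; field names are assumptions for illustration only.
interface XAccount {
  handle: string;
  followers: number;
  createdOn: string; // ISO date, e.g. "2024-11-03"
}

// Assumed reading of the rule: followers >= 100 AND created before 2025-06.
// ISO dates (YYYY-MM-DD) compare correctly as plain strings.
function passesBotFilter(account: XAccount): boolean {
  return account.followers >= 100 && account.createdOn < "2025-06-01";
}
```

If the rule actually keys on something other than creation date (for example first-post date), only the comparison field would change.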

Operations — Incident warroom (proven 2026-05-11 · proven again 2026-05-12)

A five-agent AIOS workflow (the original four plus toby-incident-fix-shipper, added 2026-05-12) sits behind toby/incidents/ to triage and propose fixes for any reported Toby bug. The warroom proposes; the operator ships. Source of truth: toby/incidents/README.md. Two canonical incidents are now closed: see toby/incidents/2026-05-11-blank-extension-page.md (high-confidence) and toby/incidents/2026-05-12-retention-offers-silent.md (medium-confidence; Wave 4 correctly skipped auto-ship).

  • Workflow: Toby Incident Response (id 9b78790f-2aea-4f65-876f-53d1a114c3ae). Active, project=toby. Trigger: daily cron 0 9 * * * (09:00 UTC). Operator manual tick remains as an escape hatch. The old "file_event on _inbox/**" description was inaccurate: the workflow never had a file_event trigger; it ran on cron and read inbox files inside Wave 0.
  • Team: toby-incident-coordinator (warroom commander, only agent that writes canonical docs), toby-frontend-doctor (Playwright + apps/extension/landing/mobile), toby-backend-doctor (GCP logs + read-only prod DB + apps/api), toby-incident-validator (re-runs spot-checks + triple-check; verdict gate), and toby-incident-fix-shipper (added 2026-05-12), the last-mile patcher that creates a PR in axiomzen/toby-mono-repo when the validator returns validated + high confidence; it operates inside a /tmp/ worktree so the primary checkout is never touched.
  • Protocol — 7 waves: Wave 0 (pick: inbox → labeled queue → discernment sweep) → Wave 1 Investigate (doctors in parallel) → Wave 2 Synthesise (coordinator drafts root-cause + fix diff + verify plan) → Wave 3 Validate (verdict: validated / rejected / conditional; up to 2 retry passes on rejection) → Wave 4 Ship (fix-shipper opens a PR — ONLY on validated + high confidence; medium confidence routes the patch to the canonical doc for human review instead) → Wave 5 Transition (records solved / in_review / blocked on the source ticket) → Wave 6 Report (coordinator posts a consolidated Slack message to #C0B3FN70MEE with verdict, root cause, ticket outcome, and PR URL or decline reason; skipped on no-op runs).
  • Proof-of-life #1 — 2026-05-11 blank-extension-page (SHIPPED to PR 2026-05-12). Dispatched 17:08, frontend finding 17:14, backend finding 17:30, validator 17:36, published 17:38 — diagnosis end-to-end in ~30 minutes with validated / high confidence verdict. Root cause pinned to apps/extension/app/state/accessors/user.tsx:45-50 (unbounded chrome.storage.local.get callback), proximate trigger commit d68726b29 (2026-04-09); 3-layer frontend-only fix proposed (5s getUser timeout, 8s recovery screen with pre-approved "Your tabs are safe. Tap to recover." copy, NewTabHangShown telemetry beacon); prior toby-product-strategist MV3-SW-boot-regression hypothesis refuted by independent backend evidence. Wave 4 closed the loop 2026-05-12: TOBY-14 bridged at 04:59 UTC, toby-incident-fix-shipper opened PR #12 at 05:08 UTC (~9 min inbox→PR) on branch warroom/2026-05-11-blank-extension-page-toby-14 (commit 06baf0f8a, base 75a09e34d). Files: apps/extension/app/state/accessors/user.tsx, apps/extension/app/hooks/useOnboarding2Draft.ts, apps/extension/app/containers/Toby.tsx, and new component apps/extension/app/components/StuckRecoveryScreen.tsx. Ship-result artifact: artifacts/toby-incident-fix-shipper/b3400d87-0830-4f89-bb70-4c3907c085f1/ship-result.md. TOBY-14 transitioned to done at 05:10 UTC. Ingestion summaries: artifacts/toby-pm/7a4c4afb-12dc-419f-808f-0e9a014417cd/incidents-2026-05-11-blank-extension-page-ingestion.md (diagnosis close) and artifacts/toby-pm/bdbce617-3091-43c4-9c01-20c16b19946c/incidents-2026-05-11-blank-extension-page-ship-update-ingestion.md (ship update).
  • Proof-of-life #2: 2026-05-12 retention_offers-silent (TOBY-6). Closed ~one day after warroom v1.0; the Wave 0 discernment sweep opted in TOBY-6 (urgent, no agent owner, cross-cutting). Both doctors converged: backend disconfirmed the regression hypothesis (prod-api SHA stable since 2026-02-02; only insert site exercised correctly; no kill-switch / silent flag); frontend pinned the funnel-bypass paths. Coordinator drafted a Tier 1 instrumentation patch (10-LOC zap structured log in GetRetentionOffer at apps/api/context/v3/subscription_context.go ~L598). Validator returned validated + medium: it caught compile defects in the draft diff (log.Info → ctx.Logger.Info, missing nil-guard, team.ID not in scope, struct-path corrections) and produced the corrected compile-ready replacement. Wave 4 correctly skipped auto-ship: per spec, medium-confidence verdicts route the corrected patch into the canonical doc for human Go-reviewer sign-off rather than opening a PR. TOBY-6 transitioned to in_review. The gate works in both directions: the 2026-05-11 high-confidence run shipped via the fix-shipper; this medium-confidence run correctly asked for a human pass. Ingestion summary: artifacts/toby-pm/00036a80-5931-405a-85ab-1e39ee3a545f/incidents-2026-05-12-retention-offers-silent-ingestion.md.
  • How to file: file a Toby ticket with labels: ["needs-warroom"] (loose body shape: symptom / reproduce / when / anything-noticed). The ticket→warroom bridge (bridgeWarroomIfNeeded in lib/tickets.ts) writes an inbox file at toby/incidents/_inbox/YYYY-MM-DD-<ticket-id>-<slug>.md and stamps the ticket warroom-bridged. The workflow's next 09:00 UTC tick picks it up. Manual ticks via the Workflows app remain an operator-only escape hatch for emergencies that can't wait one day; hand-dropping files into _inbox/ is deprecated.
  • Hard guarantees — agents NEVER apply patches except via the fix-shipper on validated + high-confidence; agents NEVER touch _inbox/; agents NEVER delete docs. On medium / conditional / rejected verdicts, the doc carries a fix proposal as a diff — operator review decides whether to ship.
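
The Wave 4 gate described in the protocol and hard guarantees above reduces to a single decision. A minimal sketch follows; the type names, the `decideWave4` function, and the action labels are hypothetical, and only the verdict and confidence values named in this section are modeled.

```typescript
// Verdicts and confidence levels as named in the protocol bullets.
type Verdict = "validated" | "rejected" | "conditional";
type Confidence = "high" | "medium";

// Hypothetical labels for the two outcomes of Wave 4.
type Wave4Action = "open_pr" | "route_to_doc";

function decideWave4(verdict: Verdict, confidence: Confidence): Wave4Action {
  // Auto-ship only on validated + high confidence; every other combination
  // leaves the fix proposal in the canonical doc for human review.
  return verdict === "validated" && confidence === "high"
    ? "open_pr"
    : "route_to_doc";
}
```

Under this reading, the 2026-05-11 run maps to `decideWave4("validated", "high")` and the 2026-05-12 run to `decideWave4("validated", "medium")`.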

Operations — Code review (online 2026-05-13)

New agent-owned operational surface — daily walk of commits to main with a triple-check skill (correctness / quality / security). The reviewer proposes — the operator (or follow-up tickets) ships. Source of truth: outputs in toby/code-reviews/.

  • Agent: toby-code-reviewer (slug da1e2bb3-fbd9-454a-9848-4fe4e05089e3). Active, project=toby. Trigger: daily cron 0 7 * * * (07:00 UTC). Walks new commits to main since the last reviewed SHA; back-fills the most recent N commits on first run.
  • Outputs — (1) a roll-up review doc at toby/code-reviews/YYYY-MM-DD-Ncommits.md containing "What shipped" (commit list) + "Findings" (severity × dimension, with code excerpts, suggested actions); (2) tickets filed against any finding rising to "real bug / security hole / missing test" threshold (improvement / bug / issue kind).
  • Hard guarantees — reviewer NEVER patches the codebase. Even high-confidence findings exit as tickets, not PRs. Distinct from the warroom fix-shipper (which DOES open PRs but only on validated + high-confidence warroom incidents).
  • Proof-of-life #1 — 2026-05-13 first review (10 commits back-fill). Run e5abf485-212b-4a2a-946e-882c6b5b22ec started 2026-05-13T07:00:18Z. First doc: toby/code-reviews/2026-05-13-10commits.md. Covered window: 10 commits on main ending at 75a09e34d (the sandcastle feat-commit). 3 medium + 4 low findings, with the three mediums each earning a filed ticket:
    • medium · quality: useSessionStart.ts:1-75 (commit 0f3aa38) has no test coverage on a 75-line hook driving a production analytics event. Phase 1 ship debt: should not gate Phase 2 but earns its own ticket.
    • medium · security: .sandcastle/scan-secrets.mts:1-182 (commit 75a09e3) is the only thing standing between an unattended LLM agent and a pushed credential, yet it has no test coverage on the patterns or the skip-list. Single-pattern regex, single-line, no multi-line obfuscation detection. Author's own comment acknowledges the limitation.
    • medium · security: .sandcastle/main.mts:602,648 (commit 75a09e3) has two env-var escape hatches (SANDCASTLE_SKIP_GATES_VERIFY=1, SANDCASTLE_SKIP_SECRET_SCAN=1) that fully disable the verification + secret-scan gates with only a console.warn. No audit trail ties a pushed branch back to which gate was waived.
    • Lower-severity findings: Slack backtick-fence-break formatting hazard on the CWS review monitor (b9bea18), correctness flag on EXPECTED_GATES = ['lint','build'] (typecheck + test intentionally omitted; lock-step risk when TS-clean lands), trust-boundary note on agent-controlled gh pr create --title, and a doc-only note on d68726b29 (the AuthWrapper hydration fix — review independently confirms the fix is correct, agreeing with the warroom finding that the issue was scope, not correctness).
    • Ingestion summary: artifacts/toby-pm/20dc862a-fd5c-4eba-a6e1-94f0300bd1e5/code-reviews-2026-05-13-10commits-ingestion.md.
  • Cross-surface intelligence — independent triangulation against the warroom: the code-reviewer reviewed d68726b29 and reached the same conclusion (fix is correct) from a different angle (test-coverage lens, not downstream-blast-radius lens). The two reviews don't conflict; they cover orthogonal dimensions on the same commit. This is a useful pattern — different agents converging on the same artifact strengthen the dashboard's confidence in shared findings.
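
One possible shape for the audit trail that the skip-flag finding asks for is sketched below. It is not the sandcastle code: `collectGateWaivers`, `formatAuditRecord`, the gate labels, and the record fields are all hypothetical, and only the two environment variable names come from the finding.

```typescript
interface GateWaiver {
  gate: string;   // hypothetical human-readable gate label
  envVar: string; // env var that waives the gate when set to "1"
}

const SKIP_FLAGS: GateWaiver[] = [
  { gate: "gates-verify", envVar: "SANDCASTLE_SKIP_GATES_VERIFY" },
  { gate: "secret-scan", envVar: "SANDCASTLE_SKIP_SECRET_SCAN" },
];

// Returns the gates that the given environment waives.
function collectGateWaivers(
  env: Record<string, string | undefined>,
): GateWaiver[] {
  return SKIP_FLAGS.filter((flag) => env[flag.envVar] === "1");
}

// Serializes one audit record tying a branch to the gates waived for it.
function formatAuditRecord(
  branch: string,
  waivers: GateWaiver[],
  atIso: string,
): string {
  return JSON.stringify({ at: atIso, branch, waived: waivers.map((w) => w.gate) });
}
```

Where such a record would be written (commit trailer, PR body, or a log sink) is an open design choice that the finding does not settle.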

Q2 2026 OKRs (7 weeks remaining)

Sourced directly from toby/strategy/playbook.md. Quarter ends 2026-06-30.

  • O1 — Stop the bleed: restore CWS-rank trajectory and review average.

    • KR1: Ship blank-page reliability hotfix by 2026-05-24 — zero new 1-star "blank screen" reviews within 14 days. Diagnosis closed 2026-05-11; PR opened 2026-05-12 via warroom Wave 4 (toby/incidents/2026-05-11-blank-extension-page.md, TOBY-14 → done, PR #12, commit 06baf0f8a). Operator decision now: review, merge, build + CWS-deploy new extension version, then watch the 14-day telemetry window for NewTabHangShown and new 1-star reviews. Cross-confirmation 2026-05-13: code-reviewer independently confirms commit d68726b29 (the proximate trigger of the bug) is itself correct — the issue was the gate widening without bounding its new dependency, exactly what PR #12 fixes.
    • KR2: Publish rewritten CWS listing (title + description + social proof + CWV benchmark) by 2026-06-01 — target +20% install-conversion in the 4-week window.
    • KR3: CWS rolling 30-day review average stops declining (week-over-week non-negative) by 2026-06-30.
  • O2 — Earn the activation moment: prove (or kill) the welcome A/B and lift D7.

    • KR1: Phase 2 welcome A/B fully instrumented and live at canary 5% by 2026-05-19. Baseline today: 0 commits — silent-slip risk.
    • KR2: At decision review 2026-05-26, at least one variant ≥34% D7 retention at n≥2,000/arm. Baseline: 32.92% V2-only.
    • KR3: New Adopter persona weekly stickiness rises from 58.4% to ≥65% by 2026-06-30.
  • O3 — Find a price anchor that holds, and activate the dormant curator loop.

    • KR1: pricing-reality-reconcile complete by 2026-05-13 — one short doc, single authoritative number. Strengthened 2026-05-12: the blog pipeline surfaced a third inconsistent price (TheTab claims Toby is "$9/mo" in its public comparison table) — now three different prices across internal modeling, the Efficient.app listing, and a competitor's blog. Audit must cover all three.
    • KR2: role-based-paywall-gating design doc shipped (not built) by 2026-06-15 — defines which team / admin / sharing features move behind paid and which stay free.
    • KR3: 4 public collections featured (X + blog) by 2026-06-30. Target: 10 by end of Q3. toby-blog-seo has the recurring "Public collection of the week" curator series queued at rank-6 in toby/blog/pipeline.md — direct lever for this KR.
  • Implicit cross-OKR finding (2026-05-12) — the warroom's second incident exposed a structural retention-funnel leak (~82% of cancels bypass the in-app retention modal; ~0.83% accept-rate on 30d). The Tier 2 follow-ups (hide Stripe-portal View link / configure flow_data redirect / give legacy + basic users an in-app cancel CTA) are upstream of monetization bets but not currently scored against any O3 KR. Surface to the strategist agent for the next playbook iteration; do not retrofit the existing OKR set this quarter.

  • Implicit cross-OKR finding (2026-05-13) — the code-reviewer's first review surfaced two security-flavored findings against the new .sandcastle/ subsystem (commit 75a09e3) that aren't tied to any current OKR but matter because sandcastle is now a live path that opens PRs against main. If any agent-authored slice slips a credential past the regex scanner, OR if a CI runner exports SANDCASTLE_SKIP_*=1 without an audit trail, the autonomy story is materially weaker. Not blocking Q2 — surface as Open Questions; audit-log work could fit into engineering-hygiene scope alongside the housekeeping Tier 4 retention-secrets ticket.
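
The retention-funnel percentages quoted above follow directly from the 30-day counts (120 cancels → 22 reasons → 1 accept). A small worked sketch, assuming that a cancel counts as bypassing the modal when it produced no cancel reason; the `CancelFunnel` shape and `funnelRates` name are illustrative only.

```typescript
interface CancelFunnel {
  cancels: number; // all cancellations in the window
  reasons: number; // cancels that submitted a reason in the in-app modal
  accepts: number; // retention offers accepted
}

function funnelRates(f: CancelFunnel): { bypassRate: number; acceptRate: number } {
  return {
    bypassRate: (f.cancels - f.reasons) / f.cancels, // share that never reached the modal
    acceptRate: f.accepts / f.cancels,               // accepts over all cancels
  };
}

const last30d = funnelRates({ cancels: 120, reasons: 22, accepts: 1 });
console.log((last30d.bypassRate * 100).toFixed(1)); // 81.7
console.log((last30d.acceptRate * 100).toFixed(2)); // 0.83
```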

Immediate next steps

Ordered by playbook ICE + dependency. Reliability and pricing are upstream of everything else.

  • Pricing reality audit (1h) — ICE 600, blocks role-based-paywall-gating. Cross-check CWS listing, gettoby.com, Stripe price IDs in production vs. the internal $4.50/mo modeling input; reconcile against public $6/$10 listing on Efficient.app and the $9/mo claim in TheTab's public comparison post (surfaced this run via the blog pipeline). OKR: O3 KR1, due 2026-05-13 — owner: TBD. Side benefit: also unblocks blog-pipeline price-claim guardrail (currently no blog post may mention price; pipeline complies) (from: toby/strategy/playbook.md, toby/strategy/bets.md#pricing-reality-reconcile, toby/blog/pipeline.md open questions + competitor-blog watch)
  • Anchor #1 protection — Reliability hotfix review + merge + deploy — ICE 576. PR OPEN 2026-05-12 — PR #12 (branch warroom/2026-05-11-blank-extension-page-toby-14, commit 06baf0f8a). The 3-layer frontend-only fix specced in toby/incidents/2026-05-11-blank-extension-page.md is now in-tree: 5s timeouts fail-open on getUser() (apps/extension/app/state/accessors/user.tsx) and getOnboarding2Draft (apps/extension/app/hooks/useOnboarding2Draft.ts); StuckRecoveryScreen at 8s (apps/extension/app/containers/Toby.tsx + new component apps/extension/app/components/StuckRecoveryScreen.tsx); NewTabHangShown telemetry beacon. Operator steps: (1) review + merge PR #12 — validator already vetted race-safety, regression-safety against d68726b29, and copy; code-reviewer also independently confirms d68726b29's fix is correct; CI is the canonical typecheck/lint gate since no local check ran in the ephemeral worktree; (2) build a new extension version + push to Chrome Web Store; (3) monitor: 7-day baseline establishment on NewTabHangShown in Amplitude, watch CWS reviews for new 1-star "blank screen" complaints (target: zero in the 14-day window). No prod-api redeploy (both backend doctor and validator agreed — API is innocent). Side benefit: unblocks the blog pipeline's P3 power-user posts AND the recommended publish of the OneTab Alternative blog draft (currently rel-gated by toby-blog-seo). OKR: O1 KR1, due 2026-05-24 — owner: TBD (likely Jad given proximate commit ownership) (from: toby/incidents/2026-05-11-blank-extension-page.md PR shipped, toby/strategy/compass.md anchor 1, toby/strategy/bets.md#reliability-blank-page-fix, toby/blog/pipeline.md P3 gate + OneTab gate recommendation, toby/code-reviews/2026-05-13-10commits.md d68726b finding)
  • Tier 1 retention instrumentation — Go-reviewer sign-off (2026-05-12) — validator returned validated + medium; Wave 4 correctly skipped auto-ship. The corrected compile-ready 10-LOC zap log line at apps/api/context/v3/subscription_context.go ~L598 (between eligibility evaluation and response build) is in toby/incidents/2026-05-12-retention-offers-silent.md. Any owner of apps/api/context/v3/ can approve. Patch is logging-only, additive, no behavior change, no schema change, 1-commit rollback. Once shipped, wait 24-48h then run the Cloud Logging query in the verify plan; expect 1-3 retention_offer_eligible events in the window. Source ticket TOBY-6 sits in in_review against this (from: toby/incidents/2026-05-12-retention-offers-silent.md fix tier 1)
  • File Tier 2/3/4 retention follow-up tickets (2026-05-12) — incident doc recommends filing now with stub bodies pointing back to the canonical doc; backfill numbers in 14 days when Tier 1 telemetry flows. Tier 2 = product/FE work (hide Stripe-portal View link or configure flow_data redirect; legacy+basic in-app cancel CTA; investigate retention_yearly 0%-accept branch). Tier 3 = schema/analytics (add retention_offer_views table or status/offered_at/declined_at columns; wire Amplitude RETENTION_OFFER_SHOWN / DECLINED into BI). Tier 4 = housekeeping (5 missing TOBY_RETENTION* secrets in GCP Secret Manager; either create with current defaults or remove lookup). Tier 2 is the bigger monetization lever than Tier 1 (from: toby/incidents/2026-05-12-retention-offers-silent.md fix tiers 2-4)
  • Code-reviewer's three auto-filed tickets — operator triage (NEW 2026-05-13) — the toby-code-reviewer run filed three tickets on its first pass: (1) useSessionStart hook needs unit-test coverage (improvement, medium — Phase 1 ship debt on the Session Start heartbeat hook; quiet-failure risk goes invisible in BQ until backfill); (2) .sandcastle/scan-secrets.mts needs unit-test coverage (improvement, medium — the only thing between an unattended LLM and a pushed credential has no tests on its patterns or skip-list); (3) sandcastle skip-flags need audit logging (issue, low — SANDCASTLE_SKIP_GATES_VERIFY=1 and SANDCASTLE_SKIP_SECRET_SCAN=1 waive gates with only a console.warn). Operator should triage priority + owner; (1) is straightforward unit tests, (2) + (3) are the autonomy story for .sandcastle/ (from: toby/code-reviews/2026-05-13-10commits.md filed tickets)
  • CWS narrative-repair sprint — ICE 392. Retitle to Toby — Tab Manager: Save Sessions, Cloud Sync & Notes, rewrite description with explicit cloud-sync mention, surface enterprise social proof, publish Core Web Vitals benchmark. Framing must NOT lean on cloud-sync as differentiator. OKR: O1 KR2, due 2026-06-01 — owner: TBD (from: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/bets.md#cws-narrative-repair)
  • Public-collection pride loop — ICE 336, the under-pulled growth lever this quarter. Surface and reward Free-Tier Archivist creators via X "public collection of the week" + curator-spotlight slot on gettoby.com. Zero engineering. toby-blog-seo carries the recurring "Public collection of the week" blog series at queue rank-6 — pairs cleanly with the X surface. OKR: O3 KR3 — owner: TBD (likely toby-x-strategist + toby-blog-seo) (from: toby/strategy/playbook.md, toby/strategy/bets.md#public-collection-pride-loop, toby/blog/pipeline.md queue rank 6)
  • Blog publish flow — operator decision. The agent writes drafts into the wiki at toby/blog/; the production blog lives at apps/landing/src/content/post/. Three sub-decisions stacked: (1) hand-copy approved drafts into the codebase repo or change the publish flow? (2) image hand-off — neither draft has a cover image; existing posts use ~/assets/images/blog/<post-folder>/<image>.png; the OneTab draft specifically needs a side-by-side "URL list vs visual collection" hero (the image carries the post's core claim); (3) confirm internal-link URL shape on gettoby.com before any inter-post link ships. Blocks: first publish of either of the two drafts in toby/blog/ (from: toby/blog/pipeline.md open questions)
  • OneTab Alternative draft — reliability-gate decision. The toby-blog-seo agent explicitly recommends holding publish until the 3-layer reliability hotfix ships (O1 KR1, 2026-05-24) — the OneTab draft is feature-promotional (Save Session + new tab page) so publishing it pre-fix would drive switchers directly into the live blank-page bug. Foundational "Why 80 Tabs Open" draft can publish earlier (less feature-promotional). Operator owns the call; treating as a recommendation, not a hard rule (from: toby/blog/pipeline.md operator-decisions section, toby/blog/onetab-alternative.md editor notes, toby/incidents/2026-05-11-blank-extension-page.md)
  • Distribution-loop decisions — blog drafts + X anchors. Two parallel calls: (A) Foundational draft + X Post 1 — both state "47 tabs is not a personality flaw"; option (a) X Post 1 links to the blog post (deep distribution, requires blog publish first), (b) treat as parallel statements of the same idea, no inter-link (X ships Tue 2026-05-12 regardless). Recommend (b) if publish-flow can't resolve by Tue 2026-05-12 morning. (B) OneTab draft + new X anchor — the pipeline doc explicitly calls out that the OneTab post needs its own dedicated X anchor (the Post 1 pair was already claimed). Operator should queue a new X post draft on the "URL list vs your work" angle once publish timing is known (from: toby/blog/pipeline.md open questions, toby/blog/onetab-alternative.md editor notes, toby/x/content-pipeline.md Post 1)
  • X channel goes live this week — operator-driven. Step 1: confirm @TobyForTabs account creds are in operator's hands (acct-gate blocker; must clear by Mon 2026-05-11). Step 2: pin Post 13 from toby/x/content-pipeline.md (🔒 acct-only, no price claim, no feature screenshots — safe ahead of pricing reconcile + rel hotfix). Step 3: post Post 1 on Tue 2026-05-12, Post 2 Wed 2026-05-13, Post 3 Thu 2026-05-14. Step 4: @nibzard Tier-A reply within 48h (thread window closes ~2026-05-12). Fallback: if @nibzard window closes before reply ships, pivot Tier-A lead to @airplanestar_ (heavy-organiser, 5.5k followers) or @wayne_effect (lost-tab-grief canonical voice). Zero engineering — pure operator execution (from: toby/x/engagement-targets.md, toby/x/content-pipeline.md)
  • Phase 2 welcome A/B at canary 5% by 2026-05-19 — ICE 192. Confirm Core Services /v1/experiments endpoint shape, register onboarding-welcome-v1, ship Slices 1–2 (prefetchWelcomeExperiment + new hook + withExperimentProps). OKR: O2 KR1, due 2026-05-19 — owner: Jad (from: tasks/phase2-todo.md, toby/strategy/bets.md#phase-2-welcome-ab)
  • Publish "Chrome 133 vs. Toby" comparison page targeting Perplexity / ChatGPT / Claude recommendation flows — Should bet. Lean on axioms 1+2+3 (visual tangibility, ambient surface, persistence), NOT cloud sync. Now also queued by toby-blog-seo at rank-7 in the blog pipeline — owner: TBD (from: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/bets.md#chrome-133-vs-toby-comparison-page, toby/blog/pipeline.md queue rank 7)
  • Founder-level decision: Atlassian/Dia partnership/context-API discovery meeting — explicitly deferred this quarter by playbook anti-bet (one-way door; decision belongs at founder level + after the v3 research dossier's Sept-2026 check-in). Re-raise post Q2 — owner: founder (from: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/playbook.md anti-bets)
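
The fail-open timeouts named in the reliability hotfix item above can be illustrated with a generic wrapper. This is a sketch of the technique only, not the code in PR #12: `withTimeout` is a hypothetical helper, and it takes any promise rather than wrapping chrome.storage directly.

```typescript
// Resolves with `fallback` if `work` has not settled within `ms` milliseconds.
// "Fail-open" here means the caller proceeds with the fallback instead of
// waiting forever on a callback that may never fire.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer); // work finished first; cancel the fallback
        resolve(value);
      },
      (err) => {
        clearTimeout(timer); // real errors still propagate to the caller
        reject(err);
      },
    );
  });
}

// Usage sketch (loadUser is hypothetical):
//   const user = await withTimeout(loadUser(), 5000, null);
```

A promise can only settle once, so a late result from `work` after the timeout is ignored rather than overwriting the fallback.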

Phase / Milestone progress

  • Phase 1 — Session Start + 1.13.0 cleanup: SHIPPED. Extension 1.13.0 published 2026-04-14; signup-position A/B experiment removed (ship "end" variant); AuthWrapper gated on user hydration; experimentEntityId race fixed; 4h Session Start heartbeat live with 60k/day projection and 180k/day halt threshold (commits 0f3aa38d2, bc5e45305, d68726b29, cde22c935, 9d6e8e4f3). 2026-05-13 review surfaced ship-debt: useSessionStart.ts hook has no test coverage on its 4h-rate-limit / days_since_signup parse / storage-write-ordering logic — quiet-failure risk goes invisible until BQ backfill. Ticket filed (see: toby/code-reviews/2026-05-13-10commits.md useSessionStart finding).
  • Phase 2 — Welcome A/B + shared onboarding spine: PLANNED, not yet in flight. 12 slices specced with halt triggers; canary stages 5%→20%→50%→100%; decision review 2026-05-26. Playbook O2 KR1 sets a hard date: canary 5% live by 2026-05-19 or O2 is at risk (see: tasks/phase2-todo.md, tasks/onboarding-experiment-plan.md, toby/strategy/playbook.md OKRs)
  • CWS Review Monitor (Cloud Run job): SHIPPED + reliability-hardened. AI-drafted responses live since 2026-03-30 (ba247d9ac); fallback Slack message added 2026-04-29 (commit b9bea18cd). 2026-05-13 review surfaced a low-severity formatting hazard on the fallback message: escapeSlackMrkdwn doesn't strip backticks, so a draftErr.Error() echoing a triple-backtick body breaks the Slack fence. Doc-only, no ticket (see: toby/code-reviews/2026-05-13-10commits.md b9bea18 finding).
  • Retention discount — frontend integration CONFIRMED SHIPPED (via 2026-05-12 warroom diagnosis). Backend live since cbc92a78d; FE dispatch site is live at apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709; RETENTION_OFFER_DECLINED Amplitude wiring live at :622-627. The previous worklog.md "pending" entry from Jan 2026 is stale — the integration shipped at some point between then and 2026-05-12. The funnel works for users who reach the in-app retention modal (17 all-time accepts, 16 retention_legacy + 1 retention_yearly) — what's broken is structural: ~82% of cancels bypass the modal entirely. See open question on retention-funnel structural leak (see: toby/incidents/2026-05-12-retention-offers-silent.md)
  • Monorepo flatten + Turborepo / pnpm workspaces migration: SHIPPED late March 2026 — apps/api, apps/extension, apps/landing, apps/mobile (commits 87bec6267, a90230ce1, 134f9bb90, ec843c5a2, c5545cbd5, 2574b5379, 5bd961266)
  • Auto-generated architecture docs (docs/architecture/_index.yaml, flows, Mermaid diagrams): SHIPPED Jan 2026 — 21 controllers, 86 endpoints, 70 mutations, 111 tracked events parsed (see: worklog.md, CLAUDE.md)
  • Strategic research v3 (Phase 0.5–9 pipeline) + delta vs v2: SHIPPED 2026-05-05 (see: research-docs/toby-research-2026-05-05-v3.md, research-docs/toby-delta-2026-05-05-v3.md)
  • Strategy spine — SHIPPED to wiki 2026-05-10. Three co-equal docs: Compass (identity / axioms / anchors), Bets (rolling ICE-scored queue), Q2 2026 Playbook (growth thesis, OKRs, anti-bets, red-team).
  • X execution surface — SHIPPED to wiki 2026-05-10. Three docs at toby/x/: strategy.md, content-pipeline.md, engagement-targets.md. Channel goes operational Tue 2026-05-12 (first scheduled post) contingent on operator confirming @TobyForTabs creds by Mon 2026-05-11.
  • Incident warroom — STOOD UP 2026-05-11 + PROVEN ON BOTH RAILS. New folder toby/incidents/ plus README.md. AIOS workflow Toby Incident Response (id 9b78790f-2aea-4f65-876f-53d1a114c3ae) active; five agents on roster (coordinator + frontend doctor + backend doctor + validator + fix-shipper). Two canonical incidents now both have ship-outcomes: toby/incidents/2026-05-11-blank-extension-page.md (high-confidence, ~30 min diagnosis + Wave 4 fix-shipper opened PR #12 on 2026-05-12 — first end-to-end auto-ship) and toby/incidents/2026-05-12-retention-offers-silent.md (medium-confidence — Wave 4 correctly skipped auto-ship; Tier 1 patch in canonical doc awaiting human Go-reviewer sign-off). Both rails now exercised: high-confidence ships, medium-confidence routes for human review.
  • Reliability blank-page hotfix — PR OPEN 2026-05-12. PR #12 against axiomzen/toby-mono-repo:main (branch warroom/2026-05-11-blank-extension-page-toby-14, commit 06baf0f8a on base 75a09e34d). 3-layer frontend-only fix from the 2026-05-11 incident doc lands as: edits to apps/extension/app/state/accessors/user.tsx (Layer 1 5s getUser() timeout), apps/extension/app/hooks/useOnboarding2Draft.ts (Layer 1 on isReady/isDraftReady), apps/extension/app/containers/Toby.tsx (Layer 2 StuckRecoveryScreen at 8s + Layer 3 NewTabHangShown beacon), and a brand-new component apps/extension/app/components/StuckRecoveryScreen.tsx with pre-approved copy "Your tabs are safe. Tap to recover.". Deliberately NOT in PR #12 (queued as follow-ups, not blockers): Layer-1 for isInitializing, the SW-hardening trio, the Layer-1 NewTabHydrationTimeout beacon. Fix-shipper run b3400d87-0830-4f89-bb70-4c3907c085f1; source ticket TOBY-14 (status done). (see: toby/incidents/2026-05-11-blank-extension-page.md PR-shipped section, artifacts/toby-pm/bdbce617-3091-43c4-9c01-20c16b19946c/incidents-2026-05-11-blank-extension-page-ship-update-ingestion.md)
  • Blog & SEO pipeline — STOOD UP 2026-05-12 (Run #1) + RUN #2 SHIPPED 2026-05-12. Folder toby/blog/ now contains the pipeline doc + two drafts. Run #2 also migrated legacy flat toby/blog-*.md files into the sub-folder (structural move, no content edits). Agent toby-blog-seo produces toby/blog/pipeline.md (2-week cadence, 5 pillars mirroring X, 7 ranked queued topics, explicit NOT-pursued list, voice fingerprint anchored to toby/strategy/compass.md and the X strategy) and drafts as the cycle rolls. Shipped drafts: toby/blog/why-you-have-so-many-tabs-open.md (P1 top-funnel, 2026-05-09, paired with X Post 1) and toby/blog/onetab-alternative.md (P5 mid-funnel, 2026-05-12, "URL list → workspace" reframe). Agent writes drafts only — operator hand-copies to apps/landing/src/content/post/ to publish. All existing guardrails honored: no price claims, P3 rel-gated, no AI pre-announce, no holiday filler, cordial silence on competitors. Ingestion summaries: artifacts/toby-pm/ea1223d6-…/blog-pipeline-ingestion.md (Run #1) + artifacts/toby-pm/bf6972e6-…/blog-pipeline-run2-ingestion.md (Run #2).
  • Code-review channel — ONLINE 2026-05-13 (Run #1). New folder toby/code-reviews/ with agent toby-code-reviewer on a daily 0 7 * * * cron. First doc: toby/code-reviews/2026-05-13-10commits.md — back-fill of the 10 most recent commits on main (window ends at 75a09e34d). 3 medium + 4 low findings; the three mediums each filed a ticket: useSessionStart no-tests (Phase 1 ship debt), .sandcastle/scan-secrets.mts no-tests (LLM-credential-safety net has no validation), sandcastle skip-flags audit-trail missing. Reviewer NEVER patches — it files tickets only (orthogonal to the warroom fix-shipper). Ingestion summary: artifacts/toby-pm/20dc862a-fd5c-4eba-a6e1-94f0300bd1e5/code-reviews-2026-05-13-10commits-ingestion.md.
  • Sandcastle agentic-workflow subsystem — SHIPPED 2026-05-13 via commit 75a09e34d. New top-level codebase artifact at .sandcastle/ (orchestrator main.mts, secret scanner scan-secrets.mts). Host-side, runs lint + build gates and an added-lines secret regex, then opens a PR against main via gh pr create. Two security-flavored open questions (see below) from the code-reviewer's first review: no test coverage on the secret patterns / skip-list; no audit trail when env-var escape hatches are set. This is separate infrastructure from the warroom fix-shipper — sandcastle ships agent-authored slices, fix-shipper ships warroom-validated incident patches.
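
The added-lines secret scan described for .sandcastle/scan-secrets.mts can be illustrated with a minimal sketch. This is not the shipped scanner: the two patterns, the skip-list regex, and the names scanAddedLines, SECRET_PATTERNS, and SKIP_FILE are hypothetical, chosen only to show the shape of the behavior the missing tests would need to pin down (scan `+` lines only, skip lockfiles and binaries).

```typescript
// Hypothetical patterns and skip-list, for illustration only; the real
// rules in scan-secrets.mts are not shown in this document.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/, // AWS access key ID shape
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
];
const SKIP_FILE = /(^|\/)(pnpm-lock\.yaml|package-lock\.json)$|\.(png|jpg|woff2)$/;

interface Finding {
  file: string;
  line: string;
}

// Scans a unified diff and reports added lines that match a secret pattern.
// Removed and context lines are ignored; files on the skip-list are ignored.
function scanAddedLines(diff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "";
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // "+++ b/path" names the file that the following hunks belong to.
      file = raw.slice(4).replace(/^b\//, "");
      continue;
    }
    if (!raw.startsWith("+") || SKIP_FILE.test(file)) continue;
    const line = raw.slice(1);
    if (SECRET_PATTERNS.some((p) => p.test(line))) {
      findings.push({ file, line });
    }
  }
  return findings;
}
```

A known limitation of this sketch: an added line whose own content begins with `++ ` is indistinguishable from a file header here, which is exactly the kind of edge case a test suite for the scanner would want to cover.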

Roadmap

Next 2 weeks

  • Pricing audit complete by 2026-05-13 (O3 KR1) — single authoritative number doc. Audit must now cover three inconsistent prices: internal $4.50/mo, public $6/$10 Efficient.app listing, and TheTab's $9/mo claim (surfaced via toby-blog-seo competitor-blog watch). Also unlocks the blog pipeline's price-claim guardrail (see: toby/strategy/playbook.md, toby/blog/pipeline.md open questions + competitor blogs)
  • X channel goes operational Tue 2026-05-12 — first scheduled post (Post 1 from toby/x/content-pipeline.md); @nibzard Tier-A reply within 48h; @TobyForTabs creds-gate must clear by Mon 2026-05-11 first (see: toby/x/engagement-targets.md)
  • Blog channel goes operational this cycle — two drafts now written; path to first publish is the operator publish-flow decision (hand-copy to apps/landing/src/content/post/ vs. flow change), image hand-off (OneTab draft specifically needs a "URL list vs visual collection" hero), and internal-link URL convention confirmation. Recommend targeting Tue 2026-05-12 for the foundational tab-hoarder post (distribution-loop alignment with X Post 1); hold the OneTab Alternative draft until the reliability hotfix ships per toby-blog-seo recommendation (see: toby/blog/pipeline.md, toby/blog/onetab-alternative.md editor notes, toby/x/content-pipeline.md Post 1)
  • Reliability hotfix merge + CWS deploy by 2026-05-24 (O1 KR1) — PR #12 is open. Operator path: review → merge → bump extension version → CWS-deploy → 7-day Amplitude baseline on NewTabHangShown + 14-day CWS-review watch for "blank screen" 1-stars (target: zero). Unblocks blog pipeline P3 posts AND unlocks publish of the OneTab Alternative draft as side effects (see: toby/incidents/2026-05-11-blank-extension-page.md PR shipped, toby/strategy/compass.md anchor 1, toby/blog/pipeline.md P3 gate + OneTab gate)
  • Tier 1 retention-instrumentation Go-reviewer sign-off (2026-05-12) — corrected 10-LOC zap log patch is in the canonical incident doc; awaiting one human review pass from a Toby Go reviewer (any owner of apps/api/context/v3/). After merge + deploy, wait 24-48h then run the Cloud Logging query in the verify plan and file Tier 2/3/4 follow-ups with real numbers attached (see: toby/incidents/2026-05-12-retention-offers-silent.md)
  • File Tier 2/3/4 retention follow-up tickets (2026-05-12) — file now with stub bodies pointing back to the canonical doc to prevent work-getting-lost. Tier 2 is the bigger monetization lever (hide Stripe-portal link, configure flow_data redirect, give legacy users an in-app cancel CTA, investigate why retention_yearly accepts ~0% of non-legacy users) (see: toby/incidents/2026-05-12-retention-offers-silent.md)
  • Triage the three code-reviewer auto-filed tickets (NEW 2026-05-13) — assign owners, set priorities. (1) useSessionStart unit tests is the cheapest win and lives on Phase 1 already-shipped surface; (2) .sandcastle/scan-secrets.mts tests are the autonomy-story gate — should land before any sandcastle-authored slice opens its first real PR; (3) sandcastle audit-log is the partner to (2) — both are cheap, both close a non-zero-risk gap that opens whenever an agent has push access (see: toby/code-reviews/2026-05-13-10commits.md filed tickets)
  • Team-buyer X-pillar drop decision by 2026-05-17 — engagement-targets bucket recap is still zero candidates. Operator must confirm whether to drop the pillar from X entirely or supply target shape (see: toby/x/engagement-targets.md bucket recap)
  • Phase 2 welcome A/B at canary 5% by 2026-05-19 (O2 KR1) — Slices 1–2 (experiment plumbing + withExperimentProps across 17 ONBOARDING_V2_* sites). Decision review 2026-05-26. Per playbook red-team residual, if 2026-05-19 arrives with no Phase 2 commits, O2 is dead — operator escalation (see: tasks/phase2-todo.md, toby/strategy/playbook.md)
  • CWS narrative-repair sprint kickoff (O1 KR2 work begins) — retitle, description rewrite, enterprise social proof, CWV benchmark (see: research-docs/toby-delta-2026-05-05-v3.md)

Next month

  • CWS listing rewrite published by 2026-06-01 (O1 KR2) — measure +20% install-conversion over 4 weeks (see: toby/strategy/playbook.md, toby/strategy/bets.md#cws-narrative-repair)
  • Blog cadence rolls — at 2-week cadence with two drafts shipped 2026-05-09 and 2026-05-12, the operator should publish the next post (Bookmarks vs Tab Manager · queue rank-1) on ~2026-05-26 and the post after that (How to save Chrome tabs · queue rank-2) on ~2026-06-09; once reliability ships, the first P3 post enters the queue (see: toby/blog/pipeline.md queue)
  • Code-review cadence rolls — daily 07:00 UTC. Each new commit landing on main (whether human-authored, sandcastle-authored, or warroom-fix-shipper-authored) enters the review window. Expect a steady cadence of ticket-shaped findings; the dashboard will surface medium+ findings as they appear and treat the doc index as the rolling archive (see: toby/code-reviews/)
  • Tier 1 retention-instrumentation verify window (~14 days post-merge) — once Tier 1 ships, the Cloud Logging query in the canonical doc lets us compute the first-ever FE funnel ratio (retention_offer_eligible events vs. cancellation_reasons rows). Becomes the disambiguation between "modal renders but is declined" and "modal-never-calls-the-endpoint" — i.e. settles whether the bigger lever is in copy/offer-strength (Tier 2c) or in routing more cancels through the modal (Tier 2a + 2b) (see: toby/incidents/2026-05-12-retention-offers-silent.md verify plan)
  • Phase 2 Slices 3–10: workspace-name slide removal + inline rename nudge, real-tab fallback in SaveTabsSlide, seed-on-skip starter content, ExtensionMenuSlide demoted to showcase, auth-as-modal-overlay on blurred dashboard, new ONBOARDING_V2_OPEN_TAB event/slide (see: tasks/phase2-todo.md)
  • Phase 2 canary 5% → 20% → 50% → 100%; kill if neither variant hits 34% D7 at n≥2,000/arm at the 2026-05-26 decision review (see: tasks/phase2-todo.md, tasks/onboarding-experiment-plan.md)
  • role-based-paywall-gating design doc shipped by 2026-06-15 (O3 KR2) — defines which team/admin/sharing features move behind paid and which stay free under compass anchor #4 (see: toby/strategy/playbook.md, toby/strategy/bets.md#role-based-paywall-gating)
  • Public "Chrome 133 vs. Toby" comparison page — axioms 1/2/3, NOT cloud sync. Free version only; paid push is anti-bet. Now also queued at rank-7 in toby/blog/pipeline.md (see: toby/strategy/playbook.md anti-bets)
  • Curator loop: ≥4 public collections featured by 2026-06-30 (O3 KR3) — first featured collection ideally amplified via X + the new blog "Public collection of the week" recurring series (see: toby/strategy/playbook.md O3 KR3, toby/x/content-pipeline.md, toby/blog/pipeline.md queue rank 6)
  • Reliability follow-ups (non-blocking on the incident close): apply Layer-1 timeout shape to isInitializing (useIsRestoring() IDB path in Toby.tsx:168-275); SW hardening (.catch() on persistQueryClientRestore at background.ts:14, AbortController+10s timeout in contextMenus.ts:145-191, unified chromeStorageGet<T>(keys, { timeoutMs }) helper to replace every raw chrome.storage.local.get callsite); Layer-1 NewTabHydrationTimeout telemetry beacon so the common 5s recovery path is visible in Amplitude (see: toby/incidents/2026-05-11-blank-extension-page.md follow-ups)
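
The fail-open timeout shape behind Layer 1, and the proposed unified chromeStorageGet<T>(keys, { timeoutMs }) helper, can be sketched roughly as follows. This is an assumption-laden illustration, not the code in PR #12: the name storageGetWithTimeout, the StorageGet type, and the `{ items, timedOut }` return shape are hypothetical. The getter is passed in as a parameter so the sketch runs outside an extension; in the extension it would stand in for the callback form of chrome.storage.local.get.

```typescript
// Hypothetical signature mirroring the callback form of
// chrome.storage.local.get, injected so the sketch runs anywhere.
type StorageGet = (
  keys: string[],
  callback: (items: Record<string, unknown>) => void,
) => void;

interface GetOptions {
  timeoutMs: number;
}

interface GetResult {
  items: Record<string, unknown>;
  timedOut: boolean;
}

// Resolves with the stored items, or fails open with an empty object
// if the storage callback has not fired within timeoutMs.
function storageGetWithTimeout(
  get: StorageGet,
  keys: string[],
  { timeoutMs }: GetOptions,
): Promise<GetResult> {
  return new Promise((resolve) => {
    let settled = false;

    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      resolve({ items: {}, timedOut: true });
    }, timeoutMs);

    get(keys, (items) => {
      if (settled) return; // late callback after the timeout: ignore it
      settled = true;
      clearTimeout(timer);
      resolve({ items, timedOut: false });
    });
  });
}
```

Under these assumptions a caller would treat `timedOut: true` as "proceed with defaults" and could emit a telemetry beacon at that point, which is where a signal like NewTabHydrationTimeout would plausibly hook in.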

Next quarter / beyond

  • Curator loop scales to 10 featured public collections by EoQ3 (see: toby/strategy/playbook.md O3 KR3)
  • Role-based feature gating built (after Q2 design doc lands) — 2× conversion benchmark on Multi-User Collaborator + Free-Tier Archivist segments (see: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/bets.md#role-based-paywall-gating)
  • MCP integration v1 as a B2B / Team-plan unlock (see: product/ideas/mcp-integration.md)
  • AI feature relaunch deferred to Q4 2026. No pre-announce — compass anti-promise + playbook anti-bet (blog pipeline explicitly honors this in its NOT-pursued list) (see: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/compass.md, toby/strategy/playbook.md, toby/blog/pipeline.md NOT-pursued)
  • Atlassian / Dia ecosystem partnership decision — explicitly deferred past Q2 by playbook anti-bet; revisit at founder level after Sept-2026 v3 falsifiable check-in (see: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/playbook.md anti-bets)
  • Falsifiable Sept 2026 check-in: blank-page closed + onboarding D1 lift + paywall restructured + Atlassian/Linear public move; otherwise pivot-or-wind-down by EOY 2026 (see: research-docs/toby-research-2026-05-05-v3.md)
  • North Star metric: Weekly Card Opens (~37k users); supporting metric: Weekly Card Saves (~9k users) (see: worklog.md)

Recent shipments

The 14d window has been broken open: PR #12 (the reliability hotfix) lands code in-tree for the first time since 2026-04-29, and two new agent-owned operational surfaces (code-review channel, sandcastle subsystem) came online this week.

  • Operational surface: Code-review channel online — new agent toby-code-reviewer (daily 07:00 UTC cron) shipped its first review at toby/code-reviews/2026-05-13-10commits.md, back-filling the 10 most recent commits to main (window ends at 75a09e34d). 3 medium + 4 low findings; three tickets filed (Session Start hook no-tests; .sandcastle/scan-secrets.mts no-tests; sandcastle skip-flags audit-trail missing). Reviewer files tickets only — never patches. Run e5abf485-212b-4a2a-946e-882c6b5b22ec; ingestion summary artifacts/toby-pm/20dc862a-fd5c-4eba-a6e1-94f0300bd1e5/code-reviews-2026-05-13-10commits-ingestion.md (2026-05-13)
  • Code: Sandcastle agentic-workflow subsystem — commit 75a09e34d lands .sandcastle/main.mts (orchestrator: lint+build gates, retry, gh pr create) and .sandcastle/scan-secrets.mts (added-lines secret regex with skip-list for lockfiles + binaries). Host-side rail for unattended agent slice execution that opens PRs against main. Separate from the warroom fix-shipper — sandcastle ships agent-authored slices, fix-shipper ships warroom-validated incident patches; both rails end in a PR but originate differently. Two security-flavored findings already filed against the subsystem by the code-reviewer (no tests on the scanner; no audit trail on env-var skip flags) (2026-05-13)
  • Code: Reliability blank-page hotfix shipped to PR: PR #12 on axiomzen/toby-mono-repo, branch warroom/2026-05-11-blank-extension-page-toby-14, commit 06baf0f8a (base 75a09e34d on origin/main). 3-layer frontend-only fix from the 2026-05-11 incident: 5s getUser() + getOnboarding2Draft fail-open timeouts; StuckRecoveryScreen at 8s; NewTabHangShown telemetry beacon. Files: apps/extension/app/state/accessors/user.tsx, apps/extension/app/hooks/useOnboarding2Draft.ts, apps/extension/app/containers/Toby.tsx, and net-new apps/extension/app/components/StuckRecoveryScreen.tsx. First end-to-end exercise of Wave 4 of the warroom workflow — toby-incident-fix-shipper ran in b3400d87-0830-4f89-bb70-4c3907c085f1, source ticket TOBY-14 transitioned to done at 05:10 UTC. (2026-05-12)
  • Wiki: Second canonical warroom incident — retention_offers silent (TOBY-6): toby/incidents/2026-05-12-retention-offers-silent.md (status closed diagnosis; verdict validated + medium; Wave 4 correctly skipped auto-ship; TOBY-6 → in_review). Doctors converged: backend cleared the API (prod-api SHA stable since 2026-02-02; only insert site exercised correctly; no kill-switch / silent flag); frontend pinned three structural FE-orchestration bypass paths (Stripe-portal View link preloaded inside in-app Subscription panel, Stripe renewal-email "Manage subscription" links to the same portal, team_legacy/team_basic users with no in-app cancel CTA at all). Coordinator drafted Tier 1 instrumentation patch (10-LOC zap structured log at apps/api/context/v3/subscription_context.go ~L598); validator caught compile defects, returned corrected compile-ready replacement, returned medium confidence. Three strategic findings: (1) ~82% of cancels bypass the in-app retention modal entirely — structural funnel leak, product-shaped not regression; (2) CancelSubscription.tsx frontend integration did ship at some point, resolving the long-running open question; (3) of users who DO reach the modal, only ~5-10% accept — backend can't disambiguate "show-without-accept" from "decline" because schema is accept-only. Ingestion summary: artifacts/toby-pm/00036a80-5931-405a-85ab-1e39ee3a545f/incidents-2026-05-12-retention-offers-silent-ingestion.md (2026-05-12)
  • Wiki: Blog & SEO pipeline Run #2 + second draft: toby/blog/pipeline.md regenerated 2026-05-12 09:00 UTC with a structural migration applied (legacy flat toby/blog-*.md files moved into the new toby/blog/ sub-folder, no content edits during relocation) plus a second published draft: toby/blog/onetab-alternative.md — P5 mid-funnel, ~1,500 words, "URL list → workspace" reframe anchored by OneTab's own December-2025 troubleshooting-page data-loss warning. The draft is generous to OneTab, explicitly self-routes Workona-shaped readers, and honors the "no punch-down" anti-bet. Pipeline queue reordered (Bookmarks vs Tab Manager now rank-1; Chrome 133 vs. Toby added at rank-7). New pricing signal surfaced via competitor-blog watch: TheTab's blog claims Toby is "$9/mo" — third inconsistent price (alongside internal $4.50/mo and public $6/$10) that strengthens the urgency of pricing-reality-reconcile. Pipeline recommends holding OneTab draft publish until the reliability hotfix ships (it's feature-promotional). Ingestion summary: artifacts/toby-pm/bf6972e6-dbf0-4794-bea0-9da5e89afdd2/blog-pipeline-run2-ingestion.md (2026-05-12)
  • Wiki: Blog & SEO pipeline run #1 — new agent-owned surface at toby/blog/. Pipeline doc establishes 2-week cadence, 5 pillars mirroring X, voice fingerprint anchored to the compass, 7 ranked queued topics, explicit NOT-pursued list. First draft published as a wiki doc: Why You Have 80 Tabs Open (And Why That's Actually Fine) — P1 top-funnel, explicitly paired with X Post 1. Operator decision now owns the publish-flow handoff to apps/landing/src/content/post/ (2026-05-12)
  • Wiki: First incident closed by warroom: toby/incidents/2026-05-11-blank-extension-page.md (status closed, verdict validated high-confidence). End-to-end in ~30 min: frontend doctor pinned apps/extension/app/state/accessors/user.tsx:45-50 (unbounded chrome.storage.local.get callback); backend doctor cleared the API (prod-api SHA stable since 2026-02-02, 0 5xx, refuting the prior MV3-SW-boot-regression hypothesis); coordinator drafted a 3-layer frontend-only fix; validator confirmed race-safety and d68726b29 regression-check. Agents never patched the codebase — operator review owns the ship decision against the O1 KR1 deadline (2026-05-24) (2026-05-11)
  • Wiki: Incident warroom stood up at toby/incidents/ — folder seeded with onboarding README.md; AIOS workflow Toby Incident Response (id 9b78790f-2aea-4f65-876f-53d1a114c3ae) active; four-agent team (toby-incident-coordinator, toby-frontend-doctor, toby-backend-doctor, toby-incident-validator) on roster. Daily cron at 09:00 UTC reads _inbox/ then the ticket queue; manual ticks remain as an operator escape hatch. The earlier dashboard claim of a file_event trigger was inaccurate — corrected 2026-05-12 when the ticket→warroom bridge landed (2026-05-11)
  • Wiki: X engagement-targets list re-generated at toby/x/engagement-targets.md: 26 ranked ICP targets (6 Tier-A / 11 Tier-B / 9 Tier-C) with reply drafts, "what NOT to do" guardrails, and updated competitor watch list (Uncluttr, TabVault Pro, ThoughtFold, tab-out, leap-tabs.com flagged do-not-engage). New bot-filter rule applied: followers ≥ 100 AND account < 2025-06. Four new Tier-A targets added (@airplanestar_, @tropicanacailin, @wayne_effect, @benrayfield); four new Tier-B switchers added; ThoughtFold flagged as sharper threat for AuDHD-ICP overlap (2026-05-10)
  • Wiki: X content pipeline + X strategy at toby/x/content-pipeline.md and toby/x/strategy.md — operator-driven channel surface; first scheduled post Tue 2026-05-12, pin candidate Post 13 (🔒 acct-gated only) (2026-05-10)
  • Wiki: Q2 2026 Growth Playbook published at toby/strategy/playbook.md — growth thesis, current-loop diagnosis, top-5 bets, Q2 OKRs, anti-bets, red-team pass (2026-05-10)
  • Wiki: Rolling Bets Queue at toby/strategy/bets.md — ICE-scored in-flight / proposed / validated / killed with falsifying signals (resolves prior open question about bets.md location) (2026-05-10)
  • Wiki: Product Compass v0.1 at toby/strategy/compass.md — identity, axioms, anchors, brand promise, anti-promise (2026-05-10)
  • Post fallback Slack message when CWS review AI draft fails (commit b9bea18c, 2026-04-29). 2026-05-13 review notes a low-severity backtick-fence-break formatting hazard in the error path; doc-only, no ticket.
  • Docs: Installation device_id data caveat + analytics troubleshooting (commit da1ba81e, 2026-04-14)
  • Chore: build:store script to build all prod extension variants (commit e7d00a8e, 2026-04-14)
  • Fix: correct fabricated "Open toby" volume baseline in Phase 1 halt threshold (commit 389556ae, 2026-04-14)
  • Feat: 4h Session Start heartbeat event for intra-day retention analysis (commit 0f3aa38d, 2026-04-14). 2026-05-13 review notes no unit tests on the useSessionStart.ts hook driving this production analytics event; ticket filed (improvement, medium).
  • Docs: Phase 1 / Phase 2 specs, plans, and todos for onboarding experiment work (commit 40684905, 2026-04-14)
  • Docs: BigQuery schema, connection IDs, and data caveats added to analytics skill (commit b1fe667b, 2026-04-09)
  • Chore: extension 1.13.0 (commit 9d6e8e4f, 2026-04-09)
  • Fix: gate AuthWrapper on user hydration to prevent duplicate onboarding events (commit d68726b2, 2026-04-09). Identified as the proximate trigger of the blank-page hang per toby/incidents/2026-05-11-blank-extension-page.md. The fix is correct — confirmed independently by the 2026-05-13 code-reviewer pass; the issue was that it widened the gate without bounding the new dependency. Now defended-in-depth by PR #12 (commit 06baf0f8a, 2026-05-12) — the gate is preserved; the timeouts + recovery screen prevent the unbounded-callback failure mode from killing the UI.
  • Fix: derive experimentEntityId from the draft to eliminate race condition (commit cde22c93, 2026-04-09)
  • Refactor: remove onboarding-signup-position A/B experiment, ship "end" variant (commit bc5e4530, 2026-04-09)
  • Add Chrome Web Store review monitor with AI-drafted responses (commit ba247d9a, 2026-03-30)
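
The missing useSessionStart coverage concerns logic that is easiest to test when extracted into pure functions. A minimal sketch of what that could look like is below. The function names, the stored-timestamp shape (epoch milliseconds), and the clock-skew rule are all assumptions for illustration; the actual hook's storage keys and parsing rules are not shown in this document.

```typescript
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Decides whether a Session Start event should fire, given the last fire
// time read from storage (assumed shape: epoch ms, or missing/invalid).
function shouldFireSessionStart(lastFiredAt: unknown, now: number): boolean {
  if (typeof lastFiredAt !== "number" || !Number.isFinite(lastFiredAt)) {
    return true; // no usable record: fire
  }
  if (lastFiredAt > now) {
    return true; // assumed rule: a future timestamp (clock skew) is stale
  }
  return now - lastFiredAt >= FOUR_HOURS_MS;
}

// Parses a signup timestamp into whole days since signup, or null when the
// value is missing, unparseable, or in the future.
function daysSinceSignup(createdAt: unknown, now: number): number | null {
  if (typeof createdAt !== "string") return null;
  const signedUpAt = Date.parse(createdAt);
  if (Number.isNaN(signedUpAt) || signedUpAt > now) return null;
  return Math.floor((now - signedUpAt) / ONE_DAY_MS);
}
```

With the decision logic isolated this way, the boundary cases (just under 4 hours, exactly 4 hours, corrupt storage values) become one-line assertions rather than hook-rendering tests.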

Key decisions to date

  • Code-review channel design — review + file tickets, never patch (2026-05-13). The new toby-code-reviewer agent walks main-commits daily, applies a triple-check skill (correctness / quality / security), writes a roll-up review doc to toby/code-reviews/, and files tickets for findings rising to bug/security/missing-test threshold. It never patches — even high-confidence findings exit as tickets, not PRs. This is orthogonal to the warroom fix-shipper, which DOES open PRs but only on validated-high-confidence warroom incidents. The dashboard surfaces both rails so operators see the two paths to main clearly. (see: toby/code-reviews/2026-05-13-10commits.md, this dashboard's Operations sections)
  • Wave 4 auto-ship path executed end-to-end for the first time (2026-05-12 via toby/incidents/2026-05-11-blank-extension-page.md PR-shipped section). toby-incident-fix-shipper ran in b3400d87-0830-4f89-bb70-4c3907c085f1, picked up TOBY-14 from the bridge, branched off 75a09e34d into warroom/2026-05-11-blank-extension-page-toby-14, applied the 3-layer diff (4 files, including new apps/extension/app/components/StuckRecoveryScreen.tsx), and opened PR #12 ~9 minutes after inbox bridge. This completes the proof of both warroom rails: 2026-05-12 medium-confidence skip + 2026-05-12 high-confidence auto-ship. (see: toby/incidents/2026-05-11-blank-extension-page.md PR shipped, toby/incidents/README.md Wave 4 spec, artifacts/toby-pm/bdbce617-3091-43c4-9c01-20c16b19946c/incidents-2026-05-11-blank-extension-page-ship-update-ingestion.md)
  • Reliability hotfix scope locked at the 3-layer fix specced in the 2026-05-11 doc (decision rendered by PR #12 contents, 2026-05-12). The fix-shipper deliberately scoped the PR to (1) Layer-1 timeouts on getUser() + getOnboarding2Draft, (2) Layer-2 StuckRecoveryScreen at 8s, (3) Layer-3 NewTabHangShown beacon. The three follow-up surfaces flagged in the incident doc (Layer-1 for isInitializing, SW-hardening trio, Layer-1 telemetry beacon) are deliberately deferred as separate work — not in PR #12. This is the design intent: minimum-blast-radius hotfix first, hardening as follow-on PRs. (see: toby/incidents/2026-05-11-blank-extension-page.md PR-shipped + Follow-ups sections)
  • Medium-confidence validator verdicts skip auto-ship by design (captured 2026-05-12 via toby/incidents/2026-05-12-retention-offers-silent.md). When the validator returns validated + medium, Wave 4 (fix-shipper) is deliberately bypassed and the corrected compile-ready diff is routed into the canonical incident doc for human review. This is the intended design — the 2026-05-11 high-confidence run proved the auto-ship path; the 2026-05-12 medium-confidence run proved the human-review-required path. The gate works in both directions. (see: toby/incidents/2026-05-12-retention-offers-silent.md, toby/incidents/README.md Wave 4 spec)
  • retention_offers table is accept-only by design — "0 issued, 0 accepted" is unmeasurable from this schema (captured 2026-05-12). The table records confirmed "CLAIM DISCOUNT" clicks; shows-without-accept and declines write nothing. Any "issued" or "shown" metric needs Tier 3 schema work (new retention_offer_views table OR status/offered_at/declined_at columns) and/or Amplitude RETENTION_OFFER_SHOWN/DECLINED events wired into BI. The Tier 1 logging patch is an interim measure that lets us measure "eligible offers shown" via Cloud Logging without a schema change. (see: toby/incidents/2026-05-12-retention-offers-silent.md root cause #1)
  • Blog/SEO motion as a structured, agent-owned pipeline (captured 2026-05-12 in toby/blog/pipeline.md). Replaces ad-hoc landing-page posts. Agent writes wiki drafts; operator publishes by hand-copy. Voice fingerprint = toby/strategy/compass.md + toby/x/strategy.md (calm, specific, generous; never breathless). Pillar mix mirrors X. P3 power-user posts are rel-gated on the blank-page hotfix. No price claims until pricing-reality-reconcile completes (O3 KR1, due 2026-05-13). No AI pre-announce until Q4 2026 relaunch. No competitor punch-down — comparison content only, never trash-talk. (see: toby/blog/pipeline.md voice + NOT-pursued sections)
  • Reliability gate also applies to feature-promotional comparison posts (captured 2026-05-12 in toby/blog/pipeline.md Run #2). The OneTab Alternative draft is rel-gated by the pipeline agent itself — recommended hold-until-hotfix-ships, since the post leans on Save Session + new tab page as the wedge and publishing pre-fix would drive switchers into the live bug. Top-funnel posts (e.g. Why 80 Tabs Open) are not similarly gated. (see: toby/blog/pipeline.md operator-decisions, toby/blog/onetab-alternative.md editor notes)
  • Blank-page reliability — no prod-api redeploy as part of this incident (2026-05-11; both backend doctor and validator concurred). Backend is innocent (prod-api SHA stable since 2026-02-02; 0 5xx; SW boot path clean; getUser() makes no network call so the hang is pre-HTTP). A redeploy is needless blast-radius. The fix lives entirely in apps/extension. (see: toby/incidents/2026-05-11-blank-extension-page.md)
  • NewTabHangShown telemetry beacon — feature-flag-gated, default on (2026-05-11; validator concurs). First-ever signal between CWS-review complaint and Sentry/Amplitude funnels. The optional Layer-1 NewTabHydrationTimeout beacon follows the same default. (see: toby/incidents/2026-05-11-blank-extension-page.md operator decisions)
  • Prior MV3-SW-boot-regression hypothesis is REFUTED (2026-05-11). Earlier toby-product-strategist artifact 388c1db4-59b7-49e9-8ec3-ecfba972c95f is now treated as historical context only; do not carry forward as a live theory. Independent backend evidence (SHA stability, 5xx volume, SW boot listener registration, network-free hang path) rules it out. (see: toby/incidents/2026-05-11-blank-extension-page.md "What this is NOT")
  • Incident warroom design — agents never patch (except via fix-shipper on validated + high-confidence) (captured 2026-05-11 in toby/incidents/README.md; reinforced 2026-05-12). The five-agent team (toby-incident-coordinator + frontend/backend doctors + validator + fix-shipper) produces canonical incident docs with fix proposals as diff. High-confidence + validated → fix-shipper opens a PR. Medium-confidence + validated → corrected diff lives in the canonical doc; operator decides. Conditional / rejected → up to 2 retry passes. Doctors write run-artifact findings; only the coordinator writes the canonical doc; validator returns a binding verdict. Agents never touch _inbox/; agents never delete docs. (see: toby/incidents/README.md, toby/incidents/2026-05-12-retention-offers-silent.md)
  • Q2 2026 anti-bets (codified in playbook 2026-05-10): (1) no pre-announce of AI organize features on any public channel; (2) no paid acquisition while CWS rank is unrecovered ($54/yr ARPU × <5% full-price conversion = math doesn't work); (3) no Firefox/Safari port this quarter — engineering throttle forces a "Phase 2 + reliability OR platform port" choice and we pick the first; (4) no public punch-down at OneTab / Workona / Session Buddy / Arc / new entrants Uncluttr / TabVault Pro / ThoughtFold / tab-out / leap-tabs (comparison content fine, trash-talk is brand-poison); (5) Atlassian/Dia 60-min partnership discovery deferred past Q2 (one-way door); (6) no paid push around "Toby vs Chrome 133" until conversion economics fix (the free comparison page is fine) (see: toby/strategy/playbook.md anti-bets)
  • X engagement protocol locked (from toby/x/engagement-targets.md): first touch is always in-character + generous; no link in the first reply (drop gettoby.com only when prompted); max 1 reply per target per day; no DM cold-pitches; no mass engagement (>8/day reads as a bot); never reply as the brand to vulnerable-distress posts; cordial silence on competitor accounts (Uncluttr, TabVault Pro, ThoughtFold, tab-out, leap-tabs.com). New 2026-05-10 — bot filter applied: every target satisfies followers_count ≥ 100 AND account_created < 2025-06 — keeps targets credibly human and avoids burning brand reputation on bot-shaped replies (see: toby/x/engagement-targets.md hard rules)
  • Top growth lever this quarter is the dormant curator loop, not acquisition channels — 1,848 Free-Tier Archivists with an 18.6% public-share rate are Toby's only native viral surface and have never been explicitly activated; the loop costs zero engineering. Now jointly served by X + blog: blog pipeline queues a recurring "Public collection of the week" curator series (rank 6) that pairs with the X curator-spotlight slot (see: toby/strategy/playbook.md, toby/strategy/bets.md#public-collection-pride-loop, toby/blog/pipeline.md queue rank 6)
  • Phase 1 / Phase 2 split: ship 1.13.0 with cleanup + Session Start before Jad's OOO; rebuild experiment machinery from scratch in Phase 2 (clean slate is intentional) (see: tasks/onboarding-experiment-plan.md)
  • Welcome A/B isolated variable = "presence of a dedicated welcome / Get Started screen". Success metric = D7 retention with +5pp lift over the V2-only baseline (32.92%); kill criterion if neither variant reaches 34% D7 at n≥2,000/arm by 2026-05-26 (see: tasks/onboarding-experiment-plan.md, toby/strategy/playbook.md O2 KR2)
  • Auth becomes a modal overlay on a blurred dashboard (shared-spine improvement, not an experiment arm) (see: tasks/onboarding-experiment-plan.md)
  • Draft is the source of truth for variant / isFallback; once the 2s timeout fires the draft is bucketed and frozen — late API responses write only to a debug-only experimentApiLate field (see: tasks/onboarding-experiment-plan.md)
  • All cross-auth analytics joins use Amplitude device_id, not _entityId (the SDK re-identifies on login) (see: tasks/onboarding-experiment-plan.md)
  • Toby has no guest mode — "Skip" means skip the tutorial, not skip auth; skip path pre-seeds starter demo content into the draft (see: tasks/onboarding-experiment-plan.md)
  • North Star = Weekly Card Opens; supporting = Weekly Card Saves; supporting events: Open card, Open all cards, Close all, Open these, Add tab (see: worklog.md)
  • New endpoints use the v3 controller pattern (BaseController + explicit DI); v2 (gocraft/web + context structs) stays only for legacy (see: CLAUDE.md)
  • 90-day investment ratio (post-v3): 70% reliability, 20% role-based gating, 10% AI feature relaunch (see: research-docs/toby-delta-2026-05-05-v3.md)
  • AI feature relaunch deferred Q3 → Q4 2026 (see: research-docs/toby-delta-2026-05-05-v3.md)
  • "270K untracked WAU" mirage resolved: real active user base is ~62–75K (CWS WAU 380k inflated 5–6×); ~54K free, ~5,800 paid (10.7% active-user conversion) (see: product/strategy/next-actions.md)
  • AI requires JWT + team_id + "ai" feature flag, so AI cannot run on the unauthenticated Save Tabs slide in the "end" variant — only "beginning" supported AI there (see: docs/ai-onboarding-ideas-analysis.md)
  • Persistence-as-activation is a 10× axiom (compass axiom 3): "I opened a new tab and my organized world was still there", not "AI organized my tabs" (see: docs/ai-onboarding-ideas-analysis.md, toby/strategy/compass.md axiom 3)
  • Calm axis sharpened to "calm-by-ambient-surfacing" (compass axiom 2 — value in the gap between intent and next action). Plain "calm" is now a category claim (Sunsama, Neurosity, Cold Turkey); ambient new-tab surfacing is what stays defensible (see: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/compass.md axiom 2)
  • Cloud-account differentiation relegated to table stakes (compass anchor #2) — Chrome 133 ships saved-tab-group cross-device sync natively (see: toby/strategy/compass.md anchor 2, 2026-05-10)
  • Reliability promoted to anchor #1 (compass) — the blank-page bug puts the entire product at risk, not just CWS rank (see: toby/strategy/compass.md anchor 1, 2026-05-10)
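
The warroom routing rule in the list above (validated + high confidence ships a PR, validated + medium confidence goes to the operator, conditional or rejected gets up to 2 retry passes) can be read as a small decision function. The sketch below is illustrative only: the type names, `routeIncident`, the route labels, and the hand-back to the operator once both retry passes are spent are assumptions, not the agents' actual implementation.

```typescript
// Hypothetical model of the warroom routing rule; names are illustrative.
type Verdict = "validated" | "conditional" | "rejected";
type Confidence = "high" | "medium";
type Route =
  | "fix-shipper-opens-pr"
  | "operator-decides"
  | "retry"
  | "escalate-to-operator"; // assumption: the source does not say what follows the 2nd retry

// "up to 2 retry passes" per the decision captured in toby/incidents/README.md
const MAX_RETRY_PASSES = 2;

function routeIncident(
  verdict: Verdict,
  confidence: Confidence,
  retriesUsed: number
): Route {
  if (verdict === "validated") {
    // Only validated + high confidence reaches the auto-ship rail.
    return confidence === "high" ? "fix-shipper-opens-pr" : "operator-decides";
  }
  // Conditional or rejected verdicts get another doctor pass while budget remains.
  return retriesUsed < MAX_RETRY_PASSES ? "retry" : "escalate-to-operator";
}
```

Under these assumptions, the 2026-05-11 blank-page incident (validated, high) maps to `fix-shipper-opens-pr`, and the 2026-05-12 retention incident (validated, medium) maps to `operator-decides`.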

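The "draft is the source of truth" decision above says that once the 2s timeout fires the draft is bucketed and frozen, and a late API response may only land in the debug-only experimentApiLate field. A minimal sketch of that rule as a pure state transition follows. The `Draft` shape beyond the three fields named in the decision, the `FALLBACK_VARIANT` value, and the choice to also freeze on an on-time response are assumptions made for illustration.

```typescript
// Hypothetical draft shape; variant / isFallback / experimentApiLate come from the
// decision text, `frozen` is added here to make the rule explicit.
interface Draft {
  variant: string | null;
  isFallback: boolean;
  frozen: boolean;
  experimentApiLate?: string; // debug-only, never read by bucketing logic
}

type DraftEvent =
  | { kind: "api"; variant: string } // experiment API answered
  | { kind: "timeout" };             // the 2s timer fired

const FALLBACK_VARIANT = "fallback"; // assumption: the real fallback name is not given

function applyEvent(draft: Draft, event: DraftEvent): Draft {
  if (!draft.frozen) {
    if (event.kind === "api") {
      // Assumption: an on-time response buckets and freezes the draft too.
      return { ...draft, variant: event.variant, isFallback: false, frozen: true };
    }
    // Timeout first: bucket into the fallback and freeze.
    return { ...draft, variant: FALLBACK_VARIANT, isFallback: true, frozen: true };
  }
  if (event.kind === "api" && draft.isFallback) {
    // Late response: record it for debugging, never change the bucket.
    return { ...draft, experimentApiLate: event.variant };
  }
  return draft;
}
```

Keeping the transition pure makes the invariant easy to check: after the first event, `variant` and `isFallback` never change again.
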
Open questions / blockers

  • Sandcastle agentic-workflow — secret scanner has no test coverage (NEW 2026-05-13). Per toby/code-reviews/2026-05-13-10commits.md, .sandcastle/scan-secrets.mts:1-182 (commit 75a09e3) is the only thing standing between an unattended LLM agent and a pushed credential. It's a regex-on-added-lines walker with a skip-list — single-pattern, single-line, no inline-base64, no multi-line obfuscation detection. The author's own comment acknowledges it isn't designed against an adversarial agent, but the regression risk is silent. Operator action: prioritise the auto-filed ticket (.sandcastle/scan-secrets.mts needs unit-test coverage); ideally close before the first sandcastle-authored slice opens a real PR. (see: toby/code-reviews/2026-05-13-10commits.md scan-secrets finding)
  • Sandcastle agentic-workflow — gate-skip env vars have no audit trail (NEW 2026-05-13). SANDCASTLE_SKIP_GATES_VERIFY=1 and SANDCASTLE_SKIP_SECRET_SCAN=1 fully disable verification and secret-scan with only a console.warn. As the workflow scales (CI runner, shared image, teammate rc-file), no audit ties a pushed branch back to which gate was waived. Operator action: triage the auto-filed ticket (Sandcastle skip-flags audit logging); cheap to add (PR body field + .sandcastle/audit.log). (see: toby/code-reviews/2026-05-13-10commits.md sandcastle skip-flag finding)
  • Sandcastle EXPECTED_GATES lock-step risk (NEW 2026-05-13). EXPECTED_GATES = ['lint', 'build'] intentionally omits typecheck and test because the extension repo has pre-existing TS debt and no vitest scaffold on main. The moment the TS-clean branch lands, EXPECTED_GATES must be updated in lockstep — easy to forget; would silently let bad slices through. Doc-only finding from the code-reviewer; would warrant a tracking issue once the TS-clean branch is in flight. (see: toby/code-reviews/2026-05-13-10commits.md EXPECTED_GATES finding)
  • useSessionStart.ts shipped without unit tests (NEW 2026-05-13). Phase 1 production analytics hook (the 4h Session Start heartbeat at commit 0f3aa38d) has 75 lines of quiet logic — rate limit, days_since_signup parse, storage-write-ordering — and no tests. Code-reviewer auto-filed a ticket. Quiet-failure risk: a stale userId shape or broken rate-limit math wouldn't surface until someone runs a BigQuery backfill. Doesn't block Phase 2 but is ship debt on already-shipped code. (see: toby/code-reviews/2026-05-13-10commits.md useSessionStart finding)
  • Retention-funnel structural leak (2026-05-12). Per toby/incidents/2026-05-12-retention-offers-silent.md, the in-app retention modal is bypassed by ~82% of cancels (120 cancels → 22 reasons → 1 accept over 30 days). Three FE-orchestration paths route around the modal: (a) the Stripe Customer Portal View link is preloaded inside the in-app Subscription panel (apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:51-62, 192-208); (b) Stripe renewal/receipt emails contain "Manage subscription" links to the same portal; (c) team_legacy + team_basic users have no in-app cancel CTA at all because hasSubscription excludes them (Subscription.tsx:41-43) — and that's the cohort with the worst churn pressure (Feb-26 ThankYouLegacy renewals). Operator owns the Tier 2 product calls — hide the Stripe-portal link OR configure flow_data redirect; widen hasSubscription; investigate the retention_yearly ~0%-accept branch. Bigger monetization lever than the Tier 1 instrumentation patch. (see: toby/incidents/2026-05-12-retention-offers-silent.md fix tier 2, toby/strategy/playbook.md O3)
  • Tier 1 retention instrumentation — awaiting Go-reviewer sign-off (2026-05-12). Corrected compile-ready 10-LOC zap log patch is in the canonical incident doc. Validator already vetted shape + symbols; one Toby Go reviewer needs to approve. After merge + deploy, the verify plan computes the first-ever FE funnel ratio in 14 days — settles whether the lever is in copy/offer (decline-not-show) or in routing (show-not-call). (see: toby/incidents/2026-05-12-retention-offers-silent.md fix tier 1 + verify plan)
  • Five missing TOBY_RETENTION* GCP secrets — housekeeping (2026-05-12). TOBY_RETENTIONMINSUBSCRIPTIONDAYS, TOBY_RETENTIONCOOLDOWNMONTHS, TOBY_RETENTIONLEGACYYEARLYPRICE, TOBY_RETENTIONCOUPONLEGACY, TOBY_RETENTIONCOUPONYEARLY. System operates correctly on struct-tag defaults — the only side effect is 5 "failed to access secret version" log lines per cold start. File as separate ticket; either create with current defaults or remove the lookup entirely. (see: toby/incidents/2026-05-12-retention-offers-silent.md fix tier 4)
  • Blog publish flow. The agent writes drafts into the wiki at toby/blog/; the production blog lives at apps/landing/src/content/post/. Three sub-questions stacked: (1) hand-copy approved drafts into the codebase repo or change the publish flow? (2) image hand-off — neither current draft has a cover; existing posts use ~/assets/images/blog/<post-folder>/<image>.png; the OneTab draft specifically needs a "URL list vs visual collection" hero (the image carries the post's core claim); (3) internal-link URL shape on gettoby.com may not match filenames — confirm before any inter-post link ships. Blocks: first publish of either of the two drafts at toby/blog/. Related operator calls: pair-or-not with X Post 1 ("47 tabs is not a personality flaw") on Tue 2026-05-12, and queue a new X anchor for the OneTab draft (the Post 1 pair is already claimed) (see: toby/blog/pipeline.md open questions, toby/blog/onetab-alternative.md editor notes).
  • OneTab Alternative draft — reliability-gate decision. toby-blog-seo recommends holding publish of the OneTab draft until the 3-layer reliability hotfix ships (O1 KR1, due 2026-05-24) so we don't drive switchers into the live blank-page bug. The foundational tab-hoarder draft is not similarly gated. Operator decides; treating as recommendation, not a hard rule. Resolves naturally when the hotfix ships (see: toby/blog/pipeline.md operator-decisions, toby/blog/onetab-alternative.md editor notes).
  • Reliability hotfix — review, merge, deploy, monitor (diagnosis closed 2026-05-11; PR opened 2026-05-12). PR #12 carries the 3-layer fix; commit 06baf0f8a; branch warroom/2026-05-11-blank-extension-page-toby-14. The fix-shipper validator vetted race-safety + d68726b29-regression-safety + copy; the 2026-05-13 code-reviewer pass independently confirms d68726b29 is itself correct (the gate widening was right; what was missing was bounding the new dependency, which PR #12 fixes); CI is the canonical typecheck/lint gate (no local check ran in the ephemeral worktree). Operator steps now: (1) PR review + merge; (2) bump extension version (Phase-1 was 1.13.0; this is likely 1.14.0 or .x.x patch — operator decides); (3) push new build through CWS; (4) watch NewTabHangShown in Amplitude for 7-day baseline + CWS reviews for new "blank screen" 1-stars over the 14-day O1 KR1 window (target: zero). The three non-blocker follow-ups from the incident doc (Layer-1 for isInitializing, SW-hardening trio, Layer-1 NewTabHydrationTimeout beacon) remain queued as separate work. Side-effect: unblocks blog pipeline's P3 power-user posts AND unlocks publish of the OneTab Alternative draft once merged + deployed (see: toby/incidents/2026-05-11-blank-extension-page.md PR shipped + Follow-ups, toby/strategy/compass.md anchor 1, toby/strategy/playbook.md O1, toby/blog/pipeline.md P3 gate + OneTab gate, toby/code-reviews/2026-05-13-10commits.md d68726b finding).
  • Reliability follow-up PRs — not yet filed (2026-05-12; surfaced by PR #12's deliberate scoping). The 2026-05-11 incident doc's "Follow-ups (NOT blockers for closing this incident)" section lists three distinct work items that were deliberately not included in PR #12: (1) apply Layer-1 5s timeout shape to isInitializing (useIsRestoring() IDB-backed path in Toby.tsx:168-275) so the page can self-heal pre-8s without a tap; (2) SW hardening trio — .catch() on persistQueryClientRestore at background.ts:14, AbortController+10s on contextMenus.ts:145-191 fetch, unified chromeStorageGet<T>(keys, { timeoutMs }) helper to replace every raw chrome.storage.local.get callsite; (3) Layer-1 NewTabHydrationTimeout telemetry beacon so the common 5s recovery path is visible in Amplitude (without it, only the 8s tail surfaces). Operator should decide whether to file these as separate bug/improvement tickets now (with stub bodies pointing to the incident doc) so the work doesn't get lost after PR #12 merges (see: toby/incidents/2026-05-11-blank-extension-page.md Follow-ups section).
  • Phase 2 silent slip risk. Plan targets "week of Apr 20"; today is 2026-05-13 and there are zero Phase 2 commits. Playbook red-team residual: if 2026-05-19 arrives with no commits, the middle of the playbook disintegrates and O2 is dead. Confirm whether work is on an unpushed branch or Jad's ETA is firm (see: tasks/phase2-todo.md, toby/strategy/playbook.md red-team).
  • Pricing contradiction — now three different numbers (strengthened 2026-05-12). Internal modeling uses $4.50/mo; Efficient.app lists Toby at $6/$10; TheTab's blog comparison post lists Toby at $9/mo (surfaced via toby-blog-seo competitor-blog watch this run). Either pricing has been changing without internal docs being updated, public listings are stale, or there's a tier we're not tracking. Blocks role-based-paywall-gating; playbook O3 KR1 sets hard deadline 2026-05-13. Also keeps the blog pipeline's price-claim guardrail in force (no blog post may mention price until reconciled — both drafts comply) (see: research-docs/toby-delta-2026-05-05-v3.md, toby/strategy/playbook.md O3, toby/blog/pipeline.md operator-decisions + competitor-blog watch).
  • @TobyForTabs account-creds gate. X channel cannot ship until operator confirms account credentials are in-hand. Step 1 of the engagement-targets operator one-pager. If this isn't resolved by Mon 2026-05-11, the Tue 2026-05-12 scheduled first post slips. (Post 13 pin is also acct-gated.) (see: toby/x/engagement-targets.md operator one-pager).
  • Competitor watch — engagement-targets list refreshed 2026-05-10. Five accounts now surfaced in the live engagement-targets doc as do-not-engage competitors or category-adjacent: Uncluttr (@ciprian__b — AI-organize tab manager, CWS #18 "vertical tab manager", just earned Blue Checkmark, building in public on the AI lane Toby has deferred to Q4 2026), TabVault Pro (@godesign_art — free OneTab/Session-Buddy alternative, launched 2026-04-19), ThoughtFold (via @PersonAI_talks — "zero-cloud Chrome tab manager for neurodivergent brains", PH launch 2026-04-25 — sharper threat for overlapping AuDHD ICP shape; engagement-targets + content-pipeline + blog pipeline all advise keeping our framing implicit), tab-out (@Yieioo — open-source visual tab manager, added "Page Snapshot" feature 2026-05-03), leap-tabs (@RomanGweb3 — sidebar tab manager + spaces). Policy is cordial silence (X) + comparison content fine but no trash-talk (blog — OneTab draft is the first proof-of-life). TabRack and LocalArchive (carried in last cycle) are no longer in the engagement-targets list — confirm with strategist whether they fell off the signal-source query or were deliberately de-prioritized (see: toby/x/engagement-targets.md Tier C, toby/x/content-pipeline.md drafts-not-proposed, toby/blog/pipeline.md competitor blogs + NOT-pursued).
  • Team-buyer X-pillar drop decision deadline 2026-05-17. The latest engagement-targets bucket recap is still zero team-buyer candidates after 26 targets. Doc forces an operator-only call: either supply the missing target shape (e.g., LinkedIn-DM canvass of 25 multi-team admins to surface ≥3 "I want a paid Team plan, but [reason]" responses) or drop the team-buyer pillar from X entirely. 4,908 active multi-team users / 79 paid yearly Team subs is the imbalance being tested (see: toby/x/engagement-targets.md bucket recap, toby/strategy/playbook.md open questions, toby/x/strategy.md).
  • Does the "ambient new-tab surface" axiom survive Google deprecating new-tab override permissions? (from playbook red-team) Strategic 2027-2028 risk. We've taken no defensive action. Resolves when: Google publishes a deprecation roadmap OR a credible signal lands from a Chrome team contact (see: toby/strategy/playbook.md open questions, toby/strategy/compass.md axiom 2).
  • What's the true full-price new-user conversion rate? (from playbook) Action 1 in product/strategy/next-actions.md is still unresolved. Determines whether the flywheel is structurally viable at $54/yr (see: product/strategy/next-actions.md, toby/strategy/playbook.md open questions).
  • Will the reliability hotfix close the not_using cancellation reason (39% of churn)? (from playbook) Hypothesis: blank-page failures drive part of not_using (users churn because product was broken, not because they didn't value it). Resolves when: 60 days post-hotfix the churn-survey not_using share is re-pulled and compared to the 39% baseline. Now genuinely testable — PR #12 is open as of 2026-05-12; the 60-day measurement window starts the day the merged build hits CWS, NOT the day diagnosis closed (see: toby/strategy/playbook.md open questions, toby/01-personas.md, toby/incidents/2026-05-11-blank-extension-page.md).
  • worklog.md last entry is 2026-02-02 — stale by ~3 months. Either it needs an update or another rolling log has taken its place. Worklog stale-flag on CancelSubscription.tsx retention-discount wiring is now resolved as of the 2026-05-12 incident (integration confirmed shipped via live apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709); the worklog itself should be updated to retire that pending item.
  • Uncommitted state on main: modified CLAUDE.md; untracked docs/ai-onboarding-ideas-analysis.md, product/ideas/ (mcp-integration.md), research-docs/ (six toby-research/delta files). Should these be committed before Phase 2 begins?
  • Action 0 follow-up: WAU number was corrected from 380K → ~62–75K; verify all dashboards, marketing copy, and investor-facing materials reflect the corrected number (see: product/strategy/next-actions.md).
  • Free-tier collaboration limits vs. anchor #4 (compass): the role-based-gating bet may restrict collaboration on the free tier, but anchor #4 says "the free experience must be genuinely usable, not crippled". Where is the line? Needs explicit decision before the gating experiment ships — playbook O3 KR2 forces the call by 2026-06-15 (see: toby/strategy/compass.md anchor 4, toby/strategy/playbook.md O3 KR2).
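
The reliability follow-up above proposes a unified chromeStorageGet<T>(keys, { timeoutMs }) helper to replace raw chrome.storage.local.get callsites. One possible shape is sketched below, assuming a callback-style getter like the one named in the root-cause diagnosis. The function name `storageGetWithTimeout`, the injected `get` parameter (added so the sketch runs outside an extension), and the choice to fail open with an empty object are assumptions, not the helper as it will be written.

```typescript
// Callback-style getter with the same shape as chrome.storage.local.get(keys, callback).
// Injected as a parameter so the sketch can run (and be tested) outside an extension.
type StorageGet = (
  keys: string[],
  callback: (items: Record<string, unknown>) => void
) => void;

interface GetOptions {
  timeoutMs: number;
}

function storageGetWithTimeout<T>(
  get: StorageGet,
  keys: string[],
  { timeoutMs }: GetOptions
): Promise<Partial<T>> {
  return new Promise((resolve) => {
    let settled = false;

    // If the callback never fires, fail open instead of hanging the page.
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      resolve({} as Partial<T>); // assumption: "fail open" means "behave as if nothing is stored"
    }, timeoutMs);

    get(keys, (items) => {
      if (settled) return; // a late callback after the timeout is ignored
      settled = true;
      clearTimeout(timer);
      resolve(items as Partial<T>);
    });
  });
}
```

In the extension, a callsite might pass `(keys, cb) => chrome.storage.local.get(keys, cb)` as `get` with `timeoutMs: 5000`, matching the 5s Layer-1 bound described in the incident doc.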

Doc index

  • Toby — State of the Project — this dashboard (operational view: status, OKRs, next steps, decisions, blockers).
  • Toby — Product Compass — identity / axioms / anchors / brand promise / anti-promises. Authoritative on who Toby is (2026-05-10).
  • Toby — Q2 2026 Growth Playbook — growth thesis, current-loop diagnosis, top-5 bets, OKRs, anti-bets, red-team pass (2026-05-10).
  • Toby — Rolling Bets Queue — ICE-scored in-flight / proposed / validated / killed with falsifying signals (2026-05-10).
  • Toby on X — Strategy — operator-driven channel surface, ICP buckets, pillar framing (2026-05-10).
  • Toby on X — Content Pipeline — 13 drafted posts, scheduling, pin candidate (Post 13) (2026-05-10).
  • Toby on X — Engagement Targets — 26 ranked candidates (6 Tier-A / 11 Tier-B / 9 Tier-C), reply drafts, bot-filter rule, hard engagement rules, competitor watch (2026-05-10).
  • Toby — Blog & SEO Pipeline — Run #2 (2026-05-12): 2-week cadence, 5 pillars mirroring X, 7 ranked queued topics, voice fingerprint, competitor-blog watch list, guardrail open questions (2026-05-12).
  • Why You Have 80 Tabs Open (And Why That's Actually Fine) — first draft from the blog pipeline; P1 top-funnel; paired with X Post 1 (2026-05-09).
  • OneTab Alternative: Save Tabs as Workspaces, Not URL Lists — second draft from the blog pipeline; P5 mid-funnel; "URL list → workspace" reframe; rel-gate recommended on publish (2026-05-12).
  • Toby Incidents — How this works — warroom onboarding doc; five-agent workflow Toby Incident Response (id 9b78790f-2aea-4f65-876f-53d1a114c3ae); daily 09:00 UTC cron; entry path is a ticket with needs-warroom label, bridged into _inbox/ automatically (2026-05-11; bridge + fix-shipper added 2026-05-12).
  • Incident: 2026-05-11 — Blank extension page (infinite-load hang on new tab) — first canonical incident; status closed → shipped (diagnosis 2026-05-11, PR #12 opened 2026-05-12 via Wave 4 fix-shipper); verdict validated (high confidence); root cause + 3-layer frontend-only fix landed on branch warroom/2026-05-11-blank-extension-page-toby-14 (commit 06baf0f8a); 3 follow-up surfaces deliberately deferred (2026-05-11 · 2026-05-12 ship update).
  • Incident: 2026-05-12 — retention_offers silent (TOBY-6) — second canonical incident; status closed (diagnosis); verdict validated + medium; Wave 4 correctly skipped auto-ship; corrected Tier 1 instrumentation patch in the doc awaiting Go-reviewer sign-off; Tier 2-4 follow-ups to be filed as separate tickets. Surfaces structural retention-funnel leak (~82% bypass) and resolves the long-standing CancelSubscription.tsx integration question (2026-05-12).
  • Toby code review · 2026-05-13 — NEW 2026-05-13. First run from the new toby-code-reviewer agent; back-fill of the 10 most recent commits on main (window ends at 75a09e34d); 3 medium + 4 low findings; 3 tickets auto-filed (useSessionStart no-tests, .sandcastle/scan-secrets.mts no-tests, sandcastle skip-flags audit-trail missing). Reviewer never patches — files tickets only (2026-05-13).
  • Toby — Personas — Solo Pro Power Saver + Tenured Free Organizer + the other 5 segments (100% of active users).

Other strategic context for this dashboard is sourced from the codebase: tasks/onboarding-experiment-plan.md, tasks/phase1-todo.md, tasks/phase2-todo.md, research-docs/toby-research-2026-05-05-v3.md, research-docs/toby-delta-2026-05-05-v3.md, product/strategy/next-actions.md, product/strategy/soul.md, product/learnings.md, product/ideas/mcp-integration.md, docs/ai-onboarding-ideas-analysis.md, worklog.md, CLAUDE.md, and the brand-new top-level .sandcastle/ subsystem (orchestrator + secret scanner; commit 75a09e34d).
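
The secret-scanner finding under Open questions describes .sandcastle/scan-secrets.mts as a regex-on-added-lines walker with a skip-list that works one line at a time. The sketch below is not that file. It is a hypothetical minimal scanner of the same general shape, written to show what a unit test could pin down, including the multi-line blind spot the review calls out. The pattern list, the skip-list entry, and all names are assumptions.

```typescript
interface Finding {
  file: string;
  line: string;
  pattern: string;
}

// Illustrative pattern only: the widely documented AWS access key ID format.
const PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "aws-access-key-id", regex: /AKIA[0-9A-Z]{16}/ },
];

// Hypothetical skip-list.
const SKIP_FILES = new Set(["package-lock.json"]);

// Walks a unified diff and checks each added line against each pattern.
function scanAddedLines(diff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "";
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // File header such as "+++ b/src/config.ts": remember the path, do not scan it.
      file = raw.slice(4).replace(/^b\//, "");
      continue;
    }
    if (!raw.startsWith("+") || SKIP_FILES.has(file)) continue;
    const line = raw.slice(1);
    for (const { name, regex } of PATTERNS) {
      if (regex.test(line)) findings.push({ file, line, pattern: name });
    }
  }
  return findings;
}
```

A test suite for a scanner of this shape would assert that an added line containing a key is flagged, that removed lines and skip-listed files are ignored, and that a key split across two added lines is not detected, which documents the limitation rather than leaving it silent.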