Toby — Rolling Bets Queue
_Append + edit. Killed bets stay with autopsies so future runs don't reinvent them. Score is ICE: Impact (1-10) × Confidence (1-10) × Ease (1-10), where Ease = inverse of effort. MoSCoW: Must / Should / Could / Won't. Every bet declares a falsifying signal in plain English._
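The ICE arithmetic can be sketched in a few lines. This is a minimal illustration, not tooling the team is known to use: `Bet`, `ice`, and `ranked` are hypothetical names, and the scores are copied from the In-flight entries below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bet:
    """One queue entry; all three inputs are on a 1-10 scale."""
    slug: str
    impact: int
    confidence: int
    ease: int  # inverse of effort: 10 = least effort

    def ice(self) -> int:
        """ICE = Impact x Confidence x Ease, so scores range from 1 to 1,000."""
        for name, value in (("impact", self.impact),
                            ("confidence", self.confidence),
                            ("ease", self.ease)):
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")
        return self.impact * self.confidence * self.ease


# Scores as recorded in the In-flight section.
IN_FLIGHT = [
    Bet("reliability-blank-page-fix", 9, 8, 8),
    Bet("phase-2-welcome-ab", 8, 6, 4),
    Bet("cws-narrative-repair", 7, 7, 8),
    Bet("pricing-reality-reconcile", 6, 10, 10),
]


def ranked(bets):
    """Return (slug, ICE) pairs sorted from highest to lowest score."""
    return sorted(((b.slug, b.ice()) for b in bets),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for slug, score in ranked(IN_FLIGHT):
        print(f"{score:>4}  {slug}")
```

Run as a script, this lists pricing-reality-reconcile (600) ahead of reliability-blank-page-fix (576). ICE is only one input: both are Must under MoSCoW, and pricing-reality-reconcile also unblocks role-based-paywall-gating and cliff-renewal-offer-ladder, so the score alone shouldn't dictate ordering.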
In-flight
reliability-blank-page-fix — Hotfix the new-tab blank-page incident
- Status: in-flight
- Hypothesis: A guarded error-state UI ("your tabs are safe; tap to recover") + a synthetic check that asserts the new-tab page renders collections within 2s will (a) slow the current rate of new 1-star CWS reviews and (b) reverse the CWS rank penalty, which compounds because the 2026 ranking algorithm weights WAU, review recency, and Core Web Vitals.
- Why now: the live blank-page incident is cross-confirmed in CWS reviews, the Vivaldi 7.0 forum, and the Orion issue tracker. v3 research flagged it as the highest-leverage fix available (see: research-docs/toby-delta-2026-05-05-v3.md, 2026-05-05). It is upstream of every other strategic move: every hour it lingers, the 2026 CWS algorithm penalizes Toby.
- Success metric: zero new 1-star reviews citing "blank screen" within 14 days of hotfix release; CWS review average stops declining; synthetic check passes 99.5%+.
- Falsifying signal: if we ship a fix and 1-star "blank screen" reviews keep arriving within 14 days of release, the bug isn't fixed; re-open with broader reproduction (browser × extension combinations).
- Owner: TBD — operator decision needed.
- ICE: I=9, C=8, E=8, score=576
- MoSCoW: Must
- Started: 2026-05-05 (per v3 research)
- Last review: 2026-05-10
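The synthetic-check success metric reduces to a pass-rate calculation. A minimal sketch, assuming each synthetic run reports the milliseconds until collections rendered, or None when the page stayed blank; the names are hypothetical, and how the runs are driven (for example a headless browser against the new-tab page) is left out.

```python
from typing import Optional, Sequence

RENDER_BUDGET_MS = 2_000   # collections must render within 2s
TARGET_PASS_RATE = 0.995   # synthetic check must pass 99.5%+ of runs


def run_passed(render_ms: Optional[float]) -> bool:
    """A run passes only if collections rendered (not None) within budget."""
    return render_ms is not None and render_ms <= RENDER_BUDGET_MS


def pass_rate(runs: Sequence[Optional[float]]) -> float:
    """Fraction of synthetic runs that passed."""
    if not runs:
        raise ValueError("no synthetic runs recorded")
    return sum(run_passed(r) for r in runs) / len(runs)


def meets_target(runs: Sequence[Optional[float]]) -> bool:
    """True when the pass rate clears the 99.5% bar."""
    return pass_rate(runs) >= TARGET_PASS_RATE


if __name__ == "__main__":
    # 1,000 hypothetical runs: 997 fast renders, 2 blank pages, 1 slow render.
    runs = [850.0] * 997 + [None, None, 2_500.0]
    print(f"pass rate {pass_rate(runs):.3f}, meets target: {meets_target(runs)}")
```

Treating a blank page and a slow render the same way is deliberate here: both are failures from the user's point of view.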
phase-2-welcome-ab — Onboarding welcome-screen A/B test
- Status: in-flight (planned, no commits yet — silent slip suspected)
- Hypothesis: The presence of a dedicated welcome / Get Started screen lifts D7 retention by ≥5pp over the V2-only baseline (32.92%). Activation moment is persistence, not "AI organized my tabs."
- Why now: cohort retention is W4=39.2%, W12=30.5% per toby/01-personas.md; Day-1 retention is 46% per product/learnings.md. The first 4 weeks are where the funnel leaks: every other persona has 83-94% weekly stickiness, while New Adopters convert at 58.4%. Phase 2 is fully specced (12 slices, halt triggers, canary stages) in tasks/phase2-todo.md.
- Success metric: ≥34% D7 retention on at least one variant at n≥2,000/arm by the 2026-05-26 decision review.
- Falsifying signal: kill if neither variant hits 34% D7 at n≥2,000/arm by 2026-05-26 (the team's own pre-defined kill criterion — preserve it).
- Owner: Jad (per toby/00-state-of-the-project.md).
- ICE: I=8, C=6, E=4, score=192
- MoSCoW: Must
- Started: planned for the week of 2026-04-20; zero commits visible as of 2026-05-10. Open question: is the work on an unpushed branch, or genuinely deferred?
- Last review: 2026-05-10
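The kill criterion is mechanical enough to encode, which keeps the 2026-05-26 review honest. A minimal sketch with hypothetical names; it applies only the stated thresholds (≥34% D7, n≥2,000 per arm) and does no significance testing, which a real readout would want alongside it.

```python
from dataclasses import dataclass

D7_TARGET = 0.34        # ≥34% D7 retention
MIN_N_PER_ARM = 2_000   # n≥2,000 per arm


@dataclass(frozen=True)
class Arm:
    name: str
    users: int        # users enrolled in the arm
    retained_d7: int  # of those, users active on day 7

    @property
    def d7(self) -> float:
        return self.retained_d7 / self.users if self.users else 0.0


def hits_target(arm: Arm) -> bool:
    """A variant counts only once it has reached the per-arm sample floor."""
    return arm.users >= MIN_N_PER_ARM and arm.d7 >= D7_TARGET


def decision(variants: list) -> str:
    """'kill' unless at least one variant clears both thresholds."""
    winners = [arm.name for arm in variants if hits_target(arm)]
    return ("continue: " + ", ".join(winners)) if winners else "kill"
```

An under-sampled arm with a high rate still fails the check, matching the "at n≥2,000/arm" wording of the kill criterion.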
cws-narrative-repair — Listing rewrite + cloud-sync + social proof + CWV
- Status: in-flight (planned, no commits — listing copy still hasn't been refreshed)
- Hypothesis: Retitling to "Toby — Tab Manager: Save Sessions, Cloud Sync & Notes", rewriting the description with explicit cloud-sync mention, surfacing "used daily by teams at Netflix/Amazon/HubSpot/IBM" social proof, and publishing a Core Web Vitals benchmark will lift CWS install-conversion on the residual high-intent traffic (currently 30% conversion on ~250 daily installs).
- Why now: the post-Oct-8 CWS page-view collapse is permanent (5,170 → 897/day, not recovered). The remaining traffic is high-intent, so even small conversion lifts compound. Two-way door: easily reversible.
- Success metric: +20% lift in CWS install-conversion on the 4-week window post-rewrite.
- Falsifying signal: if 4 weeks post-rewrite there's no measurable lift, the problem is not narrative — it's ranking. Pivot to acquisition-channel diversification.
- Owner: TBD — operator decision needed.
- ICE: I=7, C=7, E=8, score=392
- MoSCoW: Must
- Started: pending
- Last review: 2026-05-10
pricing-reality-reconcile — Close the $4.50 vs $6/$10 contradiction
- Status: in-flight (1-hour audit, not yet done)
- Hypothesis: Either a price increase shipped publicly without internal docs being updated, or the Efficient.app listing is stale. Until we know, every pricing experiment is built on a guess.
- Why now: blocks role-based-paywall-gating. v3 research flagged it as a single 1-hour cross-check that resolves a major business-model assumption (see: research-docs/toby-delta-2026-05-05-v3.md).
- Success metric: one short doc (product/metrics/pricing-reconciliation.md) listing the current Stripe price IDs in production, the prices shown on the CWS listing and the gettoby.com landing page, and a single authoritative number.
- Falsifying signal: if the audit reveals a 3rd price point we didn't know about, scope expands; the contradiction is bigger than v3 captured.
- Owner: TBD — operator decision needed.
- ICE: I=6, C=10, E=10, score=600
- MoSCoW: Must
- Started: 2026-05-10
- Last review: 2026-05-10
Proposed (queued, awaiting decision)
cliff-renewal-offer-ladder — Pre-stage the Feb-2027 mega-cliff with an offer ladder + funnel instrumentation
- Status: proposed
- Hypothesis: Feb 2027 sees 2,354 legacy ThankYou subs / 2,700 seats / $12,447.50 MRR renew in a single month (31% of total MRR, 26% of paid subs); the broader Oct-26 / Jan-27 / Feb-27 cliff window is $22.6K MRR / 55% of total reaching a renewal decision. The cohort is mostly at legacy $4.50/seat/yr pricing. Without a pre-staged offer ladder + instrumented funnel + A/B'd email cadence, the cliff resolves either as silent churn (no offer) or undifferentiated re-price (all-or-nothing).
- Why now: ~8.5 months out from the Feb-2027 cliff month, and the cohort is already leaking (17 subs / $81 MRR pre-cancelled this week). Build target is Q3 2026 (Jul-Sep) so it's in production by the Oct-26 leading edge. Blocked on pricing-reality-reconcile (O3 KR1, due 2026-05-13): the offer ladder has no anchor price to ladder from until the three inconsistent prices ($4.50 internal / $6/$10 Efficient.app / $9 TheTab) reconcile to one.
- Success metric: at the Feb-2027 cliff month, ≥60% of the renewal-decision cohort engages with the offer (open or click); ≥40% of engaged subs renew (vs. modeled cold-renewal baseline ~25%); ≥$7.5K MRR retained of the $12.4K Feb-27 wave.
- Falsifying signal: if 60 days post-launch engagement is <30% OR the instrumented funnel shows >70% drop-off between offer-view and renewal-decision, the offer-ladder framing is wrong. Switch to a single calm one-touch email + landing page; abandon the ladder.
- Owner: TBD. Blocked on pricing-reality-reconcile.
- ICE: I=9, C=6, E=4, score=216
- MoSCoW: Should
- Last review: 2026-05-12 (promoted from TOBY-7 during backlog triage; ticket cancelled as the work is bet-shaped, not ticket-shaped)
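The success and falsifying thresholds can both be checked against one funnel snapshot. A minimal sketch: the CliffFunnel fields are hypothetical event counts, and "drop-off" is assumed to mean the share of offer viewers who never reach a renewal decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CliffFunnel:
    cohort: int            # subs reaching a renewal decision in the cliff month
    engaged: int           # opened or clicked the offer
    offer_views: int       # viewed the offer
    decisions: int         # of the viewers, made a renewal decision (renew or cancel)
    renewed_engaged: int   # engaged subs who renewed
    mrr_retained: float    # USD MRR retained from the wave


def succeeded(f: CliffFunnel) -> bool:
    """Success metric: ≥60% engage, ≥40% of engaged renew, ≥$7.5K MRR retained."""
    return (f.engaged >= 0.60 * f.cohort
            and f.renewed_engaged >= 0.40 * f.engaged
            and f.mrr_retained >= 7_500)


def falsified(f: CliffFunnel) -> bool:
    """Falsifying signal: engagement <30% OR >70% drop-off from offer-view to decision."""
    engagement = f.engaged / f.cohort if f.cohort else 0.0
    # No offer views at all is treated as total drop-off.
    drop_off = (1 - f.decisions / f.offer_views) if f.offer_views else 1.0
    return engagement < 0.30 or drop_off > 0.70
```

Keeping both checks on the same snapshot makes it visible when a funnel is neither a clear win nor clearly falsified.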
role-based-paywall-gating — Restructure paywall around team/admin/sharing, not card limits
- Status: proposed
- Hypothesis: Gating team / admin / sharing features behind paid (instead of card limits) hits the under-monetized Multi-User Collaborator persona (4,908 active multi-team users; only 79 yearly Team subs cover them — ~2.6% of multi-team active users pay) and the Free-Tier Archivist persona (1,848 users averaging 224 lists, 28.4% labels, 18.6% public-share rate — the heaviest organizers, paying $0). v3 research benchmarks this at 2× conversion.
- Why now: v3 research identified this as the highest-leverage monetization restructure. Unblocks once pricing-reality-reconcile lands.
- Success metric: free→paid conversion among the Multi-User Collaborator + Free-Tier Archivist segments doubles within 90 days post-launch (baseline: ~2.6% and ~0% respectively).
- Falsifying signal: if conversion among either target segment moves <30% in 60 days post-launch, the gating axis is wrong. Roll back; the feature gates aren't the bottleneck.
- Owner: TBD.
- ICE: I=8, C=6, E=3, score=144
- MoSCoW: Should
- Last review: 2026-05-10
public-collection-pride-loop — Surface, reward, and amplify public-list creators
- Status: proposed
- Hypothesis: Free-Tier Archivists (1,848 users, 18.6% public-share rate, avg 224 lists) are Toby's only native viral surface, yet we don't ask them to share, surface their work, or close the recognition loop. A "public collection of the week" series on X (already in toby/x-content-pipeline.md Post 5) + a curator-spotlight slot on gettoby.com + a one-tap "feature my collection" submission flow would activate the only content loop Toby owns. This is the under-pulled growth lever.
- Why now: only 3% of users have ever made a public list; 14,306 active card-share links exist; the share-link surface is largely dormant from a growth-team perspective. Zero engineering is required for the X/blog version: it's operator-driven and low cost.
- Success metric: 10 curated public collections featured by 2026-06-30; +25% week-over-week growth in new public-list creation; ≥3 inbound CWS installs trackable via UTM from the curator-spotlight URL within 6 weeks.
- Falsifying signal: if 10 weeks of "public collection of the week" generates <50 trackable installs total (across X + blog combined), the loop isn't viral — kill and reallocate to paid-channel diagnosis.
- Owner: TBD (likely toby-x-strategist + toby-blog-seo coordination).
- ICE: I=6, C=7, E=8, score=336
- MoSCoW: Should
- Last review: 2026-05-10
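Attribution for the curator-spotlight URL depends on consistently tagged links. A minimal sketch that appends the standard utm_source / utm_medium / utm_campaign / utm_content parameters using only the Python standard library; the /curators path and the campaign names are hypothetical, and how installs are joined back to these tags downstream is outside the sketch.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_utm(url: str, source: str, medium: str, campaign: str,
             content: Optional[str] = None) -> str:
    """Return url with standard utm_* parameters added (existing params are kept)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    if content:
        query["utm_content"] = content  # e.g. which week's collection was featured
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical spotlight URL for week 1 of "public collection of the week" on X.
    print(with_utm("https://gettoby.com/curators", "x", "social",
                   "public-collection-of-the-week", "week-01"))
```

Generating every link through one helper avoids the near-duplicate tags (x vs twitter, week1 vs week-01) that fragment attribution reports.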
lapsed-pro-reactivation-campaign — Targeted reactivation flow for 101k cancelled monthly Pros
- Status: proposed
- Hypothesis: 101,229 cancelled monthly Pro subs is 14× the active Pro base. Cancellation reasons are 39% not_using, 37% too_expensive, 22% other, 2% missing_features (per toby/01-personas.md). For the not_using cohort specifically, a "your library is still here — open Toby and see what you've forgotten" reactivation email (no discount, just recognition + persistence promise) plus a 7-day return-to-Pro free trial should pull a measurable percentage back.
- Why now: even 5% reactivation moves $7-15K MRR, which is material against the $14.8K gap to breakeven (see: product/learnings.md). Costs are an email + a small flow; it doesn't compete with engineering bandwidth on reliability or Phase 2.
- Success metric: ≥3% trial-start rate on the campaign; ≥40% trial-to-paid conversion (vs. global benchmark); net +$5K MRR within 60 days of full rollout.
- Falsifying signal: if 60 days post-launch trial-start rate is <1% OR trial-to-paid <20%, the lapsed cohort has structurally moved on. Don't retry without a meaningfully-different angle.
- Owner: TBD.
- ICE: I=7, C=5, E=6, score=210
- MoSCoW: Should
- Last review: 2026-05-10
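The campaign's targets chain multiplicatively, so it helps to see what they imply in subscriber counts. A minimal sketch, assuming the full 101,229 lapsed base as the audience and rounding down to whole subscribers; the function names are hypothetical.

```python
import math

LAPSED_MONTHLY_PROS = 101_229


def reactivation_funnel(audience: int, trial_start_rate: float,
                        trial_to_paid: float) -> tuple:
    """Return (trial starts, paid reactivations), rounded down to whole subscribers."""
    trials = math.floor(audience * trial_start_rate)
    paid = math.floor(trials * trial_to_paid)
    return trials, paid


def mrr_per_sub_needed(target_mrr: float, paid: int) -> float:
    """Average MRR each reactivated subscriber must bring to hit target_mrr."""
    if paid <= 0:
        raise ValueError("need at least one paid reactivation")
    return target_mrr / paid


def reactivation_falsified(trial_start_rate: float, trial_to_paid: float) -> bool:
    """Falsifying signal at 60 days: trial-start <1% OR trial-to-paid <20%."""
    return trial_start_rate < 0.01 or trial_to_paid < 0.20
```

At 3% trial start and 40% trial-to-paid, the full base yields 3,036 trials and 1,214 paid reactivations, so +$5K MRR needs roughly $4.12 MRR per returning subscriber. Restricting the audience to the not_using cohort (39%) shrinks both counts roughly in proportion and raises the per-subscriber MRR needed accordingly.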
chrome-133-vs-toby-comparison-page — AI-search-friendly comparison page
- Status: proposed
- Hypothesis: Publishing a short comparison page explaining what Chrome 133 native sync covers vs. what Toby still does (visual collections, sharing, notes per tab, multi-team, deep history) earns recommendation slots in Perplexity / ChatGPT / Claude / Atlas answers when users ask "Chrome 133 vs tab manager" — and also captures organic search.
- Why now: Chrome 133 saved-tab-group cross-device sync is live; the question is being asked. v3 research flagged this as a 1-week action. We are not in this conversation today.
- Success metric: page indexed within 30 days; at least 1 confirmed AI-search citation (Perplexity / ChatGPT) within 60 days; +100 trackable installs from the page within 90 days.
- Falsifying signal: if 90 days post-publish there's no AI-search citation and <20 trackable installs, the AI-recommendation channel is closed for our shape of product. Don't try a second time without a fundamentally different angle.
- Owner: toby-blog-seo + landing-page eng.
- ICE: I=5, C=6, E=8, score=240
- MoSCoW: Should
- Last review: 2026-05-10
x-relaunch-soft — Restart @TobyForTabs at 3 posts/week with Tier A engagement
- Status: proposed
- Hypothesis: @TobyForTabs is dormant (2 posts in 14 months, 2,913 followers) while demand on X is loud and unattached (dozens of organic "too many tabs" complaints per week, per toby/x-strategy.md). A calm, in-character relaunch (3 posts/week + Tier A engagement on 5-10 ICP-shape accounts/day per toby/x-engagement-targets.md) earns brand presence at low cost and feeds CWS installs via UTM attribution.
- Why now: account credentials need verification (🔒 acct gate). Once verified, the relaunch is operator-driven and costs zero engineering. Best-window timing is already mapped.
- Success metric: median impressions/original-post ≥ 5,000 within 6 weeks (baseline ~2,500); ≥1 ICP-shape reply per original post by week 6; ≥15% follow rate on first-touch replies.
- Falsifying signal: if 8 weeks of disciplined cadence yields median <2,000 impressions/post and <3 ICP replies per month, the channel isn't worth the operator's time — wind down or sell to a contractor.
- Owner: TBD (operator-led; toby-x-strategist drafts).
- ICE: I=4, C=6, E=7, score=168
- MoSCoW: Could
- Last review: 2026-05-10
seo-content-cadence-2-weeks — Ship one calm essay every 2 weeks
- Status: proposed
- Hypothesis: A one-post-every-2-weeks blog cadence (P1/P5 mix per toby/blog-pipeline.md) compounds into an organic acquisition channel that doesn't depend on CWS rank. SERPs for "too many tabs", "OneTab alternative", and "bookmarks vs tab manager" are dominated by listicles and weak essays; Toby's calm voice + brand authority can win foundational queries.
- Why now: the first draft (blog-why-you-have-so-many-tabs-open.md) is in the wiki but not yet published. The pipeline is sequenced. The operator owns the publish step + image hand-off.
- Success metric: 6 posts published by end of Q3 2026; one post ranks top-10 on its primary keyword within 90 days; +200 trackable installs from blog-attributed traffic within 6 months.
- Falsifying signal: if after 6 published posts none rank top-10 and total blog-attributed installs are <50, SEO is not the channel: either the SERPs are harder to win than they look or our voice isn't winning. Hold the cadence at 1/month and divert effort to public-collection-pride-loop.
- Owner: toby-blog-seo drafts; operator owns publish.
- ICE: I=6, C=5, E=6, score=180
- MoSCoW: Should
- Last review: 2026-05-10
Validated (succeeded — graduated to ongoing motion)
session-start-heartbeat — 4h Session Start event for intra-day retention analysis
Shipped in v1.13.0 on 2026-04-14 (commit 0f3aa38d). Halt threshold of 180k events/day against a projected 60k/day. Now the foundation for Phase 2's D7 retention measurement and the persona-shape analysis in toby/01-personas.md. Durable lesson: instrument the metric before running the experiment that depends on it. Date validated: 2026-04-14.
retention-discount-all-cancellation-reasons — Retention discount eligible for any cancel reason
Backend live (commit cbc92a78d). Removed the "valid reason" filter so the discount surfaces for every churning user. Durable lesson: a small backend rule change can unlock significant top-of-flow exposure to retention offers; UI/UX investment can come later. Note: the frontend wiring in CancelSubscription.tsx is flagged as pending in worklog.md (Jan 2026), but no commit confirms it shipped; operator to confirm. Date validated: 2026-01-30.
monorepo-flatten-turborepo — Monorepo flatten + Turborepo / pnpm workspaces migration
Shipped late March 2026 (commits 87bec6267, a90230ce1, 134f9bb90, ec843c5a2, c5545cbd5, 2574b5379, 5bd961266). apps/{api,extension,landing,mobile} + shared packages/. Durable lesson: collapsing 4 submodules into a monorepo with a single dep graph cut release-coordination cost; the migration was a one-time cost paid before the Phase 2 onboarding work that depends on shared analytics packages. Date validated: 2026-03-31.
cws-review-monitor-with-ai-drafts — Cloud Run job posting AI-drafted CWS review responses
Shipped 2026-03-30 (commit ba247d9a) with fallback Slack message added 2026-04-29 (commit b9bea18c). Durable lesson: low-engineering-effort automation closed a brand-hygiene gap that was previously not staffed. Date validated: 2026-03-30.
Killed (preserved so we don't reinvent)
deeper-discount-retention-offer-50pct — 50% off save offer to ThankYouLegacy churners
Killed: failed with a <2% save rate (only 14 of 740 ThankYouLegacy churners took it). Why it failed: the 50% offer ($2.25/mo) is still more than 2× what ThankYouLegacy users were paying ($0.99/mo). Price wasn't the lever; the cohort was attracted by the 78% legacy discount specifically, and they were never going to pay $3+/mo. What would have to be different to revive: only if we found a structurally cheaper-to-serve plan ($1-2/mo with explicit feature limits) aimed at retaining explicitly price-sensitive cohorts. Date killed: 2026-02-15 (per product/learnings.md).
ai-feature-pre-launch-q3-2026 — Pre-announce AI organize / smart-collection naming in Q3 2026
Killed: deferred to Q4 2026 per v3 research delta — AI-browser threat is slower-burn (18-24 months) than v2 implied (12 months), so the urgency to compete on AI-feature parity has dropped. What killed it: independent reporting framing Atlas/Comet as "struggling to dominate." Pre-announcing creates an expectation gap. What would have to be different to revive: only if Q4 build slips again, in which case the decision is "hold Q4 target" not "pre-announce earlier." Date killed: 2026-05-05 (per research-docs/toby-delta-2026-05-05-v3.md).
onboarding-signup-position-AB — A/B test signup position (beginning vs. end of onboarding)
Killed: "end" variant shipped as the winner (commit bc5e45305, 2026-04-09). Removed 644 lines of dead experiment machinery in the same commit. Durable lesson: ship the winning variant + remove the experiment scaffold in one PR — fragments left in code become Phase 2 plumbing tax. What would have to be different to revive: nothing — this experiment is complete. Date killed (variant shipped): 2026-04-09.
action-0-untracked-wau-instrument — Instrument the "270K untracked WAU" opportunity
Killed: was a mirage. CWS WAU (~380K) is inflated 5-6× because it counts disabled extensions, ghost installs, and multi-device duplicates via Chrome's Omaha update protocol. Cross-verified against Toby DB heartbeats (61,852) and Amplitude (75,123 devices / 41,111 identified users). Real active base is ~62-75K. Why it failed: the premise that the untracked WAU was a conversion opportunity was wrong — those users are not "Toby users we haven't reached"; they're dead accounts and ghost installs. What would have to be different to revive: nothing — the metric was the problem, not the strategy. Date killed: 2026-03-21 (per product/strategy/next-actions.md).
team-plan-as-primary-monetization — Push Team plans as the main upsell path
Killed (implicitly, as a primary strategy): only 79 active yearly Team subs and 96 multi-member paid teams in total vs. 7,058 active yearly Pro subs. Teams is <1% of revenue. Why it failed as a primary axis: B2B sales motion wasn't built and team buyers don't surface this need on public X — they're inside Slack. What would have to be different to revive as primary: dedicated B2B SDR motion + LinkedIn/Slack-community presence. Note: Team plan as a secondary monetization lever (via role-based-paywall-gating) remains live — the kill is on Teams-as-the-headline-bet, not on team features altogether. Date killed (as headline bet): 2026-03-21.
paid-acquisition-channel — Paid ads at $54/yr ARPU
Killed: math doesn't work at current $54/yr ARPU with <5% full-price conversion (per product/strategy/next-actions.md). LTV doesn't cover CAC at any realistic channel cost. What would have to be different to revive: a structurally higher ARPU (Team plan working, role-based gating shipped) AND a conversion path >5% AND a channel under $5 CAC. None present today. Date killed: 2026-03-21.
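The ARPU argument can be made concrete with a payback calculation. A minimal sketch, assuming CAC is measured per install, the conversion rate applies to installs, and revenue is the $54/yr ARPU with no churn or margin adjustments (both simplifications flatter the channel); the function names are hypothetical.

```python
def revenue_per_install_per_year(arpu_per_year: float, paid_conversion: float) -> float:
    """Expected yearly revenue from one install (no churn, no margin)."""
    return arpu_per_year * paid_conversion


def payback_years(cac_per_install: float, arpu_per_year: float,
                  paid_conversion: float) -> float:
    """Years of revenue needed to recover the acquisition cost of one install."""
    yearly = revenue_per_install_per_year(arpu_per_year, paid_conversion)
    return float("inf") if yearly <= 0 else cac_per_install / yearly


def max_cac_per_install(payback_years_allowed: float, arpu_per_year: float,
                        paid_conversion: float) -> float:
    """Highest per-install CAC that pays back within the allowed number of years."""
    return payback_years_allowed * revenue_per_install_per_year(arpu_per_year, paid_conversion)
```

At 5% conversion each install is worth about $2.70/yr, so even a $5 per-install CAC takes about 1.85 years to pay back before any churn is counted.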
aggressive-pricing-on-legacy-cohort — Aggressively re-price legacy users to full price
Killed: one-way door, destroys trust, accelerates churn. ThankYouLegacy 12-month auto-transition (78% off → 33% off, a 3× price jump) drove 23.5% cumulative cohort churn — the data we already have on what happens when this lever is pulled too hard. What would have to be different to revive: a transition floor where legacy users see value-delivered-since and the price ramps with the value, not against it. Not currently designed. Date killed: 2026-03-21.
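The "3× price jump" follows from the two discount levels: the price a user pays is proportional to (1 − discount), so full price cancels out of the ratio. A minimal sketch; the function name is hypothetical.

```python
def price_multiple(old_discount: float, new_discount: float) -> float:
    """How many times more a user pays when their discount changes.

    Discounts are fractions of full price (0.78 = 78% off); the price paid is
    full_price * (1 - discount), so full price cancels out of the ratio.
    """
    for d in (old_discount, new_discount):
        if not 0 <= d < 1:
            raise ValueError(f"discount must be in [0, 1), got {d}")
    return (1 - new_discount) / (1 - old_discount)
```

For the ThankYouLegacy transition, (1 − 0.33) / (1 − 0.78) = 0.67 / 0.22 ≈ 3.05, which is the 3× jump cited above.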