artifacts/toby-pm/00036a80-5931-405a-85ab-1e39ee3a545f/incidents-2026-05-12-retention-offers-silent-ingestion.md

Ingestion summary: 2026-05-12 retention_offers-silent incident
_Author: toby-pm · Run id: 00036a80-5931-405a-85ab-1e39ee3a545f · 2026-05-12_
Why this lives here, not next to the source:
toby/incidents/ is owned by the warroom (toby-incident-coordinator + doctors + validator). The dashboard agent is read-only there. The user instruction "save next to source" is reconciled against the hard scope rule by writing derivatives to the agent's own artifact dir and surfacing the incident via the dashboard.
Source doc: toby/incidents/2026-05-12-retention-offers-silent.md
What landed
The second canonical warroom incident closed about one day after the
workflow's first proof-of-life. Ticket TOBY-6 (urgent), "retention_offers
table silent: save flow not triggering or not logging", was triaged through
the full 7-wave protocol and transitioned to in_review.
| Wave | Output | When |
|---|---|---|
| 0 — Discernment sweep | Opted-in TOBY-6 (urgent, no agent owner, cross-cutting) | 2026-05-12 03:57 UTC |
| 1 — Investigate | FE doctor finding + BE doctor finding (parallel) | 2026-05-12 04:03 UTC |
| 2 — Synthesise | Coordinator draft with diff | (during the run) |
| 3 — Validate | Verdict validated + medium — caught compile defects in draft diff | 2026-05-12 04:08 UTC |
| 4 — Ship | Skipped (correctly) — medium-confidence does not trigger fix-shipper | n/a |
| 5 — Transition | TOBY-6 → in_review | ~2026-05-12 04:10 UTC |
| 6 — Report | Slack #C0B3FN70MEE consolidated post | (per protocol) |
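
The Wave 4 decision in the table can be sketched as a small predicate. This is a minimal illustration, not the warroom's actual code: the `Verdict` type, the `Confidence` values, and `ShouldAutoShip` are hypothetical names, and the rule that only a validated, high-confidence verdict triggers the fix-shipper is an assumption inferred from this run and the 2026-05-11 run.

```go
package main

import "fmt"

// Confidence is a hypothetical enum for the validator's confidence level.
type Confidence int

const (
	Low Confidence = iota
	Medium
	High
)

// Verdict is a hypothetical shape for the Wave 3 output.
type Verdict struct {
	Validated  bool
	Confidence Confidence
}

// ShouldAutoShip reports whether Wave 4 may invoke the fix-shipper.
// Assumption: only validated, high-confidence verdicts auto-ship;
// anything else is left for human review.
func ShouldAutoShip(v Verdict) bool {
	return v.Validated && v.Confidence == High
}

func main() {
	toby6 := Verdict{Validated: true, Confidence: Medium}
	fmt.Println(ShouldAutoShip(toby6)) // false
}
```

Under this assumption, TOBY-6's "validated + medium" verdict falls through to the Wave 5 transition without any automated merge.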
Diagnosis (one-paragraph)
retention_offers is accept-only by design — the table records only the
"CLAIM DISCOUNT" click. The 17 all-time rows (16 retention_legacy, 1
retention_yearly) are healthy. The ticket's framing ("0 offers issued, 0
accepted") conflates two metrics; from this schema you can only ever measure
accepts. The real funnel is severely top-leaky: 120 cancels → 22
cancellation_reasons (18%) → 1 /retention/accept (0.83%) in the last 30
days. ~82% of cancels never reach the in-app retention modal. Three FE-side
bypass paths explain the leak: (a) Stripe Customer Portal preloaded as a
View link in the in-app Subscription panel, (b) Stripe renewal/receipt
emails containing "Manage subscription" links to the same portal, (c)
team_legacy / team_basic users have no in-app cancel CTA at all, since
hasSubscription excludes them, and they are the cohort with the worst churn
pressure (Feb-26 ThankYouLegacy renewals).
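
The funnel percentages quoted above follow directly from the three 30-day counts. The sketch below only reproduces that arithmetic; the helper name `pct` is hypothetical, and it treats a cancellation_reasons row as the proxy for "reached the in-app retention modal", which is how the diagnosis reads.

```go
package main

import "fmt"

// pct returns part as a percentage of whole, or 0 when whole is 0.
func pct(part, whole int) float64 {
	if whole == 0 {
		return 0
	}
	return 100 * float64(part) / float64(whole)
}

func main() {
	// 30-day counts taken from the diagnosis paragraph.
	cancels, reasons, accepts := 120, 22, 1

	fmt.Printf("reason rate: %.1f%%\n", pct(reasons, cancels))         // 18.3%
	fmt.Printf("accept rate: %.2f%%\n", pct(accepts, cancels))         // 0.83%
	fmt.Printf("bypass rate: %.1f%%\n", pct(cancels-reasons, cancels)) // 81.7%
}
```

Note that the accept rate is measured against all cancels, not against the 22 users who reached the modal.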
Why this matters strategically
- Resolves the long-standing CancelSubscription.tsx open question. The dashboard has carried "Retention-discount frontend integration into CancelSubscription.tsx flagged as pending in worklog.md (Jan 2026); no commit confirms frontend wiring shipped" as an open question for months. The incident doc cites apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709 as the live retention-dispatch site and :622-627 as the live RETENTION_OFFER_DECLINED Amplitude wiring. The integration did ship; the funnel just has structural FE-orchestration bypass paths above it. The open question resolves to "integration shipped; funnel structurally leaky".
- Second proof-of-life for the warroom + first medium-confidence verdict. The 2026-05-11 blank-page incident demonstrated the high-confidence ship-path. This run demonstrates the gate working correctly in the opposite direction: the validator caught real defects in the draft diff (log.Info → ctx.Logger.Info, missing nil-guard, team.ID not in scope, nested struct path corrections), returned the corrected compile-ready replacement, and Wave 4 correctly skipped auto-ship. The escape valve works.
- New strategic finding: structural retention-funnel leak. The 18% reason-rate (cancellations that surface a reason) and 0.83% accept-rate aren't a regression; they've been quietly true for months because the FE preloads the Stripe portal "View" link inside the in-app subscription panel. This is a product-shaped problem that sits upstream of the playbook's monetization bets, and it doesn't show up in any existing O1/O2/O3 KR. The Tier 2 follow-ups (hide Stripe-portal link, configure flow_data redirect, give legacy users an in-app cancel CTA) are bigger levers than the Tier 1 patch.
- Tier 4 housekeeping signal: missing GCP secrets. Five TOBY_RETENTION* secrets don't exist in Secret Manager and pollute every cold start with "failed to access secret version" log lines. The system operates correctly on defaults; the only side effect is log noise.
Refuted hypotheses
- "It's a regression": refuted by the backend doctor. prod-api revision prod-api-00427-9p2 @ SHA 4b0107858 has been stable since 2026-02-02; FE cancel-modal files are unchanged since 2026-03-31; commit cbc92a78d widened retention eligibility rather than narrowing it.
- "Backend silently dropping writes": refuted. GetRetentionOffer eligibility logic is permissive and deterministic; the only insert site is exercised correctly; there is no silent feature flag, A/B switch, or kill-switch.
- "Missing secrets are causing skip behaviour": refuted. gcp_processor.go:25-32 treats a missing secret as ("", false), and envconfig falls back to struct-tag defaults that match the values actually in retention_offers.coupon_id.
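
The third refutation rests on a "missing means absent, not error" convention followed by a default. The sketch below illustrates that pattern with the standard library only; it is not the code in gcp_processor.go and does not use envconfig or Secret Manager. The `lookup` type, the `resolve` function, the secret name `EXAMPLE_SECRET`, and the default value are all hypothetical.

```go
package main

import "fmt"

// lookup mirrors the ("", false) convention: a missing secret is
// reported as absent rather than as an error.
type lookup func(name string) (string, bool)

// resolve returns the secret's value when it exists and otherwise
// falls back to the compiled-in default (standing in for a struct-tag
// default in this sketch).
func resolve(get lookup, name, def string) string {
	if v, ok := get(name); ok {
		return v
	}
	return def
}

func main() {
	// Simulates a secret store in which no secrets exist.
	empty := func(string) (string, bool) { return "", false }

	fmt.Println(resolve(empty, "EXAMPLE_SECRET", "default-coupon")) // default-coupon
}
```

With this shape, a missing secret changes what gets logged on the lookup path but not the value the caller ends up with, which is consistent with the doctor's finding that behaviour is unaffected.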
Operator decisions surfaced
- Approve and merge the corrected Tier 1 patch? The validator returned validated + medium specifically because the draft diff needed a compile-readiness correction. The corrected version (in the incident doc, fenced as Go) is ready to paste. It needs one human review pass from a Toby Go reviewer (any owner of apps/api/context/v3/).
- File Tier 2/3/4 as separate tickets now or wait for Tier 1 telemetry? Pipeline recommendation: file now with stub bodies pointing to this incident; backfill numbers in 14 days once Tier 1 data flows. This avoids the risk of work getting lost.
- Adjust TOBY-6's success metric? "0 offers issued" is unmeasurable from retention_offers (cause #1, schema gap). If the team wants to measure "issued", that's Tier 3 work; don't grade Tier 1 against an unmeasurable metric.
Dashboard mutations triggered by this ingestion
| Section | Change |
|---|---|
| TL;DR | Add a sentence on the second warroom run + the structural retention-funnel finding |
| Operations | Add 2026-05-12 retention-offers run to proof-of-life list; record first medium-confidence verdict (Wave 4 correctly skipped) |
| OKRs | Note Tier 2-4 retention findings as upstream lever on the broader monetization picture (not assigned to a KR — surface, don't reorganize) |
| Recent shipments | New canonical incident entry for 2026-05-12 |
| Open questions | Resolve CancelSubscription.tsx wiring question (integration shipped; funnel structurally leaky); add structural retention-funnel leak as a new open question; add Tier 1 Go-review ask |
| Key decisions | Add "medium-confidence verdict ⇒ fix-shipper skips by design" decision |
| Doc index | Add link to toby/incidents/2026-05-12-retention-offers-silent.md |
Memory mutations
- Record incidents_second_canonical_doc_path and _at.
- Record the medium-confidence path as exercised (second proof-of-life).
- Resolve pending_review[] entry on CancelSubscription.tsx.
- Record the retention-funnel structural finding for cross-cycle re-surfacing.
- Record the Tier-4 missing-secrets observation.
- Refresh last_audit_at.