# Ingestion summary — 2026-05-12 retention_offers-silent incident

_Author: toby-pm · Run id: 00036a80-5931-405a-85ab-1e39ee3a545f · 2026-05-12_

> Why this lives here, not next to the source: `toby/incidents/` is owned by
> the warroom (`toby-incident-coordinator` + doctors + validator). The dashboard
> agent is read-only there. The user instruction "save next to source" is
> reconciled against the hard scope rule by writing derivatives to the agent's
> own artifact dir and surfacing the incident via the dashboard.

**Source doc**: [`toby/incidents/2026-05-12-retention-offers-silent.md`](../../toby/incidents/2026-05-12-retention-offers-silent.md)

## What landed

The second canonical warroom incident closed about one day after the workflow's
first proof-of-life. Ticket TOBY-6 (urgent), "`retention_offers` table silent:
save flow not triggering or not logging", was triaged through the full
7-wave protocol and transitioned to `in_review`.

| Wave | Output | When |
|---|---|---|
| 0 — Discernment sweep | Opted-in TOBY-6 (urgent, no agent owner, cross-cutting) | 2026-05-12 03:57 UTC |
| 1 — Investigate | FE doctor finding + BE doctor finding (parallel) | 2026-05-12 04:03 UTC |
| 2 — Synthesise | Coordinator draft with `diff` | (during the run) |
| 3 — Validate | Verdict `validated + medium` — caught compile defects in draft `diff` | 2026-05-12 04:08 UTC |
| 4 — Ship | **Skipped (correctly)** — medium-confidence does not trigger fix-shipper | n/a |
| 5 — Transition | TOBY-6 → `in_review` | ~2026-05-12 04:10 UTC |
| 6 — Report | Slack `#C0B3FN70MEE` consolidated post | (per protocol) |
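
The gate between Wave 3 and Wave 4 can be sketched as a small decision function. This is a minimal illustration, not the warroom's actual code: the `Verdict` type and `shouldAutoShip` function are hypothetical names, and the rule that only a `validated` verdict with `high` confidence triggers the fix-shipper is an assumption inferred from the two runs described in this summary.

```go
package main

import "fmt"

// Verdict is a hypothetical shape for the validator's Wave 3 output.
type Verdict struct {
	Status     string // e.g. "validated"
	Confidence string // "high", "medium", or "low"
}

// shouldAutoShip reports whether Wave 4 (fix-shipper) should run.
// Assumption: only a validated, high-confidence verdict auto-ships.
func shouldAutoShip(v Verdict) bool {
	return v.Status == "validated" && v.Confidence == "high"
}

func main() {
	runs := []struct {
		name string
		v    Verdict
	}{
		{"2026-05-11 blank-page", Verdict{"validated", "high"}},
		{"2026-05-12 retention-offers", Verdict{"validated", "medium"}},
	}
	for _, r := range runs {
		fmt.Printf("%s: auto-ship=%t\n", r.name, shouldAutoShip(r.v))
	}
}
```

Under these assumptions the 2026-05-12 run falls through to a human review pass instead of an automatic merge, which is the behaviour recorded in the Wave 4 row.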

## Diagnosis (one-paragraph)

`retention_offers` is **accept-only by design** — the table records only the
"CLAIM DISCOUNT" click. The 17 all-time rows (16 `retention_legacy`, 1
`retention_yearly`) are healthy. The ticket's framing ("0 offers issued, 0
accepted") conflates two metrics; from this schema you can only ever measure
accepts. The real funnel is severely top-leaky: **120 cancels → 22
`cancellation_reasons` (18%) → 1 `/retention/accept` (0.83%) in the last 30
days.** ~82% of cancels never reach the in-app retention modal. Three FE-side
bypass paths explain the leak: (a) Stripe Customer Portal preloaded as a
`View` link in the in-app Subscription panel, (b) Stripe renewal/receipt
emails containing "Manage subscription" links to the same portal, (c)
`team_legacy` / `team_basic` users have **no in-app cancel CTA at all**:
`hasSubscription` excludes them, even though they are the cohort with the
worst churn pressure (Feb-26 ThankYouLegacy renewals).
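
The funnel percentages above are all taken against the 120 cancels, which a few lines of arithmetic make explicit. This is only a worked check of the quoted 30-day counts; the helper name `pct` is illustrative, and treating a recorded cancellation reason as the proxy for "reached the retention modal" follows the diagnosis rather than any additional data.

```go
package main

import "fmt"

// pct returns part as a percentage of whole.
func pct(part, whole int) float64 {
	return 100 * float64(part) / float64(whole)
}

func main() {
	// 30-day counts quoted in the diagnosis.
	cancels, reasons, accepts := 120, 22, 1

	fmt.Printf("reason rate: %.2f%% of cancels\n", pct(reasons, cancels))
	fmt.Printf("accept rate: %.2f%% of cancels\n", pct(accepts, cancels))
	fmt.Printf("bypass rate: %.2f%% of cancels\n", pct(cancels-reasons, cancels))
}
```

This gives 18.33%, 0.83%, and 81.67%, which round to the 18%, 0.83%, and ~82% figures used in this summary.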

## Why this matters strategically

1. **Resolves the long-standing `CancelSubscription.tsx` open question.**
   The dashboard has carried "Retention-discount frontend integration into
   `CancelSubscription.tsx` flagged as pending in worklog.md (Jan 2026); no
   commit confirms frontend wiring shipped" as an open question for months.
   The incident doc cites `apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709`
   as the live retention-dispatch site and `:622-627` as the live
   `RETENTION_OFFER_DECLINED` Amplitude wiring — the integration **did
   ship**; the funnel just has structural FE-orchestration bypass paths
   above it. Open question resolves to "integration shipped; funnel
   structurally leaky".

2. **Second proof-of-life for the warroom + first medium-confidence verdict.**
   The 2026-05-11 blank-page incident demonstrated the high-confidence
   ship-path. This run demonstrates the gate working correctly in the
   opposite direction — validator caught real defects in the draft `diff`
   (`log.Info` → `ctx.Logger.Info`, missing nil-guard, `team.ID` not in
   scope, nested struct path corrections), returned the corrected
   compile-ready replacement, and Wave 4 correctly skipped auto-ship.
   The escape valve works.

3. **New strategic finding — structural retention-funnel leak.**
   The 18% reason-rate (cancellations that surface a reason) and 0.83%
   accept-rate aren't a regression — they've been quietly true for months
   because the FE preloads the Stripe portal "View" link inside the in-app
   subscription panel. This is a product-shaped problem that sits upstream
   of the playbook's monetization bets, and it doesn't show up in any
   existing O1/O2/O3 KR. The Tier 2 follow-ups (hide Stripe-portal link,
   configure `flow_data` redirect, give legacy users an in-app cancel CTA)
   are bigger levers than the Tier 1 patch.

4. **Tier 4 housekeeping signal — missing GCP secrets.**
   Five `TOBY_RETENTION*` secrets don't exist in Secret Manager and pollute
   every cold start with "failed to access secret version" log lines.
   The system operates correctly on defaults; the only side effect is log noise.

## Refuted hypotheses

- **"It's a regression"** — refuted by backend doctor: prod-api revision
  `prod-api-00427-9p2` @ SHA `4b0107858` stable since 2026-02-02; FE cancel-
  modal files unchanged since 2026-03-31; commit `cbc92a78d` *widened*
  retention eligibility rather than narrowing it.
- **"Backend silently dropping writes"** — refuted: `GetRetentionOffer`
  eligibility logic is permissive and deterministic; the only insert site is
  exercised correctly; there is no silent feature flag, A/B switch, or kill-switch.
- **"Missing secrets are causing skip behaviour"** — refuted: `gcp_processor.go:25-32`
  treats a missing secret as `("", false)`, and `envconfig` falls back to struct-tag
  defaults that match the values actually in `retention_offers.coupon_id`.
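
The missing-secret behaviour in the last bullet can be sketched with the standard library alone. This is a simplified stand-in, not the code in `gcp_processor.go`: `lookupSecret`, `resolve`, the secret name `TOBY_RETENTION_EXAMPLE`, and the default `example-coupon` are all hypothetical, and a plain map stands in for Secret Manager and for the `envconfig` struct-tag defaults.

```go
package main

import "fmt"

// lookupSecret is a hypothetical stand-in for the Secret Manager accessor.
// A missing secret is reported as ("", false) rather than as an error.
func lookupSecret(store map[string]string, name string) (string, bool) {
	v, ok := store[name]
	if !ok {
		return "", false
	}
	return v, true
}

// resolve returns the secret value when present, else the compiled-in default.
func resolve(store map[string]string, name, def string) string {
	if v, ok := lookupSecret(store, name); ok {
		return v
	}
	return def
}

func main() {
	store := map[string]string{} // no TOBY_RETENTION* secrets provisioned

	// Hypothetical name and default, for illustration only.
	fmt.Println(resolve(store, "TOBY_RETENTION_EXAMPLE", "example-coupon"))
}
```

With an empty store the call resolves to the default, so a missing secret changes what is logged at startup but not which value the retention flow uses.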

## Operator decisions surfaced

1. **Approve and merge the corrected Tier 1 patch?** Validator returned
   `validated + medium` specifically because the draft `diff` needed a
   compile-readiness correction. The corrected version (in the incident
   doc, fenced as Go) is ready to paste. **A Toby Go reviewer (any owner
   of `apps/api/context/v3/`) needs one human review pass.**
2. **File Tier 2/3/4 as separate tickets now or wait for Tier 1 telemetry?**
   Pipeline recommendation: file now with stub bodies pointing to this
   incident, then backfill numbers in 14 days once Tier 1 data flows. This
   avoids the risk of the follow-up work getting lost.
3. **Adjust TOBY-6's success metric?** "0 offers issued" is unmeasurable
   from `retention_offers` (cause #1, schema gap). If the team wants to
   measure "issued", that's Tier 3 work; don't grade Tier 1 against an
   unmeasurable metric.

## Dashboard mutations triggered by this ingestion

| Section | Change |
|---|---|
| TL;DR | Add a sentence on the second warroom run + the structural retention-funnel finding |
| Operations | Add 2026-05-12 retention-offers run to proof-of-life list; record first `medium-confidence` verdict (Wave 4 correctly skipped) |
| OKRs | Note Tier 2-4 retention findings as upstream lever on the broader monetization picture (not assigned to a KR — surface, don't reorganize) |
| Recent shipments | New canonical incident entry for 2026-05-12 |
| Open questions | **Resolve** `CancelSubscription.tsx` wiring question (integration shipped; funnel structurally leaky); **add** structural retention-funnel leak as a new open question; **add** Tier 1 Go-review ask |
| Key decisions | Add "medium-confidence verdict ⇒ fix-shipper skips by design" decision |
| Doc index | Add link to `toby/incidents/2026-05-12-retention-offers-silent.md` |

## Memory mutations

- Record `incidents_second_canonical_doc_path` and `_at`.
- Record the medium-confidence path as exercised — second proof-of-life.
- Resolve `pending_review[]` entry on `CancelSubscription.tsx`.
- Record the retention-funnel structural finding for cross-cycle re-surfacing.
- Record the Tier-4 missing-secrets observation.
- Refresh `last_audit_at`.
