retention_offers silent — save flow not triggering or not logging
TL;DR
retention_offers is accept-only by design — the table records the click on "CLAIM DISCOUNT" and nothing else. The ticket's framing ("0 offers issued, 0 accepted") conflates two different things; from this table you can only ever measure accepts. The 17 all-time rows are 17 real, healthy acceptances (16 retention_legacy, 1 retention_yearly).
The real funnel is severely top-leaky: 120 cancels → 22 cancellation_reasons (18%) → 1 retention/accept (0.83%) in the last 30 days. ~82% of cancels never reach the in-app retention modal at all. Three structural bypass paths explain the leak:
- Stripe Customer Portal preloaded as a View link inside the in-app Subscription panel.
- Stripe renewal-notice / receipt emails contain "Manage subscription" links to the same portal.
- team_legacy / team_basic users have no in-app cancel CTA at all; hasSubscription excludes them.
Not a regression. Prod-api revision prod-api-00427-9p2 has been stable since 2026-02-02; FE cancel-modal files haven't changed since 2026-03-31 (cbc92a78d widened eligibility). The funnel has been structurally leaky for months.
Tier 1 fix shipping-ready (validator-vetted, awaiting Go-reviewer sign-off): an 8-line zap structured log statement in GetRetentionOffer so we can measure "eligible offers shown" in Cloud Logging without a schema change. Tier 2-4 (Stripe Portal flow_data, legacy CTA, schema columns, missing secrets) are explicit follow-up tickets, not in this incident's scope.
Symptom
- Last 30 days: 0 new retention_offers rows (well, 1, written 23h before this incident).
- Last 30 days: 120 subscriptions transitioned to canceled.
- Last 30 days: 22 cancellation_reasons rows.
- All-time: 17 rows in retention_offers. 16 are retention_legacy. Exactly 1 is retention_yearly and it's from today.
- Source ticket: TOBY-6 ("retention_offers table silent: save flow not triggering or not logging").
Root cause
1. Schema is accept-only
apps/api/data/migrations/V71__retention_offers.up.sql:1-11 — table columns are id, team_id, user_id, subscription_id, coupon_id, interval, accepted_at, created_at. No status, no offered_at, no declined_at. The only insert site is apps/api/context/v3/subscription_context.go:697-710, which runs only inside AcceptRetentionOffer. So every row in the table is a confirmed user-click on the "CLAIM DISCOUNT" button. Shows-without-accept and declines write nothing.
The ticket's metric definition ("0 offers issued") is unmeasurable from this table — full stop. Until we add an offered/declined surface, we can only ever measure accepts. (Validator independently re-queried information_schema.columns and the migration file. Confirmed.)
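The accept-only property can be sketched with a toy in-memory model. This is an illustration, not the production code: the `store` type, its method names, and the sample IDs are hypothetical, and only the "accept is the sole write path" behaviour is taken from the finding above.

```go
package main

import "fmt"

// offerRow mirrors a subset of the retention_offers columns named above.
type offerRow struct {
	TeamID   string
	CouponID string
}

// store is a hypothetical in-memory stand-in for the table.
type store struct {
	rows []offerRow
}

// showOffer models the offer being displayed. No write path exists for it.
func (s *store) showOffer(teamID, couponID string) {}

// declineOffer models "Cancel anyway". It also writes nothing.
func (s *store) declineOffer(teamID string) {}

// acceptOffer models AcceptRetentionOffer, the only insert site.
func (s *store) acceptOffer(teamID, couponID string) {
	s.rows = append(s.rows, offerRow{TeamID: teamID, CouponID: couponID})
}

func main() {
	s := &store{}
	s.showOffer("team-a", "coupon-x") // shown, then declined: nothing recorded
	s.declineOffer("team-a")
	s.showOffer("team-b", "coupon-x") // shown, then accepted: one row
	s.acceptOffer("team-b", "coupon-x")
	fmt.Println("rows:", len(s.rows)) // rows: 1
}
```

Two offers were shown in this toy run, but the row count can only ever report the single accept, which is why "offers issued" cannot be read off the table.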
2. ~82% of cancels bypass the in-app retention modal
Live HTTP logs (prod-api, last 12 days): 2 requests to /retention-offer total. The full happy-path sequence has only fired for ONE team in the entire 30-day window (team ce2cc1ac-…-90dfb, 2026-05-12 23:17 UTC — reason → /retention-offer eligible:true → /retention/accept → row written → cooldown blocks the retry 17s later). Backend behaved exactly as designed.
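A minimal sketch of how the cooldown could block the retry 17s after the accept, assuming the two gates described in this report (30-day minimum subscription age, 12-month cooldown). The `eligible` and `utc` functions, and the subscription start date, are hypothetical; the real logic lives in GetRetentionOffer and may differ in detail.

```go
package main

import (
	"fmt"
	"time"
)

// Gate values as stated in this report.
const (
	minSubscriptionDays = 30
	cooldownMonths      = 12
)

// utc is a small helper for building UTC timestamps.
func utc(y int, m time.Month, d, hh, mm, ss int) time.Time {
	return time.Date(y, m, d, hh, mm, ss, 0, time.UTC)
}

// eligible is a hypothetical reduction of the two gates.
// lastAccepted is nil when the team has never accepted an offer.
func eligible(now, subscribedAt time.Time, lastAccepted *time.Time) bool {
	if now.Before(subscribedAt.AddDate(0, 0, minSubscriptionDays)) {
		return false // subscription younger than the minimum age
	}
	if lastAccepted != nil && now.Before(lastAccepted.AddDate(0, cooldownMonths, 0)) {
		return false // still inside the cooldown window
	}
	return true
}

func main() {
	subscribedAt := utc(2025, 1, 1, 0, 0, 0) // assumed start date, for illustration only
	first := utc(2026, 5, 12, 23, 17, 0)     // the accept recorded in the logs
	retry := utc(2026, 5, 12, 23, 17, 17)    // the retry 17s later

	fmt.Println(eligible(first, subscribedAt, nil))    // true
	fmt.Println(eligible(retry, subscribedAt, &first)) // false
}
```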
The remaining ~98 cancels in 30d reach Toby only as Stripe customer.subscription.updated webhooks, with no cancellation_reasons row written. Three bypass paths (all FE-side):
| # | Path | Code site |
|---|---|---|
| a | <Link href={stripeUrl} target="_blank">View</Link> in the in-app Subscription panel | apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:51-62, 192-208 (stripe-portal URL preloaded on every panel mount) |
| b | "Manage subscription" links in Stripe renewal / receipt emails | off-product; same portal as path (a) |
| c | Legacy users have NO in-app cancel CTA at all | apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:41-43: hasSubscription = !!team?.paymentCustomerID && !['team_legacy', 'team_basic'].includes(team.accessRole). The cancel link renders only inside {hasSubscription && (...)} |
The compounding effect on path (c): February 2026 was peak churn driven by ThankYouLegacy renewals (per product/metrics/surveys/churn-survey-analysis.md). The cohort with the worst churn pressure is structurally invisible to retention.
The cancel handler at apps/api/context/v3/subscription_context.go:75-151 goes straight to PaymentSvc.CancelSubscription (line 138, returns Stripe portal URL). No retention check. No offer write. The retention flow is entirely FE-orchestrated; backend has no chance to intervene.
3. Of users who DO reach the modal, very few accept
Of the 22 cancellation_reasons rows in 30d, only 1 led to /retention/accept. The remaining 21 either (a) had the FE skip the GET /retention-offer call, (b) saw the offer and clicked "Cancel anyway", or (c) abandoned. Backend can't disambiguate because of cause #1 (schema gap). FE's RETENTION_OFFER_DECLINED Amplitude event exists (CancelSubscription.tsx:622-627) but isn't visible from prod-api telemetry.
Historic accept-to-reason ratio: 4-10% across Jan-May 2026 (e.g. 10 accepts / 134 reasons in Feb-26 = 7.5%). The 30d window (1 accept / 22 reasons = 4.5%) isn't a regression; it is a smaller absolute sample of the same long-running funnel.
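The funnel percentages quoted in this report can be reproduced with a few lines of arithmetic. The `pct` helper is illustrative; the counts are the ones stated above.

```go
package main

import "fmt"

// pct returns part/whole as a percentage, or 0 when whole is 0.
func pct(part, whole int) float64 {
	if whole == 0 {
		return 0
	}
	return 100 * float64(part) / float64(whole)
}

func main() {
	const cancels, reasons, accepts = 120, 22, 1 // last-30-day counts from this report

	fmt.Printf("reasons/cancels: %.1f%%\n", pct(reasons, cancels)) // 18.3%
	fmt.Printf("accepts/cancels: %.2f%%\n", pct(accepts, cancels)) // 0.83%
	fmt.Printf("accepts/reasons: %.1f%%\n", pct(accepts, reasons)) // 4.5%
	fmt.Printf("Feb-26 accepts/reasons: %.1f%%\n", pct(10, 134))   // 7.5%
}
```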
Striking pattern
16 of 17 all-time accepts are retention_legacy (legacy-user discount-vs-renewal-hike). Only one retention_yearly accept ever (today's row). Either the FE doesn't render the yearly-renewal-offer branch (subscription_context.go:475-497), or non-legacy yearly users decline at ~100%. Worth a separate look.
What this is NOT
- Not a backend bug. GetRetentionOffer eligibility logic is permissive and deterministic. IsEligibleForRetention() returns true for all four cancellation enum values. Cooldown (12mo) and subscription-age (30d) gates work correctly when they fire. No silent feature flag, A/B switch, or kill-switch. No backend deploys since 2026-02-02.
- Not a recent FE regression. Cancel-modal files unchanged since 2026-03-31 (cbc92a78d widened eligibility by removing the RETENTION_ELIGIBLE_REASONS filter). The funnel has been structurally leaky for months; Feb-26's higher absolute traffic just disguised it.
- Not a missing-secret-causing-skip. Five TOBY_RETENTION* secrets don't exist in GCP Secret Manager, but gcp_processor.go:25-32 treats missing as ("", false) and envconfig falls back to struct-tag defaults at config.go:185-192 (which match the values actually in retention_offers.coupon_id). System operates correctly on defaults; the only side effect is cold-start log spam. (Filed as Tier 4 / separate ticket; out of this incident's scope.)
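The missing-secret behaviour reduces to a lookup that reports absence plus a fallback to a compiled-in default. The sketch below is a simplification under stated assumptions: `lookupFunc` and `resolve` are hypothetical names, and the default values "30" and "12" are inferred from the 30d / 12mo gates mentioned above, not read from config.go.

```go
package main

import "fmt"

// lookupFunc mimics a secret lookup that reports a missing secret as ("", false).
type lookupFunc func(name string) (string, bool)

// resolve returns the secret when present, otherwise the compiled-in default.
func resolve(lookup lookupFunc, name, def string) string {
	if v, ok := lookup(name); ok {
		return v
	}
	return def
}

func main() {
	// Every lookup misses, as with the five absent TOBY_RETENTION* secrets.
	missing := func(string) (string, bool) { return "", false }

	// Default values here are assumptions for illustration.
	fmt.Println(resolve(missing, "TOBY_RETENTIONMINSUBSCRIPTIONDAYS", "30")) // 30
	fmt.Println(resolve(missing, "TOBY_RETENTIONCOOLDOWNMONTHS", "12"))      // 12
}
```

A missing secret therefore changes nothing functionally as long as the default matches the intended value; the cost is only the failed-lookup log line.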
Fix
Tier 1 — Backend instrumentation patch (validator-vetted, awaiting human review)
Insert a single structured log line in GetRetentionOffer at apps/api/context/v3/subscription_context.go between the eligibility evaluation and the response build (between L597 and L599):

    if result.Eligible && result.Offer != nil {
        ctx.Logger.Info("retention_offer_eligible",
            zap.String("teamID", teamID),
            zap.String("userID", userID),
            zap.String("offerType", string(result.Offer.OfferType)),
            zap.String("couponID", result.Offer.CouponID),
        )
    }

This replaces the draft diff in the original synthesis. The validator caught four defects in the original (log.Info instead of ctx.Logger.Info, team.ID not in scope, OfferType/CouponID referenced at the wrong level when they are nested under result.Offer.*, missing nil-guard) and produced the compile-ready replacement above. Conventions follow the existing zap usage in the same file (L652-656, L709).
Why it's the right scope for an automated fix: logging-only, additive, no behaviour change, no schema change, no user impact, ~8 LOC, 1-commit rollback.

Why it's not auto-shipping: validator returned confidence: medium and explicitly recommended human reviewer sign-off before merge. Per Wave 4 spec, medium confidence skips the fix-shipper. The corrected diff above is ready to paste into a PR; a Toby Go reviewer (any owner of apps/api/context/v3/) just needs to approve.
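For reviewers who want to exercise the guard outside the service, here is a self-contained stand-in. It is not the patch: it swaps zap for the standard library's log/slog so it runs without dependencies, and `offer`, `offerResult`, `stdoutLogger`, and `logIfEligible` are hypothetical names that only mirror the fields the patch reads.

```go
package main

import (
	"log/slog"
	"os"
)

// offer and offerResult are hypothetical shapes matching the fields the patch reads.
type offer struct {
	OfferType string
	CouponID  string
}

type offerResult struct {
	Eligible bool
	Offer    *offer
}

// stdoutLogger builds a JSON logger writing to stdout.
func stdoutLogger() *slog.Logger {
	return slog.New(slog.NewJSONHandler(os.Stdout, nil))
}

// logIfEligible emits one structured line only for eligible results that carry an offer.
// It returns true when a line was written.
func logIfEligible(logger *slog.Logger, teamID, userID string, result offerResult) bool {
	if !result.Eligible || result.Offer == nil {
		return false
	}
	logger.Info("retention_offer_eligible",
		slog.String("teamID", teamID),
		slog.String("userID", userID),
		slog.String("offerType", result.Offer.OfferType),
		slog.String("couponID", result.Offer.CouponID),
	)
	return true
}

func main() {
	logger := stdoutLogger()
	// Eligible with an offer: one JSON line is written.
	logIfEligible(logger, "team-1", "user-1", offerResult{Eligible: true, Offer: &offer{"retention_legacy", "coupon-x"}})
	// Eligible but nil offer: the guard prevents a nil dereference and nothing is logged.
	logIfEligible(logger, "team-2", "user-2", offerResult{Eligible: true})
	// Not eligible: nothing is logged.
	logIfEligible(logger, "team-3", "user-3", offerResult{Eligible: false})
}
```

Note that slog's JSON handler writes the message under the key `msg`, so a log filter written against a `message` field would not match this stand-in's output as-is; the production field name depends on the service's zap encoder configuration.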
Tier 2 — Product / FE follow-ups (filed as separate tickets, NOT shipped here)
| # | Question | Code site |
|---|---|---|
| 2a | Should Subscription.tsx hide the "View invoices" Stripe-portal link until after the retention modal, OR configure Stripe flow_data to redirect "Cancel plan" back to Toby's modal? | apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:51-62, 192-208 |
| 2b | Should hasSubscription stop excluding team_legacy / team_basic so the cohort with worst churn pressure gets an in-app cancel CTA and a retention opportunity? | apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:41-43 |
| 2c | Why has only one retention_yearly offer ever been accepted (today's row)? Is the FE rendering the branch at all? | apps/api/context/v3/subscription_context.go:475-497 and apps/extension/app/components/Modal/Downgrade/RetentionOffer.tsx |
Tier 3 — Schema / analytics (NOT shipped here)
- Add a retention_offer_views table OR add status / offered_at / declined_at columns to retention_offers. Makes "issued" measurable from the DB independent of logs.
- Wire Amplitude RETENTION_OFFER_SHOWN / RETENTION_OFFER_DECLINED events into the BI pipeline so funnel is visible without backend changes.
Tier 4 — Housekeeping (separate ticket worthy, NOT shipped here)
Five missing TOBY_RETENTION* secrets in GCP Secret Manager (TOBY_RETENTIONMINSUBSCRIPTIONDAYS, TOBY_RETENTIONCOOLDOWNMONTHS, TOBY_RETENTIONLEGACYYEARLYPRICE, TOBY_RETENTIONCOUPONLEGACY, TOBY_RETENTIONCOUPONYEARLY). Either create them with the current defaults, or remove the lookup entirely. Today they pollute every cold start with 5 "failed to access secret version" log lines.
Verify plan (Tier 1)
- Apply the corrected diff (above) at L598 of apps/api/context/v3/subscription_context.go. Confirm go build, go vet, and the package's existing tests still pass.
- Deploy to prod-api as a normal release (no special migrations, no flag).
- Wait 24-48 hours. Cancel-flow traffic is sparse: historical baseline is ~22 cancellation_reasons / 30d → ~0.7/day → ~1-3 retention_offer_eligible events expected in 48h.
- Cloud Logging query in toby-production-286416: resource.labels.service_name="prod-api" AND jsonPayload.message="retention_offer_eligible" AND timestamp >= "<deploy timestamp>"
- Expectation: at least 1-3 events over 48h. If zero, that's also a useful signal: either the FE isn't calling /retention-offer after reason submit (Tier 2 question) or all callers are getting eligible:false (cooldown / age gate, which the existing handler already covers).
- Compute the FE funnel ratio over the same window: count(retention_offer_eligible events) vs count(cancellation_reasons.created_at) rows. This is the first datapoint we'll have to disambiguate "modal-renders-but-decline" from "modal-never-calls-the-endpoint", i.e. the Tier 2c question above.
After 14 days of Tier 1 telemetry flowing, file the Tier 2/Tier 3 tickets with real numbers attached.
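The expected-volume estimate in the verify plan is a straight rate scaling. The sketch below reproduces it; `expectedEvents` is an illustrative helper. Its result is an upper bound on the events expected if every reason submit triggers an eligible /retention-offer response.

```go
package main

import "fmt"

// expectedEvents scales a per-30-day baseline to a window given in hours.
func expectedEvents(per30Days, windowHours float64) float64 {
	return per30Days / 30 * windowHours / 24
}

func main() {
	const baseline = 22 // cancellation_reasons rows per 30 days, from this report

	fmt.Printf("per day: %.2f\n", expectedEvents(baseline, 24)) // per day: 0.73
	fmt.Printf("per 48h: %.2f\n", expectedEvents(baseline, 48)) // per 48h: 1.47
}
```

At roughly 1.5 expected events per 48 hours, a zero count in a single window is plausible by chance alone, which is one reason to let telemetry run for the full 14 days before drawing conclusions.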
Operator decisions to surface
- Approve and merge the Tier 1 patch? Validator returned validated + medium specifically because the corrected diff needs a Go reviewer eyes-on before automated ship. The patch above is compile-ready by validator confirmation; it needs one human review pass.
- File Tier 2-4 as separate tickets now or wait for Tier 1 telemetry? Recommended: file the tickets now with stub bodies pointing to this incident; backfill numbers in 14 days when Tier 1 data flows. That way they don't get lost.
- Adjust the ticket's success metric? "0 offers issued" is unmeasurable from retention_offers (cause #1, schema gap). If the team wants to measure "issued", that's Tier 3 work; don't grade Tier 1 against an unmeasurable metric.
Open questions
- None blocking diagnosis or ship. Validator's compile-readiness objection is fully resolved by the corrected diff above.
- Awaiting human approval for the Tier 1 PR. After approval, this can be re-routed through the fix-shipper on a future tick.
Citations
- Frontend finding: artifacts/toby-frontend-doctor/c1bf20e9-d112-429a-817a-986e7a08ce2f/finding.md
- Backend finding: artifacts/toby-backend-doctor/f8fd14fa-77ec-4906-8cbd-0dec5f88d26d/finding.md
- Synthesis draft (preserved): artifacts/toby-incident-coordinator/889c2366-0fe8-45ee-afb0-d293f41bd015/synthesis-draft.md
- Validator's verdict + corrected diff: artifacts/toby-incident-validator/db1a3c0a-b500-432d-a579-658f01657186/validation.md
- Discernment audit (Wave 0 sweep): artifacts/toby-incident-coordinator/889c2366-0fe8-45ee-afb0-d293f41bd015/discernment-2026-05-12.md
- Source ticket: TOBY-6 (id a4c30893-a56e-4b35-8a99-e462290abe15), priority urgent.
- Prod-api revision (stable): prod-api-00427-9p2 @ SHA 4b0107858e706c904e6cf2841fbcbf81a1e2f94f since 2026-02-02.
- DB connection (read-only spot-checks by validator): be55a66b-c905-4759-9ce1-a97785bb69e6.
- Migration: apps/api/data/migrations/V71__retention_offers.up.sql:1-15.
- Only insert site: apps/api/context/v3/subscription_context.go:697-710.
- Cancel handler (no retention): apps/api/context/v3/subscription_context.go:75-151.
- Eligibility predicate: apps/api/models/models/cancellation_reason.go:33-37.
- FE cancel entry: apps/extension/app/components/Modal/OrgSettings/Subscription.tsx:228-240.
- FE retention dispatch: apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709.
- FE retention modal: apps/extension/app/components/Modal/Downgrade/RetentionOffer.tsx.
Timeline
| Time (UTC) | Event |
|---|---|
| 2026-02-02 | prod-api-00425 deployed with current SHA. No further code deploys to prod-api since. |
| 2026-03-31 | FE commit cbc92a78d widens retention eligibility (removes RETENTION_ELIGIBLE_REASONS filter). Expected to raise offer volume, not lower it. |
| 2026-04-09 | FE commit d68726b29 lands (the blank-extension-page regression). May have contributed to fewer users reaching Org Settings → Cancel — separate incident, already shipped/triaged. |
| 2026-05-11 22:34 | TOBY-6 filed by toby-state-of-business---nightly-report based on state-of-business-2026-05-18.html. |
| 2026-05-12 03:57 | Warroom (this run) opted-in TOBY-6 via Wave 0 discernment sweep — urgent priority, no agent owner, cross-cutting. |
| 2026-05-12 04:03 | Both doctors converged: backend disconfirms hypothesis B (writes are healthy), frontend reframes hypothesis A (modal exists, but bypass paths dominate). |
| 2026-05-12 04:08 | Validator returned validated + medium. Diagnosis sound; Tier 1 patch needed a 5-min compile-readiness correction. |
| 2026-05-12 ~04:10 | This doc published. Status: closed (diagnosis); ship_state: awaiting_human_review. Source ticket TOBY-6 → in_review. |