artifacts/toby-pm/00036a80-5931-405a-85ab-1e39ee3a545f/incidents-2026-05-12-retention-offers-silent-ingestion.md

Ingestion summary — 2026-05-12 retention_offers-silent incident

Hand-authored · Last edited May 13 by initial import
TL;DR

_Author: toby-pm · Run id: 00036a80-5931-405a-85ab-1e39ee3a545f · 2026-05-12_

Why this lives here, not next to the source: toby/incidents/ is owned by the warroom (toby-incident-coordinator + doctors + validator). The dashboard agent is read-only there. The user instruction "save next to source" is reconciled against the hard scope rule by writing derivatives to the agent's own artifact dir and surfacing the incident via the dashboard.

Source doc: toby/incidents/2026-05-12-retention-offers-silent.md
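The scope reconciliation above can be sketched as a small write-target resolver. This is a hypothetical sketch, not the agent's actual code. The `owners` map, `canWrite`, `resolveWriteTarget`, and the flattening rule are all assumptions inferred from this run's own artifact path (artifacts/toby-pm/00036a80-5931-405a-85ab-1e39ee3a545f/incidents-2026-05-12-retention-offers-silent-ingestion.md). The flattening rule drops the top-level `toby/` segment, joins the remaining directories with `-`, and appends `-ingestion`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// owners maps a directory prefix to the agents allowed to write under it.
// Illustrative only: it encodes "toby/incidents/ is owned by the warroom".
var owners = map[string][]string{
	"toby/incidents/": {"toby-incident-coordinator", "doctors", "validator"},
}

// canWrite reports whether agent may write p. Paths under no registered
// prefix are treated as writable in this sketch.
func canWrite(agent, p string) bool {
	for prefix, allowed := range owners {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		for _, a := range allowed {
			if a == agent {
				return true
			}
		}
		return false
	}
	return true
}

// resolveWriteTarget honours "save next to source" when the scope rule
// allows it, and otherwise redirects into artifacts/<agent>/<runID>/.
func resolveWriteTarget(agent, runID, source string) string {
	dir, file := path.Split(source)
	name := strings.TrimSuffix(file, path.Ext(file)) + "-ingestion.md"
	if canWrite(agent, source) {
		return path.Join(dir, name)
	}
	// Assumed naming rule: drop the top-level segment ("toby/") and
	// flatten the remaining directories into a "-"-joined prefix.
	rest := strings.TrimSuffix(dir, "/")
	if i := strings.Index(rest, "/"); i >= 0 {
		rest = rest[i+1:]
	} else {
		rest = ""
	}
	if rest != "" {
		name = strings.ReplaceAll(rest, "/", "-") + "-" + name
	}
	return path.Join("artifacts", agent, runID, name)
}

func main() {
	src := "toby/incidents/2026-05-12-retention-offers-silent.md"
	fmt.Println(resolveWriteTarget("toby-pm", "00036a80-5931-405a-85ab-1e39ee3a545f", src))
}
```

With the dashboard agent (`toby-pm`) as the writer, the warroom-owned source falls through to the artifact directory. The resolver then reproduces the path of this document.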

What landed

The second canonical warroom incident ran roughly one day after the workflow's first proof-of-life. Ticket TOBY-6 (urgent), titled "retention_offers table silent: save flow not triggering or not logging", was triaged through the full 7-wave protocol and transitioned to in_review.

| Wave | Step | Output | When |
|---|---|---|---|
| 0 | Discernment sweep | Opted in TOBY-6 (urgent, no agent owner, cross-cutting) | 2026-05-12 03:57 UTC |
| 1 | Investigate | FE doctor finding + BE doctor finding (parallel) | 2026-05-12 04:03 UTC |
| 2 | Synthesise | Coordinator draft with diff | (during the run) |
| 3 | Validate | Verdict validated + medium; caught compile defects in the draft diff | 2026-05-12 04:08 UTC |
| 4 | Ship | Skipped (correctly): a medium-confidence verdict does not trigger fix-shipper | n/a |
| 5 | Transition | TOBY-6 → in_review | ~2026-05-12 04:10 UTC |
| 6 | Report | Consolidated post to Slack #C0B3FN70MEE | (per protocol) |
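The Wave 4 gate in the table reduces to a single predicate. This is a minimal sketch built on two assumptions. First, it assumes a `Verdict` shape with a validated flag and a three-level confidence. Second, it treats `low` the same as `medium`, which is an assumption because the summary only says that medium does not ship and that high does (the 2026-05-11 run). The type names and `runWaves` are hypothetical.

```go
package main

import "fmt"

// Confidence levels for a validator verdict (ordering assumed).
type Confidence int

const (
	Low Confidence = iota
	Medium
	High
)

// Verdict is a hypothetical shape for the Wave 3 validator output.
type Verdict struct {
	Validated  bool
	Confidence Confidence
}

// waves lists the 7-wave protocol in order, using the step names above.
var waves = []string{"Discernment sweep", "Investigate", "Synthesise", "Validate", "Ship", "Transition", "Report"}

// shouldShip gates Wave 4: only a validated, high-confidence verdict hands
// the diff to fix-shipper. Treating Low like Medium is an assumption.
func shouldShip(v Verdict) bool {
	return v.Validated && v.Confidence == High
}

// runWaves returns one line per wave, marking Ship as skipped when the
// gate does not allow it.
func runWaves(v Verdict) []string {
	var ran []string
	for i, name := range waves {
		if name == "Ship" && !shouldShip(v) {
			ran = append(ran, fmt.Sprintf("%d %s: skipped", i, name))
			continue
		}
		ran = append(ran, fmt.Sprintf("%d %s", i, name))
	}
	return ran
}

func main() {
	// The TOBY-6 run: validated, medium confidence.
	for _, line := range runWaves(Verdict{Validated: true, Confidence: Medium}) {
		fmt.Println(line)
	}
}
```

With this design, Waves 5 and 6 still run after a skipped ship. That matches TOBY-6 moving to in_review and the report going out without an auto-shipped fix.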

Diagnosis (one-paragraph)

retention_offers is accept-only by design: the table records only the "CLAIM DISCOUNT" click. The 17 all-time rows (16 retention_legacy, 1 retention_yearly) are healthy. The ticket's framing ("0 offers issued, 0 accepted") conflates two metrics; from this schema you can only ever measure accepts. The real funnel leaks severely at the top. Over the last 30 days it ran 120 cancels → 22 cancellation_reasons (18% of cancels) → 1 /retention/accept (0.83% of cancels), so ~82% of cancels never reach the in-app retention modal. Three FE-side bypass paths explain the leak: (a) the Stripe Customer Portal, preloaded as a View link in the in-app Subscription panel; (b) Stripe renewal/receipt emails containing "Manage subscription" links to the same portal; (c) team_legacy / team_basic users have no in-app cancel CTA at all (hasSubscription excludes them), and they are the cohort with the worst churn pressure (Feb-26 ThankYouLegacy renewals).
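Every funnel percentage above is taken against the 120 cancels, which is easy to misread as stage-to-stage conversion. A small sketch reproducing the arithmetic follows. Like the diagnosis, it treats a submitted cancellation_reason as the proxy for reaching the modal. The `Stage` type and `pct` helper are illustrative.

```go
package main

import "fmt"

// Stage is one step of the cancel → reason → accept funnel.
type Stage struct {
	Name  string
	Count int
}

// pct returns part as a percentage of whole.
func pct(part, whole int) float64 {
	return 100 * float64(part) / float64(whole)
}

func main() {
	// Last-30-day counts quoted in the diagnosis.
	funnel := []Stage{
		{"cancels", 120},
		{"cancellation_reasons", 22},
		{"/retention/accept", 1},
	}
	cancels := funnel[0].Count
	for _, s := range funnel[1:] {
		// Every rate is relative to cancels, not to the previous stage.
		fmt.Printf("%-22s %3d  %5.2f%% of cancels\n", s.Name, s.Count, pct(s.Count, cancels))
	}
	// Cancels with no recorded reason, i.e. never reached the modal.
	fmt.Printf("no reason recorded: %.2f%%\n", pct(cancels-funnel[1].Count, cancels))
}
```

It prints 18.33%, 0.83%, and 81.67%. These match the rounded 18%, 0.83%, and ~82% in the diagnosis.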

Why this matters strategically

  1. Resolves the long-standing CancelSubscription.tsx open question. The dashboard has carried "Retention-discount frontend integration into CancelSubscription.tsx flagged as pending in worklog.md (Jan 2026); no commit confirms frontend wiring shipped" as an open question for months. The incident doc cites apps/extension/app/components/Modal/Downgrade/CancelSubscription.tsx:643-709 as the live retention-dispatch site and :622-627 as the live RETENTION_OFFER_DECLINED Amplitude wiring — the integration did ship; the funnel just has structural FE-orchestration bypass paths above it. Open question resolves to "integration shipped; funnel structurally leaky".

  2. Second proof-of-life for the warroom + first medium-confidence verdict. The 2026-05-11 blank-page incident demonstrated the high-confidence ship-path. This run demonstrates the gate working correctly in the opposite direction. The validator caught real defects in the draft diff (log.Info → ctx.Logger.Info, a missing nil-guard, team.ID not in scope, nested struct path corrections) and returned a corrected, compile-ready replacement. Wave 4 then correctly skipped auto-ship. The escape valve works.

  3. New strategic finding: a structural retention-funnel leak. The 18% reason-rate (cancellations that surface a reason) and 0.83% accept-rate aren't a regression. They've been quietly true for months because of the FE bypass paths, most visibly the Stripe portal "View" link preloaded inside the in-app subscription panel. This is a product-shaped problem that sits upstream of the playbook's monetization bets, and it doesn't show up in any existing O1/O2/O3 KR. The Tier 2 follow-ups are bigger levers than the Tier 1 patch: hide the Stripe-portal link, configure a flow_data redirect, and give legacy users an in-app cancel CTA.

  4. Tier 4 housekeeping signal: missing GCP secrets. Five TOBY_RETENTION* secrets don't exist in Secret Manager, and every cold start logs "failed to access secret version" lines for them. The system operates correctly on defaults; the only side effect is log noise.

Refuted hypotheses

  • "It's a regression" — refuted by backend doctor: prod-api revision prod-api-00427-9p2 @ SHA 4b0107858 stable since 2026-02-02; FE cancel- modal files unchanged since 2026-03-31; commit cbc92a78d widened retention eligibility rather than narrowing it.
  • "Backend silently dropping writes" — refuted: GetRetentionOffer eligibility logic is permissive and deterministic; only insert site is exercised correctly; no silent feature flag, A/B switch, or kill-switch.
  • "Missing secrets are causing skip behaviour" — refuted: gcp_processor.go:25-32 treats missing as ("", false) and envconfig falls back to struct-tag defaults that match the values actually in retention_offers.coupon_id.

Operator decisions surfaced

  1. Approve and merge the corrected Tier 1 patch? Validator returned validated + medium specifically because the draft diff needed a compile-readiness correction. The corrected version (in the incident doc, fenced as Go) is ready to paste. A Toby Go reviewer (any owner of apps/api/context/v3/) needs one human review pass.
  2. File Tier 2/3/4 as separate tickets now or wait for Tier 1 telemetry? Pipeline recommendation: file now with stub bodies pointing to this incident, and backfill numbers in 14 days once Tier 1 data flows. Filing now avoids the risk of the follow-up work getting lost.
  3. Adjust TOBY-6's success metric? "0 offers issued" is unmeasurable from retention_offers (cause #1, schema gap). If the team wants to measure "issued", that's Tier 3 work; don't grade Tier 1 against an unmeasurable metric.
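The schema gap behind decision 3 can be shown with a toy event log. This is a hypothetical sketch: `OfferEvent`, its `Kind` values, and the sample users are illustrative. The "issued" event approximates what Tier 3 work would have to start recording, while today's retention_offers table corresponds to the accepted-only case.

```go
package main

import "fmt"

// OfferEvent is a hypothetical retention event. retention_offers today only
// ever holds the "accepted" kind; "issued" is what Tier 3 would add.
type OfferEvent struct {
	UserID string
	Kind   string // "issued" or "accepted"
}

// counts tallies issued and accepted events.
func counts(events []OfferEvent) (issued, accepted int) {
	for _, e := range events {
		switch e.Kind {
		case "issued":
			issued++
		case "accepted":
			accepted++
		}
	}
	return issued, accepted
}

// acceptRate returns accepted/issued. ok is false when nothing was recorded
// as issued, which is always the case for an accept-only log.
func acceptRate(events []OfferEvent) (rate float64, ok bool) {
	issued, accepted := counts(events)
	if issued == 0 {
		return 0, false
	}
	return float64(accepted) / float64(issued), true
}

func main() {
	acceptOnly := []OfferEvent{{"u1", "accepted"}}
	if _, ok := acceptRate(acceptOnly); !ok {
		fmt.Println("issued is unmeasurable from an accept-only log")
	}
	withIssuance := []OfferEvent{{"u1", "issued"}, {"u1", "accepted"}, {"u2", "issued"}}
	if r, ok := acceptRate(withIssuance); ok {
		fmt.Printf("accept rate: %.0f%%\n", 100*r)
	}
}
```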

Dashboard mutations triggered by this ingestion

| Section | Change |
|---|---|
| TL;DR | Add a sentence on the second warroom run + the structural retention-funnel finding |
| Operations | Add 2026-05-12 retention-offers run to proof-of-life list; record first medium-confidence verdict (Wave 4 correctly skipped) |
| OKRs | Note Tier 2-4 retention findings as upstream lever on the broader monetization picture (not assigned to a KR; surface, don't reorganize) |
| Recent shipments | New canonical incident entry for 2026-05-12 |
| Open questions | Resolve CancelSubscription.tsx wiring question (integration shipped; funnel structurally leaky); add structural retention-funnel leak as a new open question; add Tier 1 Go-review ask |
| Key decisions | Add "medium-confidence verdict ⇒ fix-shipper skips by design" decision |
| Doc index | Add link to toby/incidents/2026-05-12-retention-offers-silent.md |

Memory mutations

  • Record incidents_second_canonical_doc_path and _at.
  • Record the medium-confidence path as exercised — second proof-of-life.
  • Resolve pending_review[] entry on CancelSubscription.tsx.
  • Record the retention-funnel structural finding for cross-cycle re-surfacing.
  • Record the Tier-4 missing-secrets observation.
  • Refresh last_audit_at.
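The memory mutations above can be sketched as a single update. This is a minimal sketch that assumes a `Memory` type with a key/value map, a `PendingReview` slice, and a `Findings` slice. Only incidents_second_canonical_doc_path, pending_review[], and last_audit_at come from the list. `incidents_second_canonical_doc_at` is an assumed expansion of `_at`, and the remaining names and finding strings are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Memory is a hypothetical shape for the dashboard agent's memory.
type Memory struct {
	Values        map[string]string // scalar keys such as last_audit_at
	PendingReview []string          // pending_review[]
	Findings      []string          // cross-cycle observations (illustrative)
}

// resolvePending removes pending_review[] entries that mention target,
// filtering the slice in place.
func (m *Memory) resolvePending(target string) {
	kept := m.PendingReview[:0]
	for _, e := range m.PendingReview {
		if !strings.Contains(e, target) {
			kept = append(kept, e)
		}
	}
	m.PendingReview = kept
}

// applyIngestion records this run's mutations; ts is an RFC 3339 timestamp.
func (m *Memory) applyIngestion(docPath, ts string) {
	m.Values["incidents_second_canonical_doc_path"] = docPath
	m.Values["incidents_second_canonical_doc_at"] = ts
	m.resolvePending("CancelSubscription.tsx")
	m.Findings = append(m.Findings,
		"medium-confidence path exercised (second proof-of-life)",
		"retention funnel structurally leaky (re-surface across cycles)",
		"Tier 4: TOBY_RETENTION* secrets missing, log noise only",
	)
	m.Values["last_audit_at"] = ts
}

func main() {
	m := &Memory{
		Values:        map[string]string{},
		PendingReview: []string{"CancelSubscription.tsx retention wiring"},
	}
	m.applyIngestion("toby/incidents/2026-05-12-retention-offers-silent.md",
		time.Now().UTC().Format(time.RFC3339))
	fmt.Println(len(m.PendingReview), len(m.Findings), m.Values["last_audit_at"])
}
```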