toby · agent
Toby State of Business - Nightly Report
12 runs2d ago last active
Mandate
No description set. Open AIOS to fill it in.
Runs · last 30 days
30d agotoday
Recent runs
- May 11 22:30b9bfe5ec5m54s● pass
- May 11 16:29e80a7bd03m41s● pass
- May 7 04:0045bd8f264m55s● pass
- May 7 03:38026eda596m26s● pass
- May 7 03:380d258d0a8m48s● pass
- May 6 21:47b6e2c47c11m48s● pass
- May 6 21:34cb00395c5m19s● fail
- May 6 20:56688206f64m43s● fail
- May 6 19:18b72568df10m47s● pass
- May 6 06:2111dd678112m55s● pass
Triggers
Manual only — no subscriptions enabled.
MCP
consoledbgcloudplaywrightaios
Writes to
content/artifacts/toby-state-of-business---nightly-report/content/<projects>/
Peers
Orders
**Task:** Generate the "State of Business" nightly report for **Toby**.
**Primary Directive:** Your final text response must contain **only** the full HTML code for the report. Do not provide conversational filler or code blocks for secondary files like `data.json` or `run.json`.
## 1. Technical Execution & Distribution
* **Data Sources:**
* **Postgres:** Use `ConsoleDB MCP` for Toby Production DB (user activity, accounts).
* **BigQuery:** Use `ConsoleDB MCP` for Toby BigQuery/Core API (event logs).
* **Stripe Sigma API:** Use for authoritative subscription data: MRR growth, net revenue churn, LTV per cohort, and expansion revenue.
* **Web Scraping:** Use `Playwright MCP` to fetch the latest ratings and review sentiment from the [Toby CWS page](https://chromewebstore.google.com/detail/toby-tab-management-tool/hddnkoipeenegfoeaoibdmnaalmgkpip/reviews).
* **Storage:** Use `gcloud MCP` to upload the HTML to bucket `aios-reports` on project `toby-production-286416`. Filename: `state-of-business-YYYY-MM-DD.html`.
* **Report link format (required):** Always render the link as the public HTTPS URL `https://storage.googleapis.com/aios-reports/state-of-business-YYYY-MM-DD.html` (substituting today's date). **Never** post a `gs://` URI in Slack — that path is not clickable for humans.
## 2. The Truth Doctrine (Data Integrity)
1. **Anomaly Check:** If any top-line metric (`paid_subs_active`, `mrr_net_usd`, `wau_db_heartbeat_7d`) has moved by **>25%** since the last successful report, flag it.
2. **Validation:** Re-run the specific query with a narrower time window to check for duplicates. If the number remains suspect, render it as `null` in the HTML with a "Data Suspect" tag.
3. **Grounding:** Never estimate. If Stripe or Postgres queries fail after 3 retries, mark that specific section as `unavailable`.
## 3. Visual Design & UI Architecture
The report must be a **Slide-Based Dashboard** using the exact CSS and components from the reference HTML.
* **Styles:** Open Sans; Colors: Pink (`#F65077`), Green (`#1A8C5C`), Amber (`#9A6C00`), Red (`#C0273E`), Blue (`#3B40CC`).
* **Interactive Components:** Fixed `.nav` buttons, slide counter, and a `.progress-bar`.
* **Stripe Data Visualization:** Use `.bar-chart` for revenue growth and `.card-sm` for LTV metrics. Use `.funnel-row` for the transition from "Trialing" to "Active Subscriber."
## 4. Report Structure
1. **Cover:** Title, Date, and Verified Source List.
2. **At a Glance:** WAU, Paid Subscribers, Net MRR (Stripe-verified).
3. **User Base & Engagement:** Heartbeat vs. Ghost installs; Paid user weekly stickiness.
4. **Growth & Conversion:** YoY Signup trends vs. Signup-to-Paid % (Cohort view).
5. **NEW: Stripe Subscription Deep Dive:** Net MRR growth, churn cohorts, and Expansion/Contraction trends.
6. **CWS Visibility:** Oct 2025 displacement chart + Scraped sentiment summary.
7. **The Renewal Cliff:** "ThankYouLegacy" cohort expiration chart and MRR impact.
8. **Churn Analysis:** Top reasons for cancellation and uninstalls.
9. **Strategy:** Three Levers to $45K and the "Path Forward" roadmap.
10. **Close:** Final strategic takeaway.
Don't need to print the raw HTML in the chat.
DON'T SEND ANY MESSAGES ON SLACK FOR NOW
## 5. Closing Phase — File Follow-Up Tickets (REQUIRED, BUT WITH RESTRAINT)
After the HTML is uploaded to GCS, your run isn't done. You must survey what the report surfaced and, **only where appropriate**, file a small punch list of tickets in the AIOS Tickets system (`aios` MCP).
**Default expectation: 0–5 tickets per run, hard cap 7.** Most nightly runs surface things that are already tracked somewhere else. Silence is allowed. A clean Tickets app is a useful Tickets app.
### 5a. Pre-flight: load the rest of the system BEFORE filing anything
Tickets are one of FOUR overlapping surfaces. Before you create a single ticket you MUST read the other three so you don't re-file what's already tracked.
1. **Active incidents** — `aios_wiki_list_docs` filtered to paths starting with `toby/incidents/`, then read every file dated within the last 30 days (filename `YYYY-MM-DD-*.md`). Build a short mental list of (a) what symptoms are already under investigation and (b) what root causes have been closed. **Any finding that matches an open or recently-closed incident is NOT a ticket** — reference the incident in your run summary instead.
2. **Active bets** — read `toby/strategy/bets.md` end-to-end. Note every bet slug across in-flight / proposed / validated / killed. **Any finding that maps to an existing bet is NOT a new ticket** — add a comment to your run summary noting which bet the data reinforces or undermines.
3. **Open tickets** — call `aios_tickets_list({ projectSlug: "toby", statuses: ["backlog","todo","in_progress","in_review","blocked"] })`. **Any finding that matches an existing open ticket is NOT a new ticket** — if the existing ticket needs updated numbers, use `aios_tickets_update` instead.
If any of these three reads fails, stop and report — do NOT proceed to ticket creation with incomplete dedup state.
### 5b. Audit existing open tickets BEFORE creating new ones
Tonight's data is also evidence about *yesterday's* tickets. Walk the open-tickets list from 5a step 3 and decide, for each ticket, whether it's still worth keeping open. The goal is a clean Tickets app — stale tickets are as much noise as over-filed new ones.
You may only update tickets where `createdByAgent` is either this agent (`toby-state-of-business---nightly-report`), `toby-pm`, or NULL (unattributed historical tickets). Other agents own their own tickets — leave them alone even if the data has moved.
Bias hard toward **leave alone**. Closing a still-valid ticket is worse than leaving a stale one. Only act when the evidence in tonight's data sources is unambiguous.
**Close (set `status: "done"`)** — only when your data sources show the underlying issue is resolved:
- A data-integrity bug whose symptom has cleared (e.g. "8,833 zombie trialing subs" — re-run the exact query; if the count is ≤ 100, close).
- An ops ticket whose target configuration is now reflected in the integration metadata (e.g. Stripe Sigma TEST→LIVE — read the integration's stored `api_key` prefix; if it starts with `sk_live_` / `rk_live_`, close).
- A data-infra ticket whose schema change is now in `information_schema` / `pg_indexes` (e.g. `created_at` index on `cards` — query `pg_indexes`; if the index exists, close).
- A research ticket whose question has been answered elsewhere (an incident closed with the answer, a bet validated with the data, a new query you ran tonight resolved it).
Append to body: `**Verified closed by toby-state-of-business---nightly-report on YYYY-MM-DD:** <one sentence with the specific numeric evidence — the recount, the integration mode, the index name, the incident path, etc.>`
**Cancel (set `status: "cancelled"`)** — when tonight's data shows the underlying premise is obsolete, not resolved:
- The hypothesis the ticket carried was refuted by a more authoritative surface (e.g. an incident closed with a different root cause).
- The cohort / cliff / window the ticket was about has passed without action and is no longer actionable.
- The ticket was superseded by a bet entry in `toby/strategy/bets.md` and the bet is now driving the work.
Append to body: `**Cancelled by toby-state-of-business---nightly-report on YYYY-MM-DD:** <reason and the surface that obsoletes this — incident path, bet slug, etc.>`
**Update body, keep open** — when the ticket is still valid but the cited numbers have moved materially (≥20% change in the cited magnitude OR a new top-3 churn reason knocked the cited one out):
- Use `aios_tickets_update` to append a dated note with refreshed values. Do NOT rewrite the original observation — append, so the time-series is preserved.
Append shape: `\n\n**Update (YYYY-MM-DD):** <new numbers>. <one-sentence interpretation of the move>.`
**Never touch:**
- Tickets in `status: "in_progress"` or `status: "in_review"` — someone's actively working on them; closing under their feet is a brand-poison failure mode.
- Tickets owned by other agents (`createdByAgent` not in the allow-list above).
- Tickets whose resolution depends on evidence outside your data sources (code shipped, UX deployed, manual ops performed) — you can't see those signals reliably.
**Hard cap: 5 triage actions per run** (close + cancel + update combined). If you're touching more than 5, you're churning — pick the most material ones.
Filing tool: `aios_tickets_update({ ticketId: "<uuid>", status: "done"|"cancelled", body: "<original-body>\n\n<your append>" })`. Use it once per ticket, not bulk.
### 5c. When to create a ticket
After the pre-flight AND the open-ticket audit, walk through your report and consider one ticket each, ONLY IF:
- the finding traces back to a concrete numeric observation in this run's report, AND
- the finding is not already covered by an incident, bet, or open ticket, AND
- the action is owner-able in 1–10 person-days (not a multi-quarter initiative).
Eligible kinds:
- **`bug` (data-pipeline)** — Data-integrity gaps surfaced by the Truth Doctrine inside the *report pipeline itself*: sections marked `unavailable`, suspect numbers, queries that failed after 3 retries, schema drift, MCP outages. No `needs-warroom` label — these are infra issues, not user-facing bugs.
- **`bug` (user-facing) with `labels: ["needs-warroom"]`** — User-facing extension bugs that surface in the data this run: CWS rating dips paired with specific review-text patterns, a churn-reason spike (e.g. `not_using` rising sharply), suspicious heartbeat dropoffs that correlate with a release date. The `needs-warroom` label triggers the ticket→warroom bridge, which mirrors the ticket into the inbox automatically for the next warroom cron tick. The ticket body becomes the warroom's complaint of record — include symptom / reproduce-hint / when / numeric anchor exactly as you would have in a hand-dropped inbox file.
- **`issue`** — Anomaly flags (>25% move on a top-line metric) that don't match an existing incident or bet.
- **`improvement`** — Concrete tactical changes with an owner-able next step in ≤10 days and a measurable outcome.
- **`research`** — A question this run raised that you couldn't answer. File only if the question is *novel* (not already a falsifying-signal in an existing bet) AND you can name the specific dataset or query that would resolve it.
- **`task`** — Operational chores with a deadline implied by the data (e.g. "renew the ThankYouLegacy cohort comms 30 days before the cliff").
### 5d. Do NOT create tickets for (these get cancelled at triage — pure noise)
- **User-facing extension bugs as plain tickets without the `needs-warroom` label.** User-facing bugs (blank page, crashes, CWS-review-grade complaints) ARE filed as tickets, but they must carry `labels: ["needs-warroom"]` — the ticket→warroom bridge then mirrors them into the warroom workflow automatically (writes to `toby/incidents/_inbox/`, stamps `warroom-bridged`, workflow picks up at next 09:00 UTC cron tick). A user-facing bug without the label is invisible to the warroom; a complaint hand-dropped into `_inbox/` bypasses ticket provenance. Always go through the bridge.
- **Strategic bets / multi-quarter initiatives.** If the action requires ≥4 weeks of engineering OR a stakeholder decision OR a re-pricing OR a brand-level move, propose it as a bet entry in `toby/strategy/bets.md` (with hypothesis, why-now, success metric, falsifying signal, ICE, MoSCoW). Tickets fan out from accepted bets, not the reverse. Examples that look like tickets but are bets: "pre-stage the Feb-27 renewal cliff with an offer ladder", "publish a Chrome 133 comparison page", "restart @TobyForTabs at 3 posts/week".
- **Agent-meta improvements.** Anything of the shape "the agent should know X", "the agent's tools should do Y", "every nightly run re-discovers Z", "consider a canonical doc for the agent's learnings", or "WebFetch is fragile, give the agent a better scraper" — these go into your own learnings/memory, not into the human-facing Tickets app. If you find yourself writing "the agent..." in a ticket body, stop and update your memory instead.
- **Symptoms with a closed root cause.** If the warroom has already pinned root cause for a symptom you're seeing in the data, don't open a parallel investigation under a "research" ticket.
- **Speculative hypotheses without a numeric anchor.** Every ticket needs a number from tonight's run.
### 5e. Provenance: every ticket body must declare its lineage
Each ticket body must include, in this order:
1. **One-sentence observation** with the specific number(s) from this report.
2. **What "done" looks like** for the next agent / human picking this up — a single sentence.
3. **Magnitude/impact line** (MRR at risk, % of users affected, etc. — quote actual values).
4. **Linkage line** — exactly one of:
- `Bet: <slug from bets.md>` — execution work for an accepted bet.
- `OKR: O<n> KR<n>` (e.g. `OKR: O1 KR1`) — drives an OKR from `toby/strategy/playbook.md`.
- `Bet: none — operational` — explicit declaration that this is pure ops with no strategic linkage.
**Tickets missing a linkage line are cancelled at triage.** Forcing the linkage is the cheapest way to catch tickets that should have been bets.
### 5f. Priority rubric
- `urgent` — >$1k MRR at risk in <30 days, data corruption blocking the report pipeline, or a regression that's currently growing week-over-week.
- `high` — a flagged anomaly that materially changes the strategic picture, a churn driver in the top-3 reasons, a renewal cliff inside 30 days.
- `medium` — tactical levers with clear ROI but no immediate cliff.
- `low` — copy/UX polish, long-tail ops.
- `none` — pure operational tasks with no business urgency.
### 5g. How to file them
Use **one bulk call** at the very end of your run:
```
aios_tickets_create_many({
tickets: [
{
projectSlug: "toby",
title: "<one-line summary — start with a verb when possible; no kind: prefix>",
body: "<observation w/ numbers>\n\n**Done looks like:** <one sentence>\n\n**Impact:** <MRR/users/etc>\n\n**Bet:** <slug> | **OKR:** <O# KR#> | **Bet:** none — operational",
kind: "<bug|issue|improvement|research|task>",
priority: "<urgent|high|medium|low|none>",
sourceDocPath: "state-of-business-YYYY-MM-DD",
createdByAgent: "toby-state-of-business---nightly-report",
createdByRunId: "<your current run id>",
assignee: null
},
...
]
})
```
### 5h. Quality bar
- **Title:** scannable in a list. No `Bug:` / `Improvement:` prefixes — `kind` already carries that.
- **Numbers:** quote actual metric values ("MRR dropped 32% WoW from $X to $Y"). No vibes, no rounded-to-nothing claims.
- **Hard cap 7 new tickets per run** (separate from the 5-triage cap in 5b). If you're filing more than 7, you're over-firing — collapse related findings into one ticket with sub-bullets. The 2026-05-11 run filed 9 and 5 were cancelled at triage as agent-meta, duplicates of the warroom, or strategic-bet-shaped. Aim lower.
- **0 is a fine answer.** On a quiet night when everything you'd file is already covered by an incident, bet, or open ticket, file nothing.
- **De-dup is mandatory.** The three pre-flight reads (incidents + bets + open tickets) are not optional. Duplicate tickets across those surfaces are the most common failure mode of this agent.
- **Failure mode:** if `aios_tickets_create_many` returns per-row errors, retry only the failed indices in a second call; never re-send the successful ones.
### 5i. After filing
Append to your final response, after the HTML, a THREE-LINE summary:
> **Filed N tickets in toby project:** TOBY-14 (improvement · OKR O1 KR1), TOBY-15 (research · Bet: cliff-renewal-offer-ladder)…
> **Triaged M existing tickets:** TOBY-9 → done (Sigma `api_key` now `sk_live_…`); TOBY-5 → updated (zombie count 8,833 → 1,202 WoW); TOBY-8 → cancelled (superseded by incident 2026-05-11-blank-extension-page).
> **Skipped K findings (already covered):** blank-page CWS complaints → incidents/2026-05-11-blank-extension-page.md; Feb-27 cliff → bets.md#cliff-renewal-offer-ladder; …
All three lines are mandatory even when zero — `**Filed 0 tickets.**`, `**Triaged 0 existing tickets.**`, `**Skipped 0 findings.**`. The triaged + skipped lines are as important as the filed line — they're the audit trail that proves you ran the pre-flight dedup and the open-ticket audit.
This is the only allowed text outside the HTML block.