ai-support · agent
AI Support Learning Reviewer
3 runs · last active 7d ago
Mandate
No description set. Open AIOS to fill it in.
Runs · last 30 days
Recent runs
- May 6 06:07 · 51a0a273 · 37s · pass
- May 5 19:07 · e4922f79 · 7m26s · pass
- May 4 23:04 · b563496e · 19s · fail
Triggers
- file: `meetings/**`
Writes to
- `content/artifacts/ai-support-learning-reviwer/content/<projects>/`
Peers
Orders
# Runbook: Review Edited AI Drafts → Pattern Report
## Goal
Pull the latest tickets where the human agent edited or regenerated the AI's
draft, group the edits by pattern, and produce a ranked report with concrete
recommendations.
## Prerequisites (verify, don't assume)
1. Working dir: `/Users/guilhermegiacchetto/support_fd`
2. `support-app/.env` defines:
- `GOOGLE_APPLICATION_CREDENTIALS` → service-account JSON path
- `GOOGLE_CLOUD_PROJECT=toby-production-286416`
- `FIRESTORE_DATABASE_ID=support-app-prod`
3. The service account has `roles/datastore.user` on that database.
4. Node 18+ and `support-app/node_modules` already installed.
5. Helper scripts present at `support-app/scripts/{dump-edited-drafts.mjs,inspect-conv.mjs}`.
If missing, recreate them — see `git log --all -- support-app/scripts/` or
ask the user.
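The checks above can be scripted as a quick preflight. This is a sketch: the directory layout and key names are the ones listed in the prerequisites, and the function takes the app directory as an argument so it can be pointed elsewhere.

```shell
# Preflight for the prerequisites above: .env keys and helper scripts.
# (Node version and IAM role still need a manual check.)
preflight() {
  app="$1"; ok=1
  [ -f "$app/.env" ] || { echo "missing $app/.env"; ok=0; }
  for key in GOOGLE_APPLICATION_CREDENTIALS GOOGLE_CLOUD_PROJECT FIRESTORE_DATABASE_ID; do
    grep -q "^$key=" "$app/.env" 2>/dev/null || { echo "missing $key in .env"; ok=0; }
  done
  for s in dump-edited-drafts.mjs inspect-conv.mjs; do
    [ -f "$app/scripts/$s" ] || { echo "missing scripts/$s"; ok=0; }
  done
  [ "$ok" = 1 ] && echo "preflight OK" || echo "preflight FAILED"
}

preflight /Users/guilhermegiacchetto/support_fd/support-app
```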
## Step 1 — Fetch
```bash
cd /Users/guilhermegiacchetto/support_fd/support-app
node scripts/dump-edited-drafts.mjs 300 > /tmp/edited-drafts.json
```
`300` = number of most-recent conversations to scan (ordered by `lastUpdated`
desc). Output JSON shape:
```
{ scanned, edited, items: [{ ticketId, lastUpdated, subject, classification,
initialDraft, modRequests[], finalDraft, citedItemIds }, …] }
```
Heuristic: an item is "edited" iff its message thread contains at least one
`modification_request` / `question` from the user OR a `modified_response`
from the assistant. Drafts the agent sent unchanged are excluded — they're
not interesting for this review.
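The heuristic can be written as a small predicate. A sketch only: the message types are the ones named above, but the exact thread schema (`role`/`type` fields) is an assumption about what `dump-edited-drafts.mjs` sees.

```javascript
// "Edited" predicate from the heuristic above. Assumes each thread message
// has { role, type }; those field names are an assumption, not confirmed.
function isEdited(thread) {
  return thread.some(
    (m) =>
      (m.role === "user" &&
        (m.type === "modification_request" || m.type === "question")) ||
      (m.role === "assistant" && m.type === "modified_response")
  );
}

// A draft sent unchanged is excluded; one modification request includes it.
console.log(isEdited([{ role: "assistant", type: "draft" }])); // false
console.log(
  isEdited([
    { role: "assistant", type: "draft" },
    { role: "user", type: "modification_request" },
  ])
); // true
```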
## Step 2 — Sanity-check the dataset
Fast bucketing — pipe the JSON through `node -e` to count by classification
and surface the modification-request strings. Don't try to read 300 raw
docs; read the prompts only:
```bash
node -e 'const d=JSON.parse(require("fs").readFileSync("/tmp/edited-drafts.json","utf8"));
const by={};for(const i of d.items){const c=i.classification||"?";by[c]=(by[c]||0)+1}
console.log("scanned:",d.scanned,"edited:",d.edited);console.log(by);
d.items.forEach(i=>{if(!i.modRequests.length)return;
console.log("\n#"+i.ticketId+" ["+(i.classification||"?")+"]");
i.modRequests.forEach(r=>console.log(" -",r.replace(/\s+/g," ").slice(0,180)))})'
```
## Step 3 — Inspect specific cases
When a pattern is unclear, pull the full thread for one ticket:
```bash
node scripts/inspect-conv.mjs <ticketId>
```
Use this sparingly — at most ~10 tickets. The conversation doc has the
back-and-forth but **not** the actual sent reply or the original ticket
subject. For those, fall back to the Freshdesk URL
(`https://help.gettoby.com/a/tickets/<ticketId>`) or the corrections in
Open Memory's `om_memories` Chroma collection (port 8000 if running locally).
## Step 4 — Categorize
Group every edit into one of these buckets (extend if a new pattern emerges,
but don't proliferate). For each bucket, record: bucket name, frequency,
2–3 ticket IDs as evidence, one-line root cause, one-line fix.
Known buckets observed historically:
1. **Classifier over-blocks** — `needs_human_action` at 96-97% confidence on
tickets the agent then drafts in one prompt. Root cause: classifier treats
billing-adjacent / irreversible-action topics as hard stops.
2. **Hallucinated account state** — AI asserts subscription/discount/charge
facts without consulting ConsoleDB.
3. **Auto-regen self-flips** — `modified_response` appears with no preceding
`modification_request`, often flipping factual claims. Verifier-driven.
4. **Verbosity** — agent prompts contain "shorter", "too long", "short and
sweet".
5. **Tone** — agent prompts contain "warmer", "natural", "calm", or escalate
in frustration.
6. **Agent-as-author** — modRequest is a complete drafted reply prefixed
with "rewrite:" / "use this:" / "redo:". AI is being used as copy editor.
7. **Templateable confirmations** — same canned line repeated across many
tickets after a `chip-action` succeeded.
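For the keyword-driven buckets (4, 5, and the agent-as-author prefix check), a first pass can be automated before reading anything by hand. The keyword lists are the ones quoted in the buckets above; anything unmatched falls through to manual review rather than being forced into a bucket.

```javascript
// First-pass bucketing of one modification-request string. Keywords are the
// ones quoted in the bucket list above; everything else goes to manual review.
function bucketModRequest(text) {
  const t = text.toLowerCase().trim();
  if (/^(rewrite:|use this:|redo:)/.test(t)) return "agent-as-author";
  if (/(shorter|too long|short and sweet)/.test(t)) return "verbosity";
  if (/(warmer|natural|calm)/.test(t)) return "tone";
  return "manual-review";
}

console.log(bucketModRequest("make it shorter please")); // verbosity
console.log(bucketModRequest("rewrite: Hi Sam, thanks for reaching out")); // agent-as-author
```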
## Step 5 — Write the report
Write a single markdown file to:
```
/Users/guilhermegiacchetto/support_fd/reports/edit-review-<YYYY-MM-DD>.md
```
(Create the `reports/` directory if missing. Confirm with the user before
committing — these contain customer-facing text.)
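One way to derive the dated path and create the directory in the same step (`date +%F` yields the `YYYY-MM-DD` form used above; the function takes the repo root as an argument):

```shell
# Build reports/edit-review-YYYY-MM-DD.md under $1, creating reports/ if needed.
report_path() {
  mkdir -p "$1/reports"
  echo "$1/reports/edit-review-$(date +%F).md"
}
```

Usage: `report_path /Users/guilhermegiacchetto/support_fd` prints the full path to write.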
Required sections:
1. **Scope** — N scanned, N edited, edit-rate %, date window.
2. **Top patterns** — one section per bucket with: frequency, ≥2 ticket IDs
as evidence, root cause, recommended fix. Rank by frequency × impact.
3. **Recommended fixes table** — # | change | files | effort (small/medium).
4. **Open questions** — anything the agent couldn't determine without a human.
Length cap: 600 lines. If the report exceeds that, tighten — this is meant
to be skimmed, not archived.
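A skeleton matching the required sections (placeholders only; fill it from the Step 4 buckets):

```markdown
# Edit Review <YYYY-MM-DD>

## Scope
Scanned: N · Edited: N · Edit-rate: N% · Window: <oldest>..<newest>

## Top patterns
### <bucket name> (frequency: N)
Evidence: #<ticketId>, #<ticketId>. Root cause: <one line>. Fix: <one line>.

## Recommended fixes
| # | Change | Files | Effort |
|---|--------|-------|--------|
| 1 |        |       | small  |

## Open questions
- <anything needing a human>
```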
## Step 6 — Hand back
In chat, summarize in ≤5 bullets: edit-rate, top 3 patterns by frequency,
top 3 recommended fixes, link to the report file. Do **not** start
implementing fixes — the user decides which to act on.
## Guardrails
- **Never** run `gcloud firestore` *delete* / *write* operations. All Firestore access in this runbook is read-only.
- **Never** post ticket content to chat platforms or external services.
Customer-facing text from these tickets is sensitive.
- **Never** commit the report file without explicit user approval.
- If `dump-edited-drafts.mjs` errors with `ERR_MODULE_NOT_FOUND`, the script
may have been authored in `import "dotenv/config"` form — patch to use the
hand-rolled .env parser already in the file.
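If that parser itself needs recreating, a minimal version looks like this. A sketch only: it handles plain `KEY=value` lines and `#` comments, with no quoting, `export`, or multiline support, and the function name is hypothetical (the one in the script may differ).

```javascript
// Minimal hand-rolled .env parser: "KEY=value" lines into an object,
// skipping blanks and # comments. No quoting or multiline support.
function parseEnv(text) {
  const out = {};
  for (const line of text.split("\n")) {
    const t = line.trim();
    if (!t || t.startsWith("#")) continue;
    const eq = t.indexOf("=");
    if (eq > 0) out[t.slice(0, eq).trim()] = t.slice(eq + 1).trim();
  }
  return out;
}

console.log(parseEnv("A=1\n# comment\nB = two\n")); // { A: '1', B: 'two' }
```

Apply the result with something like `Object.assign(process.env, parseEnv(fs.readFileSync(".env", "utf8")))`, preferring values already set in the environment if that matters.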