AIOS Wiki · read-only public mirror
ai-support · agent

AI Support Learning Reviwer

3 runs · last active 7d ago

Mandate

No description set.


Recent runs

  • May 6 06:07 · 51a0a273 · 37s · pass
  • May 5 19:07 · e4922f79 · 7m26s · pass
  • May 4 23:04 · b563496e · 19s · fail

Triggers

file · meetings/**

Writes to

content/artifacts/ai-support-learning-reviwer/content/<projects>/

Orders
# Runbook: Review Edited AI Drafts → Pattern Report

## Goal
Pull the latest tickets where the human agent edited or regenerated the AI's
draft, group the edits by pattern, and produce a ranked report with concrete
recommendations.

## Prerequisites (verify, don't assume)
1. Working dir: `/Users/guilhermegiacchetto/support_fd`
2. `support-app/.env` defines:
   - `GOOGLE_APPLICATION_CREDENTIALS` → service-account JSON path
   - `GOOGLE_CLOUD_PROJECT=toby-production-286416`
   - `FIRESTORE_DATABASE_ID=support-app-prod`
3. The service account has `roles/datastore.user` on that database.
4. Node 18+ and `support-app/node_modules` already installed.
5. Helper scripts present at `support-app/scripts/{dump-edited-drafts.mjs,inspect-conv.mjs}`.
   If missing, recreate them — see `git log --all -- support-app/scripts/` or
   ask the user.

## Step 1 — Fetch

```bash
cd /Users/guilhermegiacchetto/support_fd/support-app
node scripts/dump-edited-drafts.mjs 300 > /tmp/edited-drafts.json
```

`300` = number of most-recent conversations to scan (ordered by `lastUpdated`
desc). Output JSON shape:
```
{ scanned, edited, items: [{ ticketId, lastUpdated, subject, classification,
  initialDraft, modRequests[], finalDraft, citedItemIds }, …] }
```
Heuristic: an item is "edited" iff its message thread contains at least one
`modification_request` / `question` from the user OR a `modified_response`
from the assistant. Drafts the agent sent unchanged are excluded — they're
not interesting for this review.
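That heuristic can be sketched as follows (the real logic lives in `dump-edited-drafts.mjs`; the `role`/`type` field names here are assumptions about the thread shape, not confirmed):

```javascript
// Sketch of the "edited" heuristic: true iff the thread contains a
// modification_request or question from the user, OR a modified_response
// from the assistant. Unedited drafts fall through and are excluded.
function isEdited(messages) {
  return messages.some(
    (m) =>
      (m.role === "user" &&
        (m.type === "modification_request" || m.type === "question")) ||
      (m.role === "assistant" && m.type === "modified_response")
  );
}
```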

## Step 2 — Sanity-check the dataset

Fast bucketing — pipe the JSON through `node -e` to count by classification
and surface the modification-request strings. Don't try to read 300 raw
docs; read the prompts only:

```bash
node -e 'const d=JSON.parse(require("fs").readFileSync("/tmp/edited-drafts.json","utf8"));
const by={};for(const i of d.items){const c=i.classification||"?";by[c]=(by[c]||0)+1}
console.log("scanned:",d.scanned,"edited:",d.edited);console.log(by);
d.items.forEach(i=>{if(!i.modRequests.length)return;
  console.log("\n#"+i.ticketId+" ["+(i.classification||"?")+"]");
  i.modRequests.forEach(r=>console.log(" -",r.replace(/\s+/g," ").slice(0,180)))})'
```

## Step 3 — Inspect specific cases

When a pattern is unclear, pull the full thread for one ticket:

```bash
node scripts/inspect-conv.mjs <ticketId>
```

Use this sparingly — at most ~10 tickets. The conversation doc has the
back-and-forth but **not** the actual sent reply or the original ticket
subject. For those, fall back to the Freshdesk URL
(`https://help.gettoby.com/a/tickets/<ticketId>`) or the corrections in
Open Memory's `om_memories` Chroma collection (port 8000 if running locally).

## Step 4 — Categorize

Group every edit into one of these buckets (extend if a new pattern emerges,
but don't proliferate). For each bucket, record: bucket name, frequency,
2–3 ticket IDs as evidence, one-line root cause, one-line fix.

Known buckets observed historically:
1. **Classifier over-blocks** — `needs_human_action` at 96-97% confidence on
   tickets the agent then drafts in one prompt. Root cause: classifier treats
   billing-adjacent / irreversible-action topics as hard stops.
2. **Hallucinated account state** — AI asserts subscription/discount/charge
   facts without consulting ConsoleDB.
3. **Auto-regen self-flips** — `modified_response` appears with no preceding
   `modification_request`, often flipping factual claims. Verifier-driven.
4. **Verbosity** — agent prompts contain "shorter", "too long", "short and
   sweet".
5. **Tone** — agent prompts contain "warmer", "natural", "calm", or escalate
   in frustration.
6. **Agent-as-author** — modRequest is a complete drafted reply prefixed
   with "rewrite:" / "use this:" / "redo:". AI is being used as copy editor.
7. **Templateable confirmations** — same canned line repeated across many
   tickets after a `chip-action` succeeded.
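A first-pass triage can route `modRequests` strings to a likely bucket using keywords lifted from the descriptions above. This is a hypothetical pre-sort, not a classifier: only buckets 4–6 are reliably detectable from the request text alone; everything else needs the full thread.

```javascript
// Keyword pre-sort for modRequest strings (hypothetical helper, not repo
// code). Anything that doesn't match a text-detectable bucket is flagged
// for manual review via inspect-conv.mjs.
function guessBucket(modRequest) {
  const s = modRequest.toLowerCase();
  if (/^\s*(rewrite|use this|redo)\s*:/.test(s)) return "agent-as-author";
  if (/\bshorter\b|\btoo long\b|short and sweet/.test(s)) return "verbosity";
  if (/\bwarmer\b|\bnatural\b|\bcalm\b/.test(s)) return "tone";
  return "needs-manual-review";
}
```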

## Step 5 — Write the report

Write a single markdown file to:

```
/Users/guilhermegiacchetto/support_fd/reports/edit-review-<YYYY-MM-DD>.md
```

(Create the `reports/` directory if missing. Confirm with the user before
committing — these contain customer-facing text.)
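Building the dated path and creating the directory can be sketched as (helper names are mine, not existing repo code):

```javascript
// Sketch: derive the edit-review-<YYYY-MM-DD>.md path and ensure the
// reports/ directory exists before writing.
import { mkdirSync } from "node:fs";
import { dirname, join } from "node:path";

function reportPath(baseDir, date = new Date()) {
  const stamp = date.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  return join(baseDir, "reports", `edit-review-${stamp}.md`);
}

// const out = reportPath("/Users/guilhermegiacchetto/support_fd");
// mkdirSync(dirname(out), { recursive: true }); // no-op if reports/ exists
```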

Required sections:

1. **Scope** — N scanned, N edited, edit-rate %, date window.
2. **Top patterns** — one section per bucket with: frequency, ≥2 ticket IDs
   as evidence, root cause, recommended fix. Rank by frequency × impact.
3. **Recommended fixes table** — # | change | files | effort (small/medium).
4. **Open questions** — anything the agent couldn't determine without a human.
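A skeleton matching those sections (all values are placeholders):

```
# Edit Review — <YYYY-MM-DD>

## Scope
Scanned: N · Edited: N · Edit-rate: N% · Window: <oldest lastUpdated> → <newest>

## Top patterns
### 1. <bucket name> (N tickets)
Evidence: #<ticketId>, #<ticketId>
Root cause: <one line>
Fix: <one line>

## Recommended fixes
| # | Change | Files | Effort |
|---|--------|-------|--------|

## Open questions
- <anything needing a human>
```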

Length cap: 600 lines. If the report exceeds that, tighten — this is meant
to be skimmed, not archived.

## Step 6 — Hand back

In chat, summarize in ≤5 bullets: edit-rate, top 3 patterns by frequency,
top 3 recommended fixes, link to the report file. Do **not** start
implementing fixes — the user decides which to act on.

## Guardrails
- **Never** `gcloud firestore` *delete* / *write*. Read-only.
- **Never** post ticket content to chat platforms or external services.
  Customer-facing text from these tickets is sensitive.
- **Never** commit the report file without explicit user approval.
- If `dump-edited-drafts.mjs` errors with `ERR_MODULE_NOT_FOUND`, the script
  may have been authored in `import "dotenv/config"` form — patch to use the
  hand-rolled .env parser already in the file.