# Ingestion — `toby/code-reviews/2026-05-13-10commits.md`

_Source:_ [`toby/code-reviews/2026-05-13-10commits.md`](../../../toby/code-reviews/2026-05-13-10commits.md)
_Author:_ `toby-code-reviewer` agent (slug `da1e2bb3-fbd9-454a-9848-4fe4e05089e3`)
_Run:_ `e5abf485-212b-4a2a-946e-882c6b5b22ec` (still running at ingestion time — 2026-05-13T07:00:18Z, no costUsd yet)
_Cron:_ daily `0 7 * * *` (07:00 UTC)
_File event:_ `created` 2026-05-13 morning

---

## What this is

The first review from a brand-new sibling agent — **`toby-code-reviewer`** — that just came online with a daily 07:00 UTC cron. It walks commits to `main` since the last reviewed SHA, applies a triple-check skill (correctness / quality / security), writes a roll-up findings doc to `toby/code-reviews/`, and files tickets for anything that looks like a real bug, security hole, or missing test.

This is a back-fill review — the agent had no prior reviewed-SHA marker, so it covered the **10 most recent commits on `main`**. The bulk is Jad's onboarding-experiment instrumentation (Session Start heartbeat + planning docs) and the brand-new sandcastle agentic-workflow subsystem under `.sandcastle/`. One Slack-fallback fix from Jp on the CWS review monitor.

Sibling-agent scope rule applies: `toby/code-reviews/` is now owned by the code-reviewer agent. The dashboard agent **does not write** into that folder — derivatives like this ingestion note go in the workspace artifact dir, and the dashboard surfaces the review via Operations / Recent Shipments / Doc Index / Open Questions.

## Commits covered

- `d68726b` fix: gate AuthWrapper on user hydration (Jad, 2026-04-09) — _the proximate trigger of the blank-page incident; review confirms the fix is correct_
- `9d6e8e4` chore: extension `1.13.0`
- `b1fe667` docs: BigQuery schema + analytics caveats
- `40684905` docs: Phase 1 + Phase 2 specs / plans / todos
- `0f3aa38` feat: 4h `Session Start` heartbeat — _medium finding_
- `389556a` fix: Phase 1 halt-threshold baseline
- `e7d00a8` chore: `build:store` script
- `da1ba81` docs: device_id caveat
- `b9bea18` Post fallback Slack message when CWS AI draft fails (Jp, #10) — _low finding_
- `75a09e3` feat: sandcastle agentic workflow (Jad) — _two medium + three low findings_

## Findings — severity-ordered

### medium · quality — `apps/extension/app/hooks/useSessionStart.ts:1-75` (0f3aa38)

75-line hook driving a production analytics event with **no test coverage**. The interesting bits — 4h rate limit, `createdDate → days_since_signup` parse with `Number.isFinite` guard, storage-write-before-send ordering — are exactly the kind of quiet logic that breaks silently and is invisible in BigQuery until someone runs a backfill query. Author's own spec (`tasks/phase1-spec.md` AC2/AC4) calls out that runtime validation in staging is "still required"; a unit test on the hook would have caught a stale `userId` shape or broken rate-limit math without waiting on staging.

**Ticket claimed filed (not yet visible — run in flight).** Suggested action: add `useSessionStart.test.ts` that fakes `chrome.storage.local` and asserts (1) skips within 4h, (2) fires after 4h, (3) `userId` undefined when user is null, (4) `days_since_signup` null for unparseable `createdDate`.

**Why this matters to the dashboard**: Session Start heartbeat is Phase 1 functionality the playbook treats as shipped. Finding adds a quality-debt receipt on already-shipped work. Won't gate Phase 2 but earns its own ticket. Surfacing as Open Question.

### medium · security — `.sandcastle/scan-secrets.mts:1-182` (75a09e3)

The host-side secret scanner is the **only** thing standing between an unattended LLM agent and a pushed credential. It is a regex-on-added-lines walker with a skip-list for lockfiles and binaries — single-pattern, single-line, no inline-base64, no multi-line obfuscation. The author's own comment ("Not designed against an adversarial agent that obfuscates the secret across multiple lines") is honest, but the function has **no test coverage** to even confirm the patterns and skip-list work as intended.

**Ticket claimed filed.** Suggested action: `.sandcastle/scan-secrets.test.mts` with positive + negative fixtures per `PATTERNS` entry, skip-basename coverage, diff-parser line-number counting.

### medium · security — `.sandcastle/main.mts:602,648` (75a09e3)

Two env-var escape hatches (`SANDCASTLE_SKIP_GATES_VERIFY=1`, `SANDCASTLE_SKIP_SECRET_SCAN=1`) fully disable verification + secret-scan gates with nothing more than a `console.warn`. No audit trail ties a pushed branch back to which gate was waived.

**Ticket claimed filed.** Suggested action: record skip flags in the PR body when set, and append to `.sandcastle/audit.log` that survives runs.

### low · quality — `apps/api/context/jobs/cws_review_monitor_jobs.go:172` (b9bea18)

`escapeSlackMrkdwn` handles `@here` and `<links>` but not backticks — a `draftErr.Error()` containing a literal triple-backtick (e.g. an OpenAI body echoed back) breaks the Slack mrkdwn fence and leaks the rest of the message into mrkdwn-parsed mode. Pure formatting hazard, not security. Doc-only; no ticket.

### low · quality — `apps/extension/app/containers/Toby.tsx:300` (d68726b)

Correct fix on the AuthWrapper hydration gate. Comment is crisp. No regression test for the hydration window, but the dimension is small enough that a test would arguably unit-test React's render scheduler. Doc-only.

**Cross-reference**: this commit (`d68726b29`) is the **same one** the dashboard names as the proximate trigger of the blank-extension-page incident (made the unbounded `chrome.storage.local.get` callback in `getUser()` user-visible). The code reviewer agrees the fix itself is correct — the issue was that it widened the gate without bounding the new dependency. PR #12 (commit `06baf0f8a`, open) defends in depth.

### low · correctness — `.sandcastle/main.mts:158-159` (75a09e3)

`EXPECTED_GATES = ['lint', 'build']` — `typecheck` and `test` are intentionally omitted because the extension repo has TS debt and no vitest scaffold on `main`. Reviewer flags this as a lock-step risk: the moment the TS-clean branch lands, `EXPECTED_GATES` must be updated or the agent silently passes gates that should be tightening. Doc-only for now.

### low · quality — `.sandcastle/main.mts:863-867` (75a09e3)

Sandcastle uses `execFileSync` (argv array, no shell) when calling `gh pr create` — no injection. But the trust boundary is interesting: the agent fully controls its own commit subject and that's the PR title. Defensible since PR review catches anything inflammatory, but worth a mention. Doc-only.

## Filed tickets (per the doc — not yet visible in tickets DB as of 07:08Z)

| # | Title | Kind | Severity |
|---|-------|------|----------|
| 1 | `useSessionStart` hook needs unit-test coverage | improvement | medium |
| 2 | `.sandcastle/scan-secrets.mts` needs unit-test coverage | improvement | medium |
| 3 | Sandcastle skip-flags need audit logging | issue | low |

The run started at 2026-05-13T07:00:18Z and was still `running` when this ingestion ran. The "Ticket filed" mentions in the doc are aspirational — the tickets will land before the run completes. Once they do, they should auto-appear in the project ticket list as TOBY-15, -16, -17 (or thereabouts).

## What's new in the codebase that I should be tracking

A **whole new subsystem** landed in commit `75a09e34d` (the latest origin/main HEAD as of last run) under `.sandcastle/`. It's an unattended agentic workflow for slice execution — host-side orchestration that runs lint + build gates and a secret scan against added-only diff lines before opening a PR. Three host-side files of interest:

- `.sandcastle/main.mts` — orchestrator (gates, retry, `gh pr create` invocation)
- `.sandcastle/scan-secrets.mts` — secret pattern matcher
- `.sandcastle/audit.log` — (recommended by reviewer; doesn't exist yet)

This is **separate infrastructure** from the warroom fix-shipper — sandcastle ships agent-authored *slices*, the fix-shipper ships warroom-validated incident *patches*. Two different rails; both end in PR-against-`main`. Dashboard should track this distinction so the operator doesn't conflate them when reviewing PRs.

## Dashboard updates (planned)

- **TL;DR**: add code-review channel went live 2026-05-13 with first review covering 10 commits
- **Operations**: add a fourth "Code Review" subsection (sister to warroom Operations) describing the agent + cron + outputs
- **Recent Shipments**: entry for the agent itself + first review doc
- **Open Questions**:
  - useSessionStart no-test coverage on shipped Phase 1 code
  - Sandcastle skip-flags audit-trail missing (medium-security finding)
  - Sandcastle EXPECTED_GATES lock-step risk
- **Key Decisions**: code-review channel design (review + file tickets, never patch)
- **Doc Index**: add `toby/code-reviews/2026-05-13-10commits.md`

## Note on the d68726b29 / PR #12 connection

The code-reviewer independently looked at `d68726b29` and concluded the fix is correct — useful triangulation against the warroom's finding that the commit "widened the gate without bounding the new dependency". The code-reviewer was reviewing a different concern (test coverage on the hydration window) and didn't have the warroom's downstream context. The two reviews don't conflict; they cover orthogonal angles on the same commit.

## Scope-rule check

- Source doc is in `toby/code-reviews/` — owned by `toby-code-reviewer` agent.
- I do NOT write into that folder.
- This ingestion lives in my workspace artifact dir.
- Dashboard rewrite is in scope (`toby/00-state-of-the-project.md` is my doc).