# PR Babysitter Loop
**Goal**: Reduce the human time spent herding pull requests through review, CI, rebase, and merge while keeping the human in the judgment seat.
## Scheduling
**Recommended**:
- `/loop 5m /pr-babysit check` (Grok TUI)
- Equivalent scheduled task or GitHub Action in other environments (every 5–15 minutes during working hours is common).
Many teams run a faster "watcher" loop (2–5m) during active review periods and a slower sweeper overnight.
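
The watcher/sweeper split can be sketched as a small cadence selector. This is illustrative only: the function name, the 09:00 to 18:00 weekday window, and the 60-minute overnight interval are assumptions, not values prescribed by this recipe.

```python
from datetime import datetime, time

# Hypothetical working-hours window; adjust to your team.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def pick_interval_minutes(now: datetime, active_review: bool) -> int:
    """Return how many minutes to wait before the next babysitter run."""
    in_hours = now.weekday() < 5 and WORK_START <= now.time() < WORK_END
    if not in_hours:
        return 60  # slow overnight/weekend sweeper (assumed value)
    if active_review:
        return 2   # fast watcher, low end of the 2-5m range
    return 10      # default cadence inside the 5-15m range
```

A scheduler wrapper could call `pick_interval_minutes(datetime.now(), active_review=True)` after each run to decide when to wake up next.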
## Required Skills
- `pr-review-triage` — Understands your project's review norms, required checks, and what "ready to merge" means.
- `minimal-fix` — Produces the smallest possible change that addresses a specific reviewer comment or CI failure.
- `rebase-and-clean` — Safe rebase + conflict resolution patterns for your repo.
## State
Keep a small `pr-babysitter-state.md` (or a Linear board / GitHub project view) with:
- Watched PRs + current status
- Last action taken + outcome
- Human decisions that overrode the loop
Example state entry:
```markdown
- #1234 (feat/auth-refresh)
  Status: Changes requested by @reviewer
  Last action: Loop proposed minimal diff for comment X
  Human decision: Approved the diff, asked for one more test
```
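
If the state file is generated rather than hand-edited, an entry like the one above could be modeled as a small record. The sketch below is one possible shape; `WatchedPR`, `to_markdown`, and `prune_closed` are hypothetical names, and the two-space indentation of the detail lines is an assumption about the intended layout.

```python
from dataclasses import dataclass


@dataclass
class WatchedPR:
    number: int
    branch: str
    status: str
    last_action: str
    human_decision: str = ""  # empty when no human override has happened

    def to_markdown(self) -> str:
        """Render the entry in the state-file layout (detail lines indented)."""
        lines = [
            f"- #{self.number} ({self.branch})",
            f"  Status: {self.status}",
            f"  Last action: {self.last_action}",
        ]
        if self.human_decision:
            lines.append(f"  Human decision: {self.human_decision}")
        return "\n".join(lines)


def prune_closed(entries, open_numbers):
    """Keep only entries whose PR number is still in the set of open PRs."""
    return [e for e in entries if e.number in open_numbers]
```

Calling `prune_closed` at the start of every run keeps closed and merged PRs from lingering in the state file.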
## How the Loop Runs (Typical Cycle)
1. Discover open PRs authored by the team (or all PRs the user cares about).
2. For each PR:
   - Run the `pr-review-triage` skill.
   - If CI is red → spawn a sub-agent with the `minimal-fix` skill to address the failure.
   - If review comments exist and are actionable → propose minimal patches.
   - If ready (all checks green, approvals present, no blocking comments) → add a "ready to merge" label or ping the human.
3. For PRs that have been idle too long → suggest close or hand-off.
4. Write concise updates back to the PR and to state.
5. Anything ambiguous or high-risk → surface to human with context.
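
The per-PR branching in the cycle above can be sketched as a pure decision function. This is a minimal sketch under stated assumptions: `PRSnapshot`, `Action`, the 14-day idle threshold, and the check order (escalation first, then idleness, then CI, comments, readiness) are all hypothetical choices, not part of the recipe.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX_CI = "spawn minimal-fix sub-agent"
    PROPOSE_PATCH = "propose minimal patches"
    MARK_READY = "label ready-to-merge or ping human"
    SUGGEST_CLOSE = "suggest close or hand-off"
    ESCALATE = "surface to human with context"
    WAIT = "no action this run"


@dataclass
class PRSnapshot:
    ci_green: bool
    approvals: int
    actionable_comments: int
    blocking_comments: int
    idle_days: int
    high_risk: bool = False


MAX_IDLE_DAYS = 14  # assumed threshold for "idle too long"


def next_action(pr: PRSnapshot, required_approvals: int = 1) -> Action:
    """Pick one action for a PR; anything high-risk goes to the human first."""
    if pr.high_risk:
        return Action.ESCALATE
    if pr.idle_days > MAX_IDLE_DAYS:
        return Action.SUGGEST_CLOSE
    if not pr.ci_green:
        return Action.FIX_CI
    if pr.actionable_comments > 0:
        return Action.PROPOSE_PATCH
    if pr.approvals >= required_approvals and pr.blocking_comments == 0:
        return Action.MARK_READY
    return Action.WAIT
```

Keeping the decision pure (no API calls inside it) makes the loop's behavior easy to unit-test before it is allowed to touch real PRs.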
## Verification Strategy
- Never let the implementer sub-agent mark its own work "done".
- Use a separate verifier sub-agent (maker/checker pattern), or a stronger model at higher effort, that must explicitly confirm:
  - The change addresses the comment/failure.
  - No unrelated files were touched.
  - Tests/lint still pass in the worktree.
- The loop only proposes; a human (or an explicit "auto-merge" allowlist for very safe cases) actually merges.
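
The checker side of maker/checker can be reduced to a gate that returns every failed check. The sketch below assumes the verifier sub-agent fills in a report structure; `VerifierReport` and its fields are hypothetical, and how `addresses_issue` is judged is left to the verifier itself.

```python
from dataclasses import dataclass


@dataclass
class VerifierReport:
    addresses_issue: bool   # verifier's judgment on the comment/failure
    touched_files: list     # files changed by the implementer
    expected_files: list    # files the fix was allowed to change
    tests_passed: bool
    lint_passed: bool


def verify(report: VerifierReport) -> list:
    """Return the failed checks; an empty list means the fix is confirmed."""
    failures = []
    if not report.addresses_issue:
        failures.append("change does not address the comment/failure")
    unrelated = sorted(set(report.touched_files) - set(report.expected_files))
    if unrelated:
        failures.append("unrelated files touched: " + ", ".join(unrelated))
    if not (report.tests_passed and report.lint_passed):
        failures.append("tests or lint failed in the worktree")
    return failures
```

Only a proposal with an empty failure list should be surfaced as ready; the merge itself still belongs to the human or the allowlist.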
## Human Handoff Points
- High-risk refactors
- Changes touching security, payments, auth, or core infrastructure
- When the loop has proposed more than N fixes on the same PR without progress
- When the state file shows the same PR surfacing for several days
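
These handoff triggers can be collected into a single check that explains why a PR is being escalated. In this sketch the glob patterns, both thresholds, and the function name are assumptions to be tuned per repo; the `high_risk_refactor` flag is expected to come from triage.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for sensitive areas; tune per repository.
SENSITIVE_GLOBS = ("*security*", "*payment*", "*auth*", "infra/*")
MAX_STALLED_FIXES = 3    # the "N" in "more than N fixes without progress" (assumed)
MAX_DAYS_SURFACING = 3   # assumed meaning of "several days"


def handoff_reasons(changed_files, stalled_fixes, days_surfacing,
                    high_risk_refactor=False):
    """Return human-readable reasons to hand off; empty list means keep looping."""
    reasons = []
    if high_risk_refactor:
        reasons.append("high-risk refactor")
    sensitive = [f for f in changed_files
                 if any(fnmatchcase(f, g) for g in SENSITIVE_GLOBS)]
    if sensitive:
        reasons.append("touches sensitive paths: " + ", ".join(sensitive))
    if stalled_fixes > MAX_STALLED_FIXES:
        reasons.append(f"{stalled_fixes} fixes proposed without progress")
    if days_surfacing > MAX_DAYS_SURFACING:
        reasons.append(f"surfacing for {days_surfacing} days")
    return reasons
```

Returning reasons rather than a boolean gives the human the context they need when the PR lands in their queue.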
## Tool-Specific Notes
**Grok Build TUI**:
- The `pr-babysit` skill (if installed) is designed exactly for this.
- Run with `/loop 5m /pr-babysit check`.
- Use worktree isolation for any fix attempts.
- The skill can call `scheduler_delete` on itself when the watchlist is empty.
**Claude Code**:
- Boris Cherny has publicly described running very similar `/loop 5m /babysit` flows.
- Combine with `/goal` for "keep working on this PR until CI is green and no blocking comments remain".
**General**:
- Expose the state file in the repo or a shared doc so the whole team can see what the loop is doing.
- Make the loop's comments on PRs clearly signed (e.g. "🤖 Loop Engineering — PR Babysitter").
## Failure Modes & Mitigations
- **Loop proposes bad fixes** → Strong verifier sub-agent (maker/checker) + human review gate for anything beyond trivial.
- **Infinite rebase loops** → Limit number of automated rebase attempts per PR.
- **Stale state** → The loop should prune closed/merged PRs on every run.
- **Notification fatigue** → Use selective notifications (only when human action is truly required).
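
The rebase cap is the easiest of these mitigations to enforce mechanically. The sketch below is one way to do it, assuming an in-memory counter keyed by PR number; `RebaseBudget` and the default limit of 3 are hypothetical, and a real loop would likely persist the counts in the state file.

```python
from collections import Counter

MAX_REBASE_ATTEMPTS = 3  # assumed default; pick a limit that suits your repo


class RebaseBudget:
    """Track automated rebase attempts per PR and refuse once the cap is hit."""

    def __init__(self, limit: int = MAX_REBASE_ATTEMPTS):
        self.limit = limit
        self.attempts = Counter()

    def try_acquire(self, pr_number: int) -> bool:
        """Return True and record an attempt, or False if the PR is over budget."""
        if self.attempts[pr_number] >= self.limit:
            return False
        self.attempts[pr_number] += 1
        return True
```

When `try_acquire` returns `False`, the loop should stop rebasing that PR and surface it to a human instead.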
## Cost Profile
| Scenario | Tokens/run | Notes |
|----------|------------|-------|
| No-op (empty watchlist) | ~3k | **Target most runs** — exit early |
| Triage pass | ~80k | PR + CI status scan |
| Fix attempt (L2) | ~250k | Worktree + minimal-fix + verifier |
**Cadence**: 5m–15m · **Tier**: high · **Suggested daily cap**: 2M tokens · **Early exit required**
```bash
npx @cobusgreyling/loop-cost --pattern pr-babysitter --cadence 10m --level L1 --conservative
```
High cadence without early exit burns tokens fast. Use the `loop-budget` skill and `loop-run-log.md` to track spend.
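
To see how the per-run figures interact with the daily cap, a rough estimate can be computed by hand. The sketch below uses the token figures from the table and the 2M suggested cap; the function name is hypothetical, and it assumes the loop runs around the clock at a fixed cadence with every run that is not a triage pass or fix attempt exiting early as a no-op.

```python
# Per-run token estimates from the cost table.
TOKENS = {"noop": 3_000, "triage": 80_000, "fix": 250_000}
DAILY_CAP = 2_000_000  # suggested daily cap


def estimate_daily_tokens(cadence_min: int, triage_runs: int, fix_runs: int) -> int:
    """Estimate tokens per day, treating all remaining runs as no-ops."""
    total_runs = (24 * 60) // cadence_min
    noop_runs = total_runs - triage_runs - fix_runs
    if noop_runs < 0:
        raise ValueError("more triage/fix runs than scheduled runs")
    return (noop_runs * TOKENS["noop"]
            + triage_runs * TOKENS["triage"]
            + fix_runs * TOKENS["fix"])


print(estimate_daily_tokens(10, 0, 0))               # 432000
print(estimate_daily_tokens(10, 10, 3))              # 1943000
print(estimate_daily_tokens(10, 10, 4) > DAILY_CAP)  # True
```

At a 10-minute cadence, 144 no-op runs cost roughly 432k tokens. Ten triage passes and three fix attempts bring the day to about 1.94M, just under the cap, and a fourth fix attempt would exceed it.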
## Success Metrics
- Average time from "ready for review" to merge (for PRs the loop touched).
- Number of human comments that were purely "LGTM, loop handled the rest".
- Reduction in "can you rebase?" or "CI is red" pings in Slack/Linear.
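
The first metric is simple to compute once the timestamps are collected. This is a minimal sketch assuming each PR is represented as a `(ready_at, merged_at)` pair, with `merged_at` set to `None` for PRs not yet merged; the function name and input shape are hypothetical.

```python
from datetime import datetime, timedelta


def average_time_to_merge(prs):
    """Average ready-for-review to merge duration; None if nothing has merged."""
    durations = [merged - ready for ready, merged in prs if merged is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Comparing this average for PRs the loop touched against a baseline week without the loop gives a first signal of whether it is helping.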
Start with one team or one repo. Measure for a week. Then expand.