1
0
Files
tutorials/agentic-setup/agentic-setup-bootstrap.md

228 lines
18 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Agentic Environment — Bootstrap Instructions for Claude Code
> **You are Claude Code. A user has handed you this document to set up a personal, multi-domain "agentic environment" from scratch.**
> Your job is to (1) understand the architecture below, (2) **interview the user** to capture *their* domains, stack, conventions, and guardrails, then (3) **scaffold a clean, greenfield setup** for them — committing as you go.
>
> This is a *clean-room* design. Do **not** import any one person's accumulated tech debt. Build it right from day one: committed + backed up, self-describing, no dead config, security and governance baked in. Adapt every concrete detail to the user via the interview — the patterns are fixed, the specifics are theirs.
---
## 0. How to use this document
**User:** place this file where your Claude Code can read it, then say: *"Read agentic-setup-bootstrap.md and set up my agentic environment."*
**Claude Code:** Work through this in order — **Principles → Interview → Scaffold → Phased rollout**. Do not scaffold before interviewing. Show the user a plan and get approval before writing files. Commit after each coherent step. Never invent the user's conventions — ask.
---
## 1. What you're building (the mental model)
A **source-of-truth git repo** (call it `agentic-dev/` or similar) that is **symlinked into `~/.claude/`**, containing:
```
agentic-dev/
├── agents/<name>.md # expert subagents (focused roles)
├── skills/<name>/SKILL.md # auto-triggering capabilities
├── commands/<name>.md # slash commands (explicit workflows)
├── agent-memory/<name>/ # each agent's persistent memory (MEMORY.md index + notes)
├── shared-knowledge/ # facts referenced by multiple agents
├── hooks/ # lifecycle scripts (session start, capture, etc.)
├── githooks/pre-commit # secret-scan gate
├── sync.sh / uninstall.sh # symlink installer/remover
├── manifest.* (generated) # the system's accurate self-description
├── CLAUDE.md # global governance rules (always loaded)
└── README.md
```
The repo is the truth; `~/.claude/` is a set of symlinks into it. Edit the repo → it's live. This makes the whole environment **versioned, reviewable, revertible, and portable across machines**.
### The five pillars
1. **Expert subagents** — narrow, role-scoped agents with their own memory and tool permissions, instead of one do-everything prompt.
2. **Layered memory***semantic* (current facts), *episodic* (what happened / a decision journal), *procedural* (skills/checklists), with explicit time-awareness so it doesn't silently rot.
3. **Governance** — hard rules that always load: no silent behavioral change, secret hygiene, human-approval gates for anything risky.
4. **The self-improving loop** — agents capture friction → a periodic human-triggered review coaches them → they get sharper. Compounding capability, no daemon.
5. **Grounding** — agents observe reality (repos, and optionally live systems via MCP) rather than guessing, with the blast radius the user is comfortable with.
---
## 2. Design principles (these are fixed — do not compromise them)
- **Committed + backed up from commit #1.** `git init`, add a remote, push. Never let the source-of-truth live only as uncommitted local files.
- **Self-describing, never lying.** Generate the manifest (counts/inventory of agents/skills/commands) from disk; never hand-maintain a number that will drift.
- **No dead config.** Every agent in `agents/` must be a registered, reachable agent type. Every skill must trigger. If it isn't used, remove it — don't accrete.
- **Read-only by default; write is a privilege.** Agents that *analyze* (auditors, critics, planners, the reviewer) get non-mutating tools only. Agents that *change things* get write tools and a tighter leash.
- **Reviewer ≠ applier.** The agent that decides *what* to change never also *applies* it unattended. Propose → human approves the literal diff → a separate step applies + commits.
- **Memory captures the non-derivable.** Don't store what's re-readable from code/docs. Store hard-won facts: quirks, gotchas, decisions, "we tried X, it failed because Y." Notes record their own deprecations.
- **Episodic ≠ boot-loaded.** The decision/retro journal is read *on demand* (at review), never loaded into every agent's startup context — or it bloats and dilutes the high-signal rules.
- **Human-triggered autonomy only** (per this user's class of setup): nudges and scheduled *reminders* are fine; unattended self-modifying action is not.
- **Governance > convenience.** When a fix would also change behavior the user can observe, stop and surface options; never silently "improve."
### Anti-patterns to actively avoid (the tech debt that kills these systems)
- Source-of-truth repo with **no remote** and a pile of uncommitted changes → one bad `git` command from total loss.
- **Hand-maintained snapshots** (topology docs, project lists, counts) that rot silently while reality moves on.
- **Self-description that lies** ("7 agents" when there are 8) → mis-routes work, hides capabilities.
- **Orphaned/dead agents** referenced everywhere but never actually invocable.
- **Capture without review** → a journal that grows forever and is never read = "dormancy with a token tax."
- **Self-graded retros** ("everything went great") = landfill. The gold is the *human's correction*.
---
## 3. The interview (run this WITH the user before building anything)
Capture answers into `shared-knowledge/profile.md` as you go. Teach while you ask — the user may be newer than this document assumes.
**A. Domains & priorities**
1. Which domains do you want agents for? (e.g. homelab/infra ops, software development, UI/UX, product/idea development, research/consultation, writing, data.) Rank them.
2. For each domain: what does a *good outcome* look like? What recurring task or pain motivated this?
**B. Stack & conventions (per domain)**
3. What's your actual stack in each domain? (OS, languages/frameworks, infra platform, container runtime, git host, CI, deployment style.) — *Do not assume; this is where their setup differs from any template.*
4. Any house conventions you want enforced? (file layout, naming, commit format, code style, where things get deployed.)
5. Where does your code/config live (paths, repos)? What should agents be allowed to read vs. never touch?
**C. Risk & access posture**
6. For infra/ops: do you want agents to **act** (make changes, run commands, SSH) or only **advise** (read + propose)? Start advisory unless the user is confident.
7. Do you want any **live-system grounding** (MCP servers for metrics/SSH/browser/GitHub), or repo-only to start? Default repo-only; add grounding deliberately.
8. Always-on / scheduled autonomy: usually **no** — confirm. Reminders and human-triggered reviews only?
**D. Guardrails & secrets**
9. Your secrets policy: which files/paths must never be read? How should a leaked secret be handled if one is seen?
10. Any hard "never do this" rules? (e.g. never modify the host package manager, never force-push, never change behavior without asking.)
**E. Cost/quality**
11. What's your model budget posture? (Subscription vs API; tolerance for using the strongest model.) This sets the model tiering (§5).
**F. Self-improving loop**
12. Do you want the learning loop (capture friction → periodic review → coaching)? Explain the dormancy risk honestly. If yes, what review cadence feels realistic?
Summarize the captured profile back to the user and confirm before scaffolding.
---
## 4. Scaffold procedure (build in this order; commit after each)
### Step 1 — Foundation
- `git init`; create the repo structure (§1).
- Add a **remote** and push immediately. Confirm the remote is **private** if anything sensitive may land there.
- `.gitignore`: exclude the episodic journal (`agent-memory/_journal/`), local overrides, OS cruft.
- `githooks/pre-commit`: a **blocking** secret scanner (gitleaks-class) on staged memory/skill/journal files; `git config core.hooksPath githooks`. Install the scanner per the user's OS (ask; respect immutable-OS constraints if any).
- `CLAUDE.md` (global governance — see §6).
- `sync.sh` (symlink repo → `~/.claude/`, idempotent, prunes orphaned links, dry-run flag) + matching `uninstall.sh`.
- A **manifest generator** (small script) that writes `manifest.md` from disk so counts never drift; wire it into `sync.sh`.
### Step 2 — The agent roster (generic archetypes → instantiate per the user's domains)
Create only the agents the interview justified. For each agent file (`agents/<name>.md`) set frontmatter: `name`, `description` (when to invoke — be specific, this drives auto-delegation), `tools` (scope tightly), `model` (per §5). Seed each with a starter memory note.
Recommended archetypes (rename/drop to fit):
| Archetype | Mode | Purpose |
|---|---|---|
| **infra/ops expert** | RO→RW (user's choice) | the user's infra platform; conventions, deploy patterns |
| **dev expert(s)** | RW | per primary language/framework; build/test conventions |
| **UI/UX designer** | RW | interfaces; reads existing design system before inventing |
| **security-auditor** | read-only | OWASP-style review, secrets, misconfig; never edits |
| **devil's-advocate** | read-only | stress-tests plans; no false critique |
| **product/idea partner** | read-only | clarifies value, sequences by impact, fights scope creep |
| **remediation-planner** | read-only | turns audit findings into a sequenced, validated fix plan |
| **engineering-manager** | read-only-proposing | runs the self-improving review (§7); proposes, never applies |
Rules: read-only agents get `Read, Grep, Glob, Bash`(non-mutating). RW agents get edit/write but inherit the governance rules. The auditor, critic, planner, product, and engineering-manager are **always read-only**.
### Step 3 — Skills & commands
- **Skills** (`skills/<name>/SKILL.md`): auto-trigger on context (e.g. "editing a Dockerfile" → compose skill). Thin wrappers that load domain conventions.
- **Commands** (`commands/<name>.md`): explicit workflows. Useful starters: `/audit-all` (fan-out review → aggregate), a remediation flow (audit → human → separate applier), `/note` and `/eng-review` (§7).
### Step 4 — Shared knowledge
- `shared-knowledge/profile.md` (the interview output) and any cross-agent facts (topology, domains, conventions). **Prefer generated over hand-maintained**; if hand-maintained, stamp `last_verified` and have an agent reconcile it on demand.
### Step 5 — Memory seeding
- For each agent: `agent-memory/<name>/MEMORY.md` = a flat index of one-line pointers to detail notes. Seed 24 starter notes from the interview (conventions, guardrails, known gotchas). Keep a per-agent size budget.
### Step 6 — Optional grounding (only if the user opted in)
- MCP servers for the chosen surfaces (GitHub, browser/Playwright for UI, metrics/SSH for infra). Scope credentials tightly. Repo-only is a fine starting posture.
---
## 5. Model tiering (cost vs. capability)
Match model strength to the *judgment* each step needs, not to importance:
| Work | Model tier | Why |
|---|---|---|
| Counting/format/file ops, nudges | **none** (scripts) | zero cost; can't be "dumb" |
| High-frequency mechanical capture | **cheapest model** (Haiku-class) | short, structured, low-stakes |
| Routine coding/analysis | **mid model** (Sonnet-class) | the workhorse |
| High-judgment review, security, architecture, **editing instructions** | **strongest model** (Opus-class), used sparingly | mistakes here compound |
Make the **frequent** things cheap or model-free, and reserve the strongest model for **infrequent, high-stakes judgment** (the periodic review, security audits, design). Set `model:` in each agent's frontmatter accordingly.
---
## 6. Governance (`CLAUDE.md` — always loaded, non-negotiable)
Draft this WITH the user; it must include at minimum:
- **No silent behavioral change.** A fix may only fix the reported problem. Anything that changes observable behavior, disables a feature, swaps a design choice, or changes defaults is OFF-LIMITS without explicit prior approval — even if "obviously better." Surface options, wait for approval, implement only what was approved.
- **Secret hygiene.** Never read files likely to contain secrets (`.env`, `*secret*`, `*token*`, credentials, vaults). If a secret is seen anyway (by you or a subagent), **immediately and prominently** tell the user, name what leaked, and prompt rotation. Pass this rule into every subagent brief.
- **Human-approval gates.** Destructive, outward-facing, or hard-to-reverse actions require confirmation. Approval in one context doesn't extend to the next.
- **Honesty.** Report outcomes faithfully — failed tests, skipped steps, uncertainty — without hedging or false confidence.
- The user's own hard "never" rules from interview Q10.
---
## 7. The self-improving loop (first-class, but done right)
A human-triggered loop that makes agents compound capability over time. **No daemon. No auto-apply. No self-grading landfill.**
### Components
1. **`/note` command** — the user fires it the moment an agent gets something wrong: appends one ground-truth line `{agent, what was wrong, the fix}` to a **local-only** journal (`agent-memory/_journal/YYYY-MM.md`). Human-authored = highest signal, zero per-run cost.
2. **Friction-only auto-capture** (add *after* the review loop is proven to run): a directive that makes each subagent emit a `## Mini-Retro` block **only when there was friction** (a fact it had to discover that should've been in memory, a stale note, a missing skill, a human correction) — silence on clean success. A `SubagentStop` hook parses + appends it. Rides on the subagent's own model → near-zero added cost.
3. **SessionStart nudge** — a **fail-open** script (no model) that prints one line: `N unreviewed retros since <date> — run /eng-review`, escalating by count/age. This is the forcing function that fights dormancy without a daemon.
4. **`engineering-manager` agent + `/eng-review`** — periodic, human-run. Reads the journal, clusters by agent + recurring theme, and **proposes coaching as literal diffs** (new/updated skills, memory edits, instruction tweaks, **deletions of stale notes**). It is **read-only-proposing**. The user approves specific diffs; a **separate** step applies + commits with a note on what behavior changed.
5. **Skill distillation** — recurring *trusted* patterns become a proposed `SKILL.md`, ratified at review, never auto-applied.
### Mandatory security controls (the loop is a self-modifying-instructions pipeline — these are not optional)
- **No auto-apply.** The user reviews the **literal diff**, never a summary, before anything is written to `agents/`, `skills/`, `agent-memory/`, `shared-knowledge/`.
- **Reviewer ≠ applier.** The engineering-manager has no write/commit tools.
- **Provenance tagging.** Every retro records whether the content the agent saw was *trusted* (the user's own repos/config) or *untrusted* (third-party code, audited targets, remote content). Proposals derived from untrusted content are **quarantined** — shown as flagged quotes, never auto-distilled into instructions. (A malicious README an agent audited must not be able to write itself into an agent's permanent instructions.)
- **Observations ≠ instructions.** The journal stores escaped, quoted *observations*, never imperatives a future agent will obey. Neutralize directive-shaped text captured from targets.
- **Redaction + the secret-scan pre-commit gate** apply to everything the loop writes. Journal stays **local-only** unless the user explicitly opts into committing it (then only behind the scan gate).
- **Hooks never interpolate model output into a shell command.** Pass payloads via stdin/temp-file to a non-shell handler; schema-validate, size-cap, fence-escape. Hooks must **fail open** (never block a session) — they're global blast radius.
- **Governance surface is privileged.** Proposed edits to the engineering-manager's own prompt, the auditor's rules, the secret policy, or the redaction/gate logic are a separate highest-attention review category — never silently applied or pruned.
### Phased rollout (do NOT build it all at once)
- **Phase 1:** `/note` + SessionStart nudge + engineering-manager + `/eng-review`. Prove the user actually runs the review.
- **Phase 2:** add friction-only `SubagentStop` auto-capture *after* `/eng-review` has run ≥2×. (Prove you'll review before you pay to capture.)
- **Phase 3:** skill distillation + a crude "did the coaching help?" check (did the friction recur?) + per-agent memory budgets.
### Must-verify before relying on hooks (don't assume — check current behavior)
- Does `SubagentStop` fire for *all* subagent kinds the user spawns (including any orchestrated/workflow agents)?
- Does it fire reliably on long sessions?
- Are concurrent hook appends serialized, or do parallel agents race on the journal? (Use a lock or per-agent temp files.)
- Does SessionStart fire reliably after resume/clear?
- Confirm a hung/erroring hook cannot block session start in *any* project.
---
## 8. Sequencing summary (what to actually do, in order)
1. **Interview** (§3) → confirm captured profile.
2. **Foundation** (§4.1): repo + remote + secret-scan + CLAUDE.md + sync.sh + manifest. Commit & push.
3. **23 highest-priority agents** (not all at once) + their memory seeds. Use them for real for a bit.
4. **Skills/commands** for those domains; `/audit-all` + remediation flow if relevant.
5. **Self-improving loop Phase 1** (§7) once there are agents worth coaching.
6. **Grounding (MCP)** only if/when repo-only proves limiting.
7. **Expand** domains and the loop's later phases as the system earns it.
> Build the smallest thing that's genuinely useful, use it, and let real friction tell you what to add next. A lean environment that's actually used beats a comprehensive one that rots.
---
## 9. Health checklist (keep it from rotting)
- [ ] Repo committed + pushed; no long-lived uncommitted source-of-truth.
- [ ] Manifest matches reality (generated, not hand-typed).
- [ ] Every agent is reachable; no dead config.
- [ ] Hand-maintained docs carry `last_verified` and get reconciled.
- [ ] The review loop is actually being run (the nudge isn't being ignored).
- [ ] Memory isn't bloating boot context; stale notes get pruned at review.
- [ ] Secret-scan gate is active on every commit path.