feat: add field-guide landing page and agentic-environment guide

2026-06-04 16:31:18 +02:00
parent 0ab003ec84
commit d403aa0467
3 changed files with 2452 additions and 0 deletions
@@ -0,0 +1,227 @@
+# Agentic Environment — Bootstrap Instructions for Claude Code
+
+> **You are Claude Code. A user has handed you this document to set up a personal, multi-domain "agentic environment" from scratch.**
+> Your job is to (1) understand the architecture below, (2) **interview the user** to capture *their* domains, stack, conventions, and guardrails, then (3) **scaffold a clean, greenfield setup** for them — committing as you go.
+>
+> This is a *clean-room* design. Do **not** import any one person's accumulated tech debt. Build it right from day one: committed + backed up, self-describing, no dead config, security and governance baked in. Adapt every concrete detail to the user via the interview — the patterns are fixed, the specifics are theirs.
+
+---
+
+## 0. How to use this document
+
+**User:** place this file where your Claude Code can read it, then say: *"Read agentic-setup-bootstrap.md and set up my agentic environment."*
+
+**Claude Code:** Work through this in order — **Principles → Interview → Scaffold → Phased rollout**. Do not scaffold before interviewing. Show the user a plan and get approval before writing files. Commit after each coherent step. Never invent the user's conventions — ask.
+
+---
+
+## 1. What you're building (the mental model)
+
+A **source-of-truth git repo** (call it `agentic-dev/` or similar) that is **symlinked into `~/.claude/`**, containing:
+
+```
+agentic-dev/
+├── agents/<name>.md          # expert subagents (focused roles)
+├── skills/<name>/SKILL.md    # auto-triggering capabilities
+├── commands/<name>.md        # slash commands (explicit workflows)
+├── agent-memory/<name>/      # each agent's persistent memory (MEMORY.md index + notes)
+├── shared-knowledge/         # facts referenced by multiple agents
+├── hooks/                    # lifecycle scripts (session start, capture, etc.)
+├── githooks/pre-commit       # secret-scan gate
+├── sync.sh / uninstall.sh    # symlink installer/remover
+├── manifest.* (generated)    # the system's accurate self-description
+├── CLAUDE.md                 # global governance rules (always loaded)
+└── README.md
+```
+
+The repo is the truth; `~/.claude/` is a set of symlinks into it. Edit the repo → it's live. This makes the whole environment **versioned, reviewable, revertible, and portable across machines**.
+
+### The five pillars
+1. **Expert subagents** — narrow, role-scoped agents with their own memory and tool permissions, instead of one do-everything prompt.
+2. **Layered memory** — *semantic* (current facts), *episodic* (what happened / a decision journal), *procedural* (skills/checklists), with explicit time-awareness so it doesn't silently rot.
+3. **Governance** — hard rules that always load: no silent behavioral change, secret hygiene, human-approval gates for anything risky.
+4. **The self-improving loop** — agents capture friction → a periodic human-triggered review coaches them → they get sharper. Compounding capability, no daemon.
+5. **Grounding** — agents observe reality (repos, and optionally live systems via MCP) rather than guessing, with the blast radius the user is comfortable with.
+
+---
+
+## 2. Design principles (these are fixed — do not compromise them)
+
+- **Committed + backed up from commit #1.** `git init`, add a remote, push. Never let the source-of-truth live only as uncommitted local files.
+- **Self-describing, never lying.** Generate the manifest (counts/inventory of agents/skills/commands) from disk; never hand-maintain a number that will drift.
+- **No dead config.** Every agent in `agents/` must be a registered, reachable agent type. Every skill must trigger. If it isn't used, remove it — don't accrete.
+- **Read-only by default; write is a privilege.** Agents that *analyze* (auditors, critics, planners, the reviewer) get non-mutating tools only. Agents that *change things* get write tools and a tighter leash.
+- **Reviewer ≠ applier.** The agent that decides *what* to change never also *applies* it unattended. Propose → human approves the literal diff → a separate step applies + commits.
+- **Memory captures the non-derivable.** Don't store what's re-readable from code/docs. Store hard-won facts: quirks, gotchas, decisions, "we tried X, it failed because Y." Notes record their own deprecations.
+- **Episodic ≠ boot-loaded.** The decision/retro journal is read *on demand* (at review), never loaded into every agent's startup context — or it bloats and dilutes the high-signal rules.
+- **Human-triggered autonomy only** (per this user's class of setup): nudges and scheduled *reminders* are fine; unattended self-modifying action is not.
+- **Governance > convenience.** When a fix would also change behavior the user can observe, stop and surface options; never silently "improve."
+
+### Anti-patterns to actively avoid (the tech debt that kills these systems)
+- Source-of-truth repo with **no remote** and a pile of uncommitted changes → one bad `git` command from total loss.
+- **Hand-maintained snapshots** (topology docs, project lists, counts) that rot silently while reality moves on.
+- **Self-description that lies** ("7 agents" when there are 8) → mis-routes work, hides capabilities.
+- **Orphaned/dead agents** referenced everywhere but never actually invocable.
+- **Capture without review** → a journal that grows forever and is never read = "dormancy with a token tax."
+- **Self-graded retros** ("everything went great") = landfill. The gold is the *human's correction*.
+
+---
+
+## 3. The interview (run this WITH the user before building anything)
+
+Capture answers into `shared-knowledge/profile.md` as you go. Teach while you ask — the user may be newer than this document assumes.
+
+**A. Domains & priorities**
+1. Which domains do you want agents for? (e.g. homelab/infra ops, software development, UI/UX, product/idea development, research/consultation, writing, data.) Rank them.
+2. For each domain: what does a *good outcome* look like? What recurring task or pain motivated this?
+
+**B. Stack & conventions (per domain)**
+3. What's your actual stack in each domain? (OS, languages/frameworks, infra platform, container runtime, git host, CI, deployment style.) — *Do not assume; this is where their setup differs from any template.*
+4. Any house conventions you want enforced? (file layout, naming, commit format, code style, where things get deployed.)
+5. Where does your code/config live (paths, repos)? What should agents be allowed to read vs. never touch?
+
+**C. Risk & access posture**
+6. For infra/ops: do you want agents to **act** (make changes, run commands, SSH) or only **advise** (read + propose)? Start advisory unless the user is confident.
+7. Do you want any **live-system grounding** (MCP servers for metrics/SSH/browser/GitHub), or repo-only to start? Default repo-only; add grounding deliberately.
+8. Always-on / scheduled autonomy: usually **no** — confirm. Reminders and human-triggered reviews only?
+
+**D. Guardrails & secrets**
+9. Your secrets policy: which files/paths must never be read? How should a leaked secret be handled if one is seen?
+10. Any hard "never do this" rules? (e.g. never modify the host package manager, never force-push, never change behavior without asking.)
+
+**E. Cost/quality**
+11. What's your model budget posture? (Subscription vs API; tolerance for using the strongest model.) This sets the model tiering (§5).
+
+**F. Self-improving loop**
+12. Do you want the learning loop (capture friction → periodic review → coaching)? Explain the dormancy risk honestly. If yes, what review cadence feels realistic?
+
+Summarize the captured profile back to the user and confirm before scaffolding.
+
+---
+
+## 4. Scaffold procedure (build in this order; commit after each)
+
+### Step 1 — Foundation
+- `git init`; create the repo structure (§1).
+- Add a **remote** and push immediately. Confirm the remote is **private** if anything sensitive may land there.
+- `.gitignore`: exclude the episodic journal (`agent-memory/_journal/`), local overrides, OS cruft.
+- `githooks/pre-commit`: a **blocking** secret scanner (gitleaks-class) on staged memory/skill/journal files; `git config core.hooksPath githooks`. Install the scanner per the user's OS (ask; respect immutable-OS constraints if any).
+- `CLAUDE.md` (global governance — see §6).
+- `sync.sh` (symlink repo → `~/.claude/`, idempotent, prunes orphaned links, dry-run flag) + matching `uninstall.sh`.
+- A **manifest generator** (small script) that writes `manifest.md` from disk so counts never drift; wire it into `sync.sh`.
+
+### Step 2 — The agent roster (generic archetypes → instantiate per the user's domains)
+Create only the agents the interview justified. For each agent file (`agents/<name>.md`) set frontmatter: `name`, `description` (when to invoke — be specific, this drives auto-delegation), `tools` (scope tightly), `model` (per §5). Seed each with a starter memory note.
+
+Recommended archetypes (rename/drop to fit):
+| Archetype | Mode | Purpose |
+|---|---|---|
+| **infra/ops expert** | RO→RW (user's choice) | the user's infra platform; conventions, deploy patterns |
+| **dev expert(s)** | RW | per primary language/framework; build/test conventions |
+| **UI/UX designer** | RW | interfaces; reads existing design system before inventing |
+| **security-auditor** | read-only | OWASP-style review, secrets, misconfig; never edits |
+| **devil's-advocate** | read-only | stress-tests plans; no false critique |
+| **product/idea partner** | read-only | clarifies value, sequences by impact, fights scope creep |
+| **remediation-planner** | read-only | turns audit findings into a sequenced, validated fix plan |
+| **engineering-manager** | read-only-proposing | runs the self-improving review (§7); proposes, never applies |
+
+Rules: read-only agents get `Read, Grep, Glob, Bash`(non-mutating). RW agents get edit/write but inherit the governance rules. The auditor, critic, planner, product, and engineering-manager are **always read-only**.
+
+### Step 3 — Skills & commands
+- **Skills** (`skills/<name>/SKILL.md`): auto-trigger on context (e.g. "editing a Dockerfile" → compose skill). Thin wrappers that load domain conventions.
+- **Commands** (`commands/<name>.md`): explicit workflows. Useful starters: `/audit-all` (fan-out review → aggregate), a remediation flow (audit → human → separate applier), `/note` and `/eng-review` (§7).
+
+### Step 4 — Shared knowledge
+- `shared-knowledge/profile.md` (the interview output) and any cross-agent facts (topology, domains, conventions). **Prefer generated over hand-maintained**; if hand-maintained, stamp `last_verified` and have an agent reconcile it on demand.
+
+### Step 5 — Memory seeding
+- For each agent: `agent-memory/<name>/MEMORY.md` = a flat index of one-line pointers to detail notes. Seed 2–4 starter notes from the interview (conventions, guardrails, known gotchas). Keep a per-agent size budget.
+
+### Step 6 — Optional grounding (only if the user opted in)
+- MCP servers for the chosen surfaces (GitHub, browser/Playwright for UI, metrics/SSH for infra). Scope credentials tightly. Repo-only is a fine starting posture.
+
+---
+
+## 5. Model tiering (cost vs. capability)
+
+Match model strength to the *judgment* each step needs, not to importance:
+
+| Work | Model tier | Why |
+|---|---|---|
+| Counting/format/file ops, nudges | **none** (scripts) | zero cost; can't be "dumb" |
+| High-frequency mechanical capture | **cheapest model** (Haiku-class) | short, structured, low-stakes |
+| Routine coding/analysis | **mid model** (Sonnet-class) | the workhorse |
+| High-judgment review, security, architecture, **editing instructions** | **strongest model** (Opus-class), used sparingly | mistakes here compound |
+
+Make the **frequent** things cheap or model-free, and reserve the strongest model for **infrequent, high-stakes judgment** (the periodic review, security audits, design). Set `model:` in each agent's frontmatter accordingly.
+
+---
+
+## 6. Governance (`CLAUDE.md` — always loaded, non-negotiable)
+
+Draft this WITH the user; it must include at minimum:
+
+- **No silent behavioral change.** A fix may only fix the reported problem. Anything that changes observable behavior, disables a feature, swaps a design choice, or changes defaults is OFF-LIMITS without explicit prior approval — even if "obviously better." Surface options, wait for approval, implement only what was approved.
+- **Secret hygiene.** Never read files likely to contain secrets (`.env`, `*secret*`, `*token*`, credentials, vaults). If a secret is seen anyway (by you or a subagent), **immediately and prominently** tell the user, name what leaked, and prompt rotation. Pass this rule into every subagent brief.
+- **Human-approval gates.** Destructive, outward-facing, or hard-to-reverse actions require confirmation. Approval in one context doesn't extend to the next.
+- **Honesty.** Report outcomes faithfully — failed tests, skipped steps, uncertainty — without hedging or false confidence.
+- The user's own hard "never" rules from interview Q10.
+
+---
+
+## 7. The self-improving loop (first-class, but done right)
+
+A human-triggered loop that makes agents compound capability over time. **No daemon. No auto-apply. No self-grading landfill.**
+
+### Components
+1. **`/note` command** — the user fires it the moment an agent gets something wrong: appends one ground-truth line `{agent, what was wrong, the fix}` to a **local-only** journal (`agent-memory/_journal/YYYY-MM.md`). Human-authored = highest signal, zero per-run cost.
+2. **Friction-only auto-capture** (add *after* the review loop is proven to run): a directive that makes each subagent emit a `## Mini-Retro` block **only when there was friction** (a fact it had to discover that should've been in memory, a stale note, a missing skill, a human correction) — silence on clean success. A `SubagentStop` hook parses + appends it. Rides on the subagent's own model → near-zero added cost.
+3. **SessionStart nudge** — a **fail-open** script (no model) that prints one line: `N unreviewed retros since <date> — run /eng-review`, escalating by count/age. This is the forcing function that fights dormancy without a daemon.
+4. **`engineering-manager` agent + `/eng-review`** — periodic, human-run. Reads the journal, clusters by agent + recurring theme, and **proposes coaching as literal diffs** (new/updated skills, memory edits, instruction tweaks, **deletions of stale notes**). It is **read-only-proposing**. The user approves specific diffs; a **separate** step applies + commits with a note on what behavior changed.
+5. **Skill distillation** — recurring *trusted* patterns become a proposed `SKILL.md`, ratified at review, never auto-applied.
+
+### Mandatory security controls (the loop is a self-modifying-instructions pipeline — these are not optional)
+- **No auto-apply.** The user reviews the **literal diff**, never a summary, before anything is written to `agents/`, `skills/`, `agent-memory/`, `shared-knowledge/`.
+- **Reviewer ≠ applier.** The engineering-manager has no write/commit tools.
+- **Provenance tagging.** Every retro records whether the content the agent saw was *trusted* (the user's own repos/config) or *untrusted* (third-party code, audited targets, remote content). Proposals derived from untrusted content are **quarantined** — shown as flagged quotes, never auto-distilled into instructions. (A malicious README an agent audited must not be able to write itself into an agent's permanent instructions.)
+- **Observations ≠ instructions.** The journal stores escaped, quoted *observations*, never imperatives a future agent will obey. Neutralize directive-shaped text captured from targets.
+- **Redaction + the secret-scan pre-commit gate** apply to everything the loop writes. Journal stays **local-only** unless the user explicitly opts into committing it (then only behind the scan gate).
+- **Hooks never interpolate model output into a shell command.** Pass payloads via stdin/temp-file to a non-shell handler; schema-validate, size-cap, fence-escape. Hooks must **fail open** (never block a session) — they're global blast radius.
+- **Governance surface is privileged.** Proposed edits to the engineering-manager's own prompt, the auditor's rules, the secret policy, or the redaction/gate logic are a separate highest-attention review category — never silently applied or pruned.
+
+### Phased rollout (do NOT build it all at once)
+- **Phase 1:** `/note` + SessionStart nudge + engineering-manager + `/eng-review`. Prove the user actually runs the review.
+- **Phase 2:** add friction-only `SubagentStop` auto-capture *after* `/eng-review` has run ≥2×. (Prove you'll review before you pay to capture.)
+- **Phase 3:** skill distillation + a crude "did the coaching help?" check (did the friction recur?) + per-agent memory budgets.
+
+### Must-verify before relying on hooks (don't assume — check current behavior)
+- Does `SubagentStop` fire for *all* subagent kinds the user spawns (including any orchestrated/workflow agents)?
+- Does it fire reliably on long sessions?
+- Are concurrent hook appends serialized, or do parallel agents race on the journal? (Use a lock or per-agent temp files.)
+- Does SessionStart fire reliably after resume/clear?
+- Confirm a hung/erroring hook cannot block session start in *any* project.
+
+---
+
+## 8. Sequencing summary (what to actually do, in order)
+
+1. **Interview** (§3) → confirm captured profile.
+2. **Foundation** (§4.1): repo + remote + secret-scan + CLAUDE.md + sync.sh + manifest. Commit & push.
+3. **2–3 highest-priority agents** (not all at once) + their memory seeds. Use them for real for a bit.
+4. **Skills/commands** for those domains; `/audit-all` + remediation flow if relevant.
+5. **Self-improving loop Phase 1** (§7) once there are agents worth coaching.
+6. **Grounding (MCP)** only if/when repo-only proves limiting.
+7. **Expand** domains and the loop's later phases as the system earns it.
+
+> Build the smallest thing that's genuinely useful, use it, and let real friction tell you what to add next. A lean environment that's actually used beats a comprehensive one that rots.
+
+---
+
+## 9. Health checklist (keep it from rotting)
+- [ ] Repo committed + pushed; no long-lived uncommitted source-of-truth.
+- [ ] Manifest matches reality (generated, not hand-typed).
+- [ ] Every agent is reachable; no dead config.
+- [ ] Hand-maintained docs carry `last_verified` and get reconciled.
+- [ ] The review loop is actually being run (the nudge isn't being ignored).
+- [ ] Memory isn't bloating boot context; stale notes get pruned at review.
+- [ ] Secret-scan gate is active on every commit path.