Day 2 – The Method
How Dashi sets up Claude Code folders, what files he creates, and how his agent works
Copy this page and send it to your Claude Code or ChatGPT – they will help you complete this day
CLAUDE.md – the foundation of your agent
Without these files the agent does not know who you are, what language to speak, or what rules to follow. This is the first thing it reads at the start of EVERY session. A properly filled CLAUDE.md = the agent works exactly how you need it to, from the very first second.
- Memory lives in files, not in chat. Chat is wiped on /compact or new session – CLAUDE.md stays and reloads automatically.
- Global – foundation, project – specialization. Two levels separate universal rules from the specific agent's role. On your VPS there are two agents (Jarvis chat + Richard server-doctor), and the global CLAUDE.md works for both.
- Without CLAUDE.md the agent is a blank slate. No owner name, no language, no timezone, no limits. Every session starts from zero.
Copy the prompt in full, send it to Jarvis on Telegram. The agent will run all checks, compare your setup with the reference template from the installer, and produce a PASS/FAIL report per item. No manual bash commands – the agent does it all.
Run a self-diagnostic of the CLAUDE.md memory architecture. Execute all 5 checks, attach sources, produce a final PASS/FAIL table.
Sources of truth (download and compare with what I have on VPS):
- CLAUDE.md (global + agent): https://github.com/qwwiwi/edgelab-claude-md
- Architecture (folders, hooks, settings.json): https://github.com/qwwiwi/public-architecture-claude-code
- Gateway (Jarvis Telegram bot): https://github.com/qwwiwi/jarvis-telegram-gateway
1. Global CLAUDE.md
Run: ls -la ~/.claude/CLAUDE.md
If the file exists – read it with Read. Extract: owner name, language, timezone, line count.
Download reference: https://raw.githubusercontent.com/qwwiwi/edgelab-claude-md/main/GLOBAL.md
Compare by sections: Identity, Owner, Core principles, Security, Language policy. Placeholders in the reference must be filled in my file.
If the file does not exist – record FAIL.
2. Project CLAUDE.md (Jarvis)
Run: ls -la ~/.claude-lab/jarvis/.claude/CLAUDE.md
If the file exists – read it, summarize the agent's role in one sentence.
Download reference: https://raw.githubusercontent.com/qwwiwi/edgelab-claude-md/main/WORKSPACE.md
Compare structure and key sections.
3. Load cascade
List ALL CLAUDE.md files you currently see in the context of this session, with full paths. Must be at least 2: global + project. If you see only one – the cascade is broken.
4. Memory architecture
Download: https://github.com/qwwiwi/public-architecture-claude-code
Compare on my VPS:
- ~/.claude-lab/jarvis/.claude/settings.json – hooks, permissions
- Skills in ~/.claude-lab/jarvis/.claude/skills/
- Folder structure per FILES-REFERENCE.md and HOOKS.md from the repo
5. Jarvis gateway
Download: https://github.com/qwwiwi/jarvis-telegram-gateway
Compare with ~/claude-gateway/:
- config.json – format, required fields
- gateway.py – version
- systemd: systemctl status claude-gateway
At the end produce:
a) table: | Check | Status | What I found |
b) if anything is FAIL – propose a concrete fix (one command or one edit)
c) list of sources you used for the template and architecture rules.
Reference sources:
- qwwiwi/edgelab-claude-md – reference CLAUDE.md templates (GLOBAL.md + WORKSPACE.md)
- qwwiwi/public-architecture-claude-code – memory architecture reference, hooks, skills
- qwwiwi/jarvis-telegram-gateway – gateway reference (config.json, gateway.py, systemd)
- docs.claude.com/en/docs/claude-code/memory – official Anthropic docs
- github.com/qwwiwi/edgelab-claude-md – reference CLAUDE.md templates: GLOBAL.md (global) and WORKSPACE.md (project). Source of truth for agent identity setup.
- github.com/qwwiwi/public-architecture-claude-code – reference architecture for Claude Code memory: folders, hooks, load cascade, settings.json.
- github.com/qwwiwi/jarvis-telegram-gateway – Jarvis gateway reference: config.json, gateway.py, systemd unit. For verifying the bot is configured correctly.
- github.com/qwwiwi/edgelab-install – public EdgeLab installer, install.sh v3.0.3. Generates both CLAUDE.md files during VPS setup.
- docs.claude.com/en/docs/claude-code/memory – official Anthropic documentation on the Claude Code memory system.
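Checks 1 and 2 of the prompt boil down to "does the file exist, and how long is it". A minimal Python sketch of that idea is shown here, only to make the check concrete; the agent does not need it. The two paths come from the prompt above, while the function names `check_claude_md` and `report` are hypothetical.

```python
from pathlib import Path

# Paths taken from the diagnostic prompt above.
GLOBAL_MD = Path.home() / ".claude" / "CLAUDE.md"
PROJECT_MD = Path.home() / ".claude-lab" / "jarvis" / ".claude" / "CLAUDE.md"


def check_claude_md(path: Path) -> dict:
    """Return a PASS/FAIL record for one CLAUDE.md file (hypothetical helper)."""
    if not path.is_file():
        return {"path": str(path), "status": "FAIL", "lines": 0}
    lines = path.read_text(encoding="utf-8").splitlines()
    return {"path": str(path), "status": "PASS", "lines": len(lines)}


def report(paths) -> str:
    """Render the results in the table format the prompt asks for."""
    rows = ["| Check | Status | What I found |", "|---|---|---|"]
    for path in paths:
        result = check_claude_md(path)
        rows.append(f"| {result['path']} | {result['status']} | {result['lines']} lines |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(report([GLOBAL_MD, PROJECT_MD]))
```

This covers only existence and line count; comparing sections against GLOBAL.md and WORKSPACE.md is left to the agent.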
Theory – how AI-agent memory works
Full breakdown: 4 memory layers, hooks, token optimization, self-learning. What separates a bare Claude from an agent with memory.
- 01 Why you need a memory system
- 02 Folder structure
- 03 4 memory layers
- 04 What loads at session start
- 05 HOT memory: conversation log
- 06 Cron scripts: automatic rotation
- 07 Hooks: automation without the agent
- 08 Token optimization
- 09 Model strategy
- 10 Semantic search (L4)
- 11 Learnings v2: self-learning
- 12 Three loading scenarios
- 13 Repositories and tests
- 14 Self-audit prompt
01 Why you need a memory system
The default Claude Code context window is 1,000,000 tokens (1M). But there is a problem:
- Agent quality degrades after ~50% fill
- At 800K+ the agent starts to ignore instructions
- Without memory management, the HOT file grows to 80KB+ per day
We set CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 – the only setting you need to change. Auto-compact triggers at 400K instead of the default 800K. Recommendation from Boris Cherny (Claude Code lead at Anthropic). All numbers below are relative to the 400K working window, not 1M.
Without a memory system, your agent is a goldfish with amnesia. Every new session is like the first day on the job.
02 Folder structure
Each folder in .claude/ handles its own layer:
.claude/
├── CLAUDE.md              # SOUL – persona, role, style, boundaries
├── settings.json          # Settings: env, hooks, permissions
├── core/
│   ├── USER.md            # Owner profile
│   ├── rules.md           # Operational rules and boundaries
│   ├── AGENTS.md          # Agent directory (on-demand)
│   ├── MEMORY.md          # Cold memory, lessons index
│   ├── LEARNINGS.md       # Archive of lessons from mistakes
│   ├── warm/
│   │   └── decisions.md   # Key decisions (14 days)
│   └── hot/
│       ├── handoff.md     # Last 10 entries (in context)
│       ├── recent.md      # Full log (NOT loaded into session)
│       └── archive/       # Old logs by date
├── tools/
│   └── TOOLS.md           # Tools reference (on-demand)
├── skills/                # Skills – reusable abilities
├── hooks/                 # Shell scripts for automation
└── scripts/               # Cron scripts for memory rotation
Only 4 files enter context at startup via @include: USER.md, rules.md, decisions.md, handoff.md. Everything else (AGENTS.md, TOOLS.md, MEMORY.md, skills) – the agent reads on request via the Read tool. Saves ~18K tokens per session.
03 4 memory layers
Each layer loads differently:
- IDENTITY + WARM + HOT – automatically at startup via @include
- COLD – agent reads via Read tool when old context is needed
- L4 – agent queries via curl when information is older than 24 hours
04 What loads at session start
| File | Responsibility | Tokens |
|---|---|---|
| ~/.claude/CLAUDE.md | Global rules for all projects | ~3,200 |
| ~/.claude/rules/*.md | Per-language rules (python, typescript, bash) | ~430 |
| CLAUDE.md (SOUL) | Agent persona, style, role, boundaries | ~3,500 |
| core/USER.md | Owner profile: channels, product, style | ~765 |
| core/rules.md | Operational rules: security, boundaries | ~1,935 |
| core/warm/decisions.md | Key decisions from the last 14 days | ~1,400 |
| core/hot/handoff.md | Last 10 entries from the log | 450–1,800 |
| TOTAL | | ~11–13K |
11–13K tokens out of a 400K working window is ~3% of context. The other 97% is for work. The full log (recent.md) is NOT loaded into the session – only handoff.md (last 10 entries). AGENTS.md and TOOLS.md also do NOT load at startup – the agent reads them via Read tool on demand. Saves ~18K tokens.
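The total and the ~3% figure can be re-derived from the table. The sketch below is just that arithmetic; the per-file numbers are the table's own estimates, and the name `startup_budget` is hypothetical.

```python
# Token estimates copied from the table above; handoff.md is given as a range.
FIXED_TOKENS = {
    "~/.claude/CLAUDE.md": 3200,
    "~/.claude/rules/*.md": 430,
    "CLAUDE.md (SOUL)": 3500,
    "core/USER.md": 765,
    "core/rules.md": 1935,
    "core/warm/decisions.md": 1400,
}
HANDOFF_RANGE = (450, 1800)
WORKING_WINDOW = 400_000  # CLAUDE_CODE_AUTO_COMPACT_WINDOW


def startup_budget():
    """Return (low, high, low_pct, high_pct) for the startup context load."""
    base = sum(FIXED_TOKENS.values())
    low = base + HANDOFF_RANGE[0]
    high = base + HANDOFF_RANGE[1]
    return low, high, 100 * low / WORKING_WINDOW, 100 * high / WORKING_WINDOW


low, high, low_pct, high_pct = startup_budget()
print(f"{low}-{high} tokens, {low_pct:.1f}%-{high_pct:.1f}% of the working window")
```

The fixed files add up to 11,230 tokens, so the startup load lands between 11,680 and 13,030 tokens depending on the size of handoff.md.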
05 HOT memory: conversation log
The gateway automatically writes every interaction to recent.md:
### 2026-04-14 15:03 [own_voice]
Prince: (voice transcription, 200 chars)
Agent: (compressed reply, 200 chars)
Source tags:
| Tag | Source |
|---|---|
| own_text | Text message |
| own_voice | Voice (transcription via Groq Whisper) |
| forwarded | Forwarded message |
| external_media | External media |
recent.md is the full log. It grows to 80KB+ per day without compression. Only handoff.md enters the session context – last 10 entries (~1–4 KB). The Stop hook extracts them from recent.md when the session ends. The next session receives only the compressed handoff, not the full log.
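To make the handoff step concrete, here is a minimal Python sketch of "keep the last 10 entries". It assumes every entry in recent.md starts with a `### ` header line, as in the example above; the real write-handoff.sh may work differently, and the names `split_entries` and `build_handoff` are hypothetical.

```python
import re

# Assumption: each log entry begins with a "### <timestamp> [tag]" header line.
ENTRY_HEADER = re.compile(r"^### ", re.MULTILINE)


def split_entries(log_text: str) -> list[str]:
    """Split a recent.md-style log into entries that start with '### '."""
    starts = [match.start() for match in ENTRY_HEADER.finditer(log_text)]
    bounds = starts + [len(log_text)]
    return [log_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def build_handoff(log_text: str, keep: int = 10) -> str:
    """Return the text of the last `keep` entries, oldest first."""
    entries = split_entries(log_text)
    if not entries:
        return ""
    return "\n\n".join(entries[-keep:]) + "\n"
```

A Stop hook could call `build_handoff` on the contents of core/hot/recent.md and write the result to core/hot/handoff.md.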
06 Cron scripts: automatic rotation
| Time | Script | What it does |
|---|---|---|
| 04:30 | rotate-warm.sh | WARM older than 14 days → COLD |
| 05:00 | trim-hot.sh | HOT older than 24h → Sonnet compresses → WARM |
| 06:00 | compress-warm.sh | WARM over 10KB → Sonnet re-compresses |
| 06:30 | ov-session-sync.sh | HOT + WARM → OpenViking (semantic) |
| 21:00 | memory-rotate.sh | COLD over 5KB → archive/YYYY-MM.md |
Order of the morning jobs: WARM rotation first, then HOT compression, then WARM re-compression, then sync to OpenViking. Compression runs on Sonnet, not Opus: about 5x cheaper at list prices ($3/$15 vs $15/$75 per 1M tokens) at the same summarization quality.
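The age-based rows of the table (WARM older than 14 days, HOT older than 24h) share one operation: split entries into "keep" and "move on". A minimal sketch of that partition follows; it assumes entries are available as (timestamp, text) pairs, which is an illustration and not the format the shell scripts use. The name `split_by_age` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

WARM_MAX_AGE = timedelta(days=14)  # rotate-warm.sh threshold from the table
HOT_MAX_AGE = timedelta(hours=24)  # trim-hot.sh threshold from the table


def split_by_age(entries, now, max_age):
    """Partition (timestamp, text) pairs into (kept, expired) lists."""
    kept, expired = [], []
    for timestamp, text in entries:
        if now - timestamp > max_age:
            expired.append((timestamp, text))
        else:
            kept.append((timestamp, text))
    return kept, expired


now = datetime(2026, 4, 14, 5, 0, tzinfo=timezone.utc)
hot = [(now - timedelta(hours=1), "fresh"), (now - timedelta(hours=25), "yesterday")]
kept, expired = split_by_age(hot, now, HOT_MAX_AGE)
print(len(kept), len(expired))
```

Expired HOT entries would then go to Sonnet for compression, and expired WARM entries to COLD, as the table describes.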
07 Hooks: automation without the agent
Hooks are shell commands tied to Claude Code lifecycle events. CLAUDE.md is a recommendation (~80% compliance). Hooks are enforcement (100%).
Base hooks (any agent):
- block-dangerous.sh (PreToolUse → Bash) – blocks rm -rf, git push --force, DROP TABLE, curl | bash. Exit 2 = operation canceled.
- protect-files.sh (PreToolUse → Edit|Write) – protects .env, .pem, .key, secrets/*, package-lock.json from accidental changes.
- log-commands.sh (PostToolUse → Bash) – logs every command to command-log.txt. Audit.
Advanced hooks (multi-agent system):
- session-bootstrap.sh (SessionStart) – loads top-5 lessons from episodes.jsonl, checks inbox, sets heartbeat online.
- auto-recall.mjs (UserPromptSubmit) – sends the prompt to OpenViking, returns relevant memories.
- correction-detector.sh (UserPromptSubmit) – catches phrases like "don't", "wrong" – and triggers a lesson capture.
- review-reminder.sh (PostToolUse) – after 10+ edits reminds to run code review before commit.
- flush-to-openviking.sh (PreCompact) – before compaction saves HOT+WARM to OpenViking. Nothing is lost.
- write-handoff.sh (Stop) – generates handoff.md. Next session starts where the previous ended.
Configuration in settings.json:
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": ".claude/hooks/block-dangerous.sh"
      }]
    }]
  }
}
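What a blocking hook decides can be sketched in a few lines. The version below is written in Python rather than shell purely for illustration and is not the contents of block-dangerous.sh. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields; the patterns are a simplified guess at the four commands listed above, and `is_dangerous` and `decide` are hypothetical names.

```python
import re

# Simplified patterns for the commands named in the text; not exhaustive.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
    re.compile(r"\bgit\s+push\b.*--force"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bcurl\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"),
]


def is_dangerous(command: str) -> bool:
    """True if the command matches any blocked pattern."""
    return any(pattern.search(command) for pattern in DANGEROUS_PATTERNS)


def decide(payload: dict) -> int:
    """Return the hook exit code: 2 cancels the tool call, 0 allows it."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if is_dangerous(command) else 0
```

To use it as an actual hook script, the file would end with something like `sys.exit(decide(json.load(sys.stdin)))` after importing `json` and `sys`. Regex matching is easy to bypass, so treat this as a safety net against accidents, not a security boundary.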
08 Token optimization
You only need one env variable in settings.json:
{
  "env": {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "400000"
  }
}
No other env variables need changing. Do not set MAX_THINKING_TOKENS, SUBAGENT_MODEL, AUTOCOMPACT_PCT – leave defaults. Models are managed by strategy, not by env.
Terse Mode: save on output. Output tokens cost 5x input (Opus: $75 output vs $15 input per 1M tokens). In a 400K working window every token matters. Add to rules.md:
## Output style
Drop: articles (a/an/the), filler, pleasantries, hedging.
Fragments OK. Short synonyms.
Pattern: [thing] [action] [reason]. [next step].
Before: "Sure! I'd be happy to help you with that. The issue is likely caused by a problem in the auth middleware."
After: "Bug in auth middleware. Token expiry check uses < not <=. Fix:"
Savings: ~75%. The agent writes full code and exact errors – only prose gets compressed.
09 Model strategy
Opus for code and decisions, Sonnet for subagents. No half-measures – code quality needs the best model. Subagents handle volume.
| Model | Role | Used for |
|---|---|---|
| Opus 4.7 | Primary | Code, review, planning, coordination |
| Sonnet 4.6 | Subagents | Research, search, analysis, memory compression |
| Codex GPT-5.5 | Optional | Double review (second opinion alongside Opus) |
| Sonar | Optional | Web research, fact-checking |
| Model | Input | Output | Relative |
|---|---|---|---|
| Sonnet 4.6 | $3/M | $15/M | 1x (baseline) |
| Opus 4.7 | $15/M | $75/M | ~5x (pays for itself in quality) |
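The "~5x" in the last column follows directly from the per-token prices. The sketch below is that arithmetic for API billing, using the prices from the table; the dictionary keys and the function name `cost_usd` are hypothetical.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES_PER_MILLION = {
    "sonnet": (3.0, 15.0),
    "opus": (15.0, 75.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost of one call at list prices."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Example: a call with 100K tokens of context and a 5K-token reply.
for model in PRICES_PER_MILLION:
    print(model, cost_usd(model, 100_000, 5_000))
```

For that example call the estimate is $0.375 on Sonnet and $1.875 on Opus, the same 5x ratio at any token mix because both prices scale by the same factor.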
On the Max plan ($100–200/mo) all models are included. Cost = rate-limit spend, not dollars. Sonnet for subagents = faster replies + less context.
Opus via OpenRouter – NEVER. Only the native Anthropic API or an Anthropic Max subscription.
10 Semantic search (L4)
L4 is a local semantic database for long-term memory. The agent searches by meaning, not keywords.
How it works:
- Stop hook or cron (06:30 UTC) uploads HOT + WARM to OpenViking
- OpenViking creates embeddings automatically
- On the next session auto-recall.mjs fetches relevant information
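# Search (the Content-Type header is an assumption: curl -d sends form encoding by default)
curl -X POST "http://localhost:1933/api/v1/search/find" \
  -H "X-API-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "what did we decide about the API", "limit": 10}'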
Each agent writes to its own namespace but searches across all. Cross-agent search out of the box.
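The same search call can be made from a script. The sketch below mirrors the curl example with Python's standard library; the URL, the `X-API-Key` header, and the body fields come from that example, while the `Content-Type` header and the assumption that the server answers with JSON are mine. `build_search_request` and `search` are hypothetical names.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:1933/api/v1/search/find"  # from the curl example


def build_search_request(query: str, api_key: str, limit: int = 10) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        # Content-Type is an assumption; the original example sets only X-API-Key.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def search(query: str, api_key: str, limit: int = 10, timeout: float = 10.0):
    """Send the request and return the parsed body (assumed to be JSON)."""
    request = build_search_request(query, api_key, limit)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `search("what did we decide about the API", key)` requires a running OpenViking instance on port 1933; the shape of the returned data is not specified here, so inspect it before relying on any field.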
11 Learnings v2: self-learning
The agent records lessons from mistakes. Not just "remember this", but systemically changing itself so the mistake does not repeat.
How detection works. The correction-detector.sh hook (UserPromptSubmit) scans every user message for trigger words:
| Category | Words |
|---|---|
| Direct corrections | "don't", "wrong", "not like that", "stop" |
| Accusatory questions | "why did you", "you forgot", "you again" |
| Broken state | "broke", "broken", "not working" |
| Repeat instructions | "I already said", "how many times" |
On match the hook injects a reminder: "CORRECTION DETECTED. Record a learning via learnings-engine.mjs capture".
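The detection step is plain phrase matching. Here is a minimal Python sketch of it, using the trigger phrases from the table and the reminder text quoted above. The real correction-detector.sh is a shell script; whole-word matching and the category keys are assumptions of this sketch, and `detect` and `reminder_for` are hypothetical names.

```python
import re

# Trigger phrases copied from the table above; category keys are made up.
TRIGGERS = {
    "direct_correction": ["don't", "wrong", "not like that", "stop"],
    "accusatory_question": ["why did you", "you forgot", "you again"],
    "broken_state": ["broke", "broken", "not working"],
    "repeat_instruction": ["i already said", "how many times"],
}

REMINDER = "CORRECTION DETECTED. Record a learning via learnings-engine.mjs capture"


def _compile(phrases):
    """Match any phrase as a whole word or phrase, case-insensitively."""
    alternatives = "|".join(re.escape(phrase) for phrase in phrases)
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


PATTERNS = {category: _compile(phrases) for category, phrases in TRIGGERS.items()}


def detect(message: str) -> list[str]:
    """Return the categories whose trigger phrases appear in the message."""
    return [category for category, pattern in PATTERNS.items() if pattern.search(message)]


def reminder_for(message: str):
    """Return the reminder text on a match, otherwise None."""
    return REMINDER if detect(message) else None
```

Whole-word matching keeps "stopwatch" from firing the "stop" trigger. Ordinary sentences such as "don't forget the tests" still match, so expect false positives with a list this short.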
Pipeline: from mistake to systemic change:
Correction from user
→ correction-detector.sh (catches trigger)
→ learnings-engine.mjs capture (records to episodes.jsonl)
→ learnings-engine.mjs score (computes rating)
→ learnings-engine.mjs lint (finds HOT/STALE/PROMOTE)
→ learnings-engine.mjs promote (changes the system)
Record format (episodes.jsonl):
{
  "id": "EP-20260414-001",
  "ts": "2026-04-14T15:03:00Z",
  "type": "correction",
  "source": "prince",
  "context": "what happened",
  "error": "what was wrong",
  "rule": "rule for the future",
  "impact": "high",
  "tags": ["workflow", "git"],
  "freq": 1,
  "status": "active"
}
Scoring. Each episode gets a composite score (0–1):
| Factor | Weight | How it's computed |
|---|---|---|
| Recency | 40% | Linear decay over 30 days |
| Frequency | 30% | Repeat count (cap: 3) |
| Impact | 30% | critical=1.0, high=0.7, medium=0.4, low=0.1 |
- Score > 0.8 or freq ≥ 3 → PROMOTE (system change)
- Score < 0.15 → STALE (archive, lesson is outdated)
- freq ≥ 3 → HOT (rule not working, change the system)
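One way to read the scoring table is as a weighted sum of three values in the 0–1 range. The sketch below implements that reading for the episode format shown earlier. The weights, impact values, and thresholds come from the text; normalizing frequency as `min(freq, 3) / 3` and combining the factors by simple addition are assumptions, and learnings-engine.mjs may compute it differently. `score` and `classify` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

IMPACT_WEIGHT = {"critical": 1.0, "high": 0.7, "medium": 0.4, "low": 0.1}


def score(episode: dict, now: datetime) -> float:
    """Composite score in [0, 1]: 40% recency, 30% frequency, 30% impact."""
    timestamp = datetime.fromisoformat(episode["ts"].replace("Z", "+00:00"))
    age_days = max((now - timestamp).total_seconds() / 86400, 0.0)
    recency = max(0.0, 1.0 - age_days / 30)   # linear decay over 30 days
    frequency = min(episode["freq"], 3) / 3   # assumption: cap at 3, scale to 0-1
    impact = IMPACT_WEIGHT[episode["impact"]]
    return 0.4 * recency + 0.3 * frequency + 0.3 * impact


def classify(episode: dict, now: datetime) -> list[str]:
    """Apply the three threshold rules from the list above."""
    value = score(episode, now)
    labels = []
    if value > 0.8 or episode["freq"] >= 3:
        labels.append("PROMOTE")
    if value < 0.15:
        labels.append("STALE")
    if episode["freq"] >= 3:
        labels.append("HOT")
    return labels
```

Under these assumptions the sample record (impact high, freq 1) scores 0.4 + 0.1 + 0.21 = 0.71 on the day it is written, which is below the PROMOTE threshold until the mistake repeats.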
Reliability pyramid:
The more critical the mistake, the higher up the pyramid it gets promoted. Production bug → straight to a hook/script.
Where it promotes by tags:
| Episode tags | Target file | Owner OK? |
|---|---|---|
| stack, models, tools | TOOLS.md | no |
| workflow, communication | CLAUDE.md / SKILL.md | no |
| security, git | rules.md | yes |
| config, scp | rules.md | yes |
Green zone (TOOLS.md, SKILL.md) – agent edits autonomously. Red zone (rules.md, CLAUDE.md) – only with owner's approval.
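The routing table is small enough to express as a lookup. The sketch below follows the table's "Owner OK?" column literally; the name `route` and the tuple layout are hypothetical, and the real promote step may resolve multi-tag episodes differently.

```python
# (tags, target file, owner approval required) rows copied from the table above.
ROUTES = [
    ({"stack", "models", "tools"}, "TOOLS.md", False),
    ({"workflow", "communication"}, "CLAUDE.md / SKILL.md", False),
    ({"security", "git"}, "rules.md", True),
    ({"config", "scp"}, "rules.md", True),
]


def route(tags) -> list[tuple[str, bool]]:
    """Return (target file, needs owner approval) pairs for an episode's tags."""
    targets = []
    for route_tags, target, needs_owner in ROUTES:
        match = (target, needs_owner)
        if route_tags & set(tags) and match not in targets:
            targets.append(match)
    return targets


print(route(["workflow", "git"]))
```

For the sample episode tagged `workflow` and `git`, this yields one autonomous edit and one edit to rules.md that waits for the owner.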
12 Three loading scenarios
| Scenario | Tokens | % of 400K |
|---|---|---|
| After cron | ~27K | ~7% |
| End of day (before cron) | ~60K | ~15% |
| Cron broken for a week | ~114K | ~29% |
With a 400K working window: after cron memory takes 7%, without cron – 15%. If cron is broken for a week – 29%, already noticeable. Compression exists to preserve context cleanliness and agent quality.
13 Repositories and tests
1. public-architecture-claude-code – architecture, templates, scripts, install.sh. github.com/qwwiwi/public-architecture-claude-code
2. jarvis-telegram-gateway – Gateway: Telegram → Claude Code. github.com/qwwiwi/jarvis-telegram-gateway
3. architecture-brain-tests – 800 tests verifying everything described above. github.com/qwwiwi/architecture-brain-tests
Test categories:
- T20: security (no secrets in templates, .gitignore)
- T26: models (correct IDs, bans, double review)
- T27: COMPACT_WINDOW (400K, 1M context)
- T28: Learnings v2 (engine, scoring, pyramid)
- And 25+ more categories
14 Self-audit prompt
Copy and send to your agent. It will check itself against all rules:
Memory:
- Is there a CLAUDE.md? How many lines? (target: up to 200)
- Is there @include for USER.md, rules.md, decisions.md, handoff.md?
- Is there core/hot/recent.md? What size?
- Is there core/warm/decisions.md?
- Are cron scripts configured for memory rotation?
Settings:
- CLAUDE_CODE_AUTO_COMPACT_WINDOW set? (target: 400000)
Models:
- Is Opus used for code and review?
- Is Sonnet used for subagents?
- Is double review configured (Opus + Codex GPT-5.5)?
Hooks:
- Is there a PreToolUse hook blocking dangerous commands?
- Is there a PostToolUse hook for logging?
- Is there a Stop hook writing handoff?
Security:
- Are .env, .key, .pem files in .gitignore?
- Are secrets stored in secrets/ or shared/secrets/?
- No hardcoded tokens/keys in files?
Token optimization:
- CLAUDE.md under 200 lines?
- AGENTS.md and TOOLS.md load on-demand, not via @include?
- Is /compact used between tasks?
Result format: Passed: X/18, Failed: [list with recommendations].
The architecture is open. install.sh sets everything up in 2 minutes. 800 tests verify nothing is broken. Take it, adapt it, improve it. Default window 1M, working window – 400K (CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000). The rest is memory architecture, hooks, and discipline.
By the end of the day: agent configured using the method – knows the participant's context and tasks.
Connect semantic memory – OpenViking + OpenAI
CLAUDE.md gives the agent its identity – name, role, rules. But facts from conversations (what you decided, who you mentioned, what number you cited) won't be written into CLAUDE.md by hand. You need long-term semantic memory: the agent stores facts itself and retrieves them by meaning, not by keyword.
OpenViking – open-source "context database for AI agents" by volcengine (22K ★ on GitHub, Apache 2.0). My own long-term memory runs on it. There's an official Claude Code plugin: tools memory_recall, memory_store, memory_forget, memory_health plus hooks (auto-recall on every prompt, auto-capture on session stop).
OpenViking is free and open-source, but to extract meaning it calls OpenAI models (you can run Ollama locally instead, but for the workshop OpenAI is simpler and faster). You need an OpenAI API key and $5 on the balance – that's enough for months at our load.
- API key – your personal password to OpenAI. Created in the dashboard in one click, starts with sk-.... Pay-as-you-go: cents for embeddings, dollars for LLM. $5 on balance is a buffer for months.
- Embeddings – a model turns text into a vector (1536 numbers). "Owner – Dashi" becomes a point in a 1536-dimensional space. A month later you ask "who's my boss?" – OpenViking computes the query vector and searches for nearby points. That's search by meaning, not by keyword. Model: text-embedding-3-small – $0.02 per 1M tokens, basically free.
- VLM / LLM – reads large texts (100 KB article, an hour of dialogue) and compresses them into two levels: L0 (~100 tokens – abstract in one line), L1 (~2000 tokens – detailed retelling). The agent reads L1/L0 in 95% of cases, not the original – saves context. Model: gpt-4o-mini – ~10x cheaper than the default gpt-5.5 from docs, more than enough for the workshop.
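"Searching for nearby points" usually means comparing vectors by cosine similarity. The sketch below shows the idea on hand-made 3-dimensional vectors; real embeddings have 1536 dimensions and come from the embedding model, and whether OpenViking uses cosine similarity internally is an assumption. The vectors, texts, and function names are all illustrative.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, memory: dict, top_k: int = 1) -> list[str]:
    """Return the top_k stored texts whose vectors are closest to the query."""
    ranked = sorted(
        memory.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy vectors invented for this example, not real embeddings.
memory = {
    "owner is Dashi": [0.9, 0.1, 0.0],
    "server runs Ubuntu": [0.0, 0.2, 0.9],
}
boss_query = [0.8, 0.2, 0.1]  # stands in for the embedding of "who's my boss?"
print(nearest(boss_query, memory))
```

The query shares no words with the stored fact, yet its vector points in nearly the same direction, which is why the owner entry ranks first.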
- Open platform.openai.com/api-keys, log in or sign up.
- Click "Create new secret key" → name "openviking-edgelab" → "Create" → copy the key (starts with sk-..., shown ONCE – don't close the window until you've saved it).
- Open billing → "Add payment method" → top up with $5.
- The agent will ask for the key in chat – you just paste sk-... as the next message. The agent writes it into ~/.openviking/ov.conf on your VPS with 600 permissions (only you can read it).
ov.conf template – for reference
The agent will create this file and substitute the key itself. Shown here so you see exactly what gets written. Use it if you want to build the config by hand.
{
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "sk-PASTE_YOUR_KEY",
      "provider": "openai",
      "dimension": 1536,
      "model": "text-embedding-3-small"
    }
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "sk-PASTE_YOUR_KEY",
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}
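If you build the config by hand, the detail that matters is that the file is never readable by other users, even for a moment. The sketch below generates the same JSON as the template and creates the file with mode 600 from the start. The field names and values come from the template; `build_config` and `write_config` are hypothetical helpers, not part of OpenViking.

```python
import json
import os
from pathlib import Path


def build_config(api_key: str) -> dict:
    """Return the ov.conf structure from the template with the key filled in."""
    base = {
        "api_base": "https://api.openai.com/v1",
        "api_key": api_key,
        "provider": "openai",
    }
    return {
        "embedding": {"dense": {**base, "dimension": 1536, "model": "text-embedding-3-small"}},
        "vlm": {**base, "model": "gpt-4o-mini"},
    }


def write_config(path: Path, api_key: str) -> None:
    """Write the config so that only the owner can read or write it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create with 0600 up front instead of writing first and chmod-ing later.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(build_config(api_key), handle, indent=2)
    os.chmod(path, 0o600)  # also tightens a file that already existed
```

A typical call would be `write_config(Path.home() / ".openviking" / "ov.conf", key)`, with the key read from a prompt or environment variable so it never lands in shell history.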
The agent runs all commands itself: installs Docker, creates ov.conf, spins up OpenViking, connects it to Claude Code via the official plugin, runs a smoke test. You only paste your sk-... key when the agent asks.
Install OpenViking as my long-term semantic memory with the OpenAI provider. Follow the steps strictly, show the output of each command.
0. ASK ME for the OpenAI API key (starts with sk-...). I will paste it in the next message. Remind me that a minimum $5 balance is required at platform.openai.com/settings/organization/billing.
SECURITY: my key is a secret. Never show it in responses, logs, or commits. After I give it to you, do not quote it. Make sure ~/.openviking/ is not tracked by Git.
1. Check Docker. Run: docker --version && docker compose version
If Docker is not installed, install it: sudo apt-get update && sudo apt-get install -y docker.io docker-compose-plugin
Then: sudo systemctl enable --now docker
If it fails – STOP, show the full error.
2. Create the working directory and download the official docker-compose.yml:
mkdir -p ~/.openviking/data && cd ~/.openviking
curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/docker-compose.yml -o docker-compose.yml
3. Create the file ~/.openviking/ov.conf with the following JSON (substitute my API key in BOTH api_key fields):
{
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "MY_KEY",
      "provider": "openai",
      "dimension": 1536,
      "model": "text-embedding-3-small"
    }
  },
  "vlm": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "MY_KEY",
    "provider": "openai",
    "model": "gpt-4o-mini"
  }
}
Models chosen for budget: text-embedding-3-small ($0.02 per 1M tokens) + gpt-4o-mini (~10x cheaper than the default gpt-5.5 from docs). For the workshop, $5 will last a long time.
After creation, run: chmod 600 ~/.openviking/ov.conf (key is a secret, file is owner-only).
4. Start the server in the background: cd ~/.openviking && docker compose up -d
Wait until the container becomes healthy: docker compose ps (STATUS column should be Up / healthy).
5. Check the API is alive: curl -sS http://localhost:1933/health
Expect {"status":"ok"}. If not – docker compose logs, analyze the error, suggest a fix. If the OpenAI key is invalid – health will fail with an explicit auth error, then ask me to provide a new key.
6. When /health responds ok – output EXACTLY this block (I will copy it into Claude Code, these are slash-commands you cannot execute yourself):
/plugin marketplace add Castor6/openviking-plugins
/plugin install claude-code-memory-plugin@openviking-plugin
In one sentence explain: this is the official Claude Code plugin from OpenViking, it adds tools memory_recall / memory_store / memory_forget / memory_health plus hooks for auto-recall and auto-capture.
7. When I say the plugin is installed – run three checks:
a) call tool memory_health – if ok, memory is connected;
b) store the first entry via memory_store: "Owner {{my_name}}, timezone {{my_timezone}}, EdgeLab workshop day-2 completed {{today_date}}";
c) do memory_recall with query "owner" – should return what was just stored.
8. At the end produce a report:
– Docker: version, service status
– OpenViking: port, health, container uptime
– ov.conf: exists, chmod 600, embedding and vlm models
– OpenAI key: present in config (DO NOT show the key itself, only confirm presence)
– Plugin: installed / not installed, list of available tools
– First entry: stored, found via recall
If any step fails – STOP, full error output, do not guess, do not proceed to the next step.
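Step 5 of the prompt is a health check that may need a few retries while the container starts. A minimal sketch of that loop follows, assuming the endpoint and the `{"status":"ok"}` response given in the prompt; the retry count, the delay, and the function names are illustrative choices.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:1933/health"  # from step 5 of the prompt


def is_healthy(body: str) -> bool:
    """True only for a JSON object whose status field is "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (json.JSONDecodeError, AttributeError):
        return False


def fetch_health() -> str:
    """Fetch the raw health response from the local server."""
    with urllib.request.urlopen(HEALTH_URL, timeout=5) as response:
        return response.read().decode("utf-8")


def wait_until_healthy(fetch=fetch_health, attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll until the server reports ok or the attempts run out."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:
            pass  # connection refused or timed out while the container starts
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the fetch function as a parameter keeps the loop testable without a running server. If `wait_until_healthy()` returns False, the next step is `docker compose logs`, as the prompt says.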
- github.com/volcengine/OpenViking – official repository (volcengine, Apache 2.0, 22K ★). docker-compose.yml, docs, integration examples.
- openviking.ai – official project website.
- docs/en/guides/01-configuration.md – ov.conf configuration: "OpenAI Models" section with all embedding and vlm fields.
- docs/en/getting-started/03-quickstart-server.md – server quickstart: install, run, prepare config.
- examples/claude-code-memory-plugin – official Claude Code plugin (MCP tools memory_* + hooks auto-recall/capture).
- platform.openai.com/api-keys – create an OpenAI API key.
- platform.openai.com/docs/guides/embeddings – official OpenAI docs on embeddings (what they are, how they work, limits).
- qwwiwi/public-architecture-claude-code – my reference Claude Code architecture, where OpenViking is described as the L4 semantic layer.