Practical, real-world guidance from first prompts through skills, servers, and feedback loops
April 30th · 1:00pm – 2:00pm
The Digital Greenhouse
Locally educated software engineer
Coursera/Stanford ML course — Andrew Ng's classic. Learned the foundations.
Sentiment analysis on Azure — Used Azure Cognitive Services for blog comment moderation.
Port CMS via Cursor — Ported Intracia CMS from Nuxt/Vue to Next/React.
Evolving the CMS — Continued to develop and refine with Cursor — very helpful, but with clear limits. Used Vercel v0 to design and iterate on UX.
All-in on side projects — zx84, OpenCPM, EnvyPlayer, Ch8ter — all built or maintained with Claude Code.
Adopted at MongoDB — Using Claude Code professionally on the EF Core Provider. From side projects to production.
806 commits • 53k lines TypeScript • 10k lines Rust
45k lines TypeScript
22k lines TypeScript
20k lines TypeScript
5k assembly • 3k Python
A terminal process running in your project. Deterministic — no hallucination possible at this layer.
Three levels of granularity. The right one depends on what you're building and what you can meaningfully review.
Describe everything in one prompt.
When: throwaway scripts, quick explorations, prototypes you'll discard.
One feature per prompt. Review, refine, repeat.
When: real projects you intend to maintain and ship.
One small change per prompt. Highly controlled.
When: sensitive code, security-critical paths, or debugging specific issues.
Download from claude.ai — Mac & Windows. GUI, no terminal needed.
Official
Community
# Mac / Linux
curl -fsSL https://claude.ai/install.sh | bash

# Windows
irm https://claude.ai/install.ps1 | iex
# 1. Create a project folder
mkdir my-project && cd my-project

# 2. Initialise git (Claude needs this)
git init

# 3. Fire up Claude in auto mode
claude --permission-mode auto

# 4. Start typing!
> Build me a REST API in Node.js with Express that serves a list of Guernsey beaches with name, parish, and coordinates.
Auto mode lets Claude run safe actions — file edits, reads, builds — without asking permission every time. Risky actions still require approval.
You know what you want to build but not which stack, which database, or where to deploy it.
> I want to build a task management API with auth. It needs to be type-safe, cheap to host, and easy to deploy. What are my options?
Here are three approaches depending on your priorities:
Cheapest: Hono + SQLite (Turso) on Cloudflare Workers — near-zero cost, edge-deployed, but limited to HTTP workers
Most flexible: Express + Drizzle + Postgres on Fly.io — real server, full control, generous free tier
Fastest to ship: Next.js API routes + Prisma + Supabase — managed Postgres, built-in auth, one deploy target
All three give you end-to-end TypeScript. What matters most — cost, flexibility, or speed?
> Flexibility. Go with Fly.io. Set it all up.
Created package.json, tsconfig.json, drizzle.config.ts
Created src/index.ts, src/db/schema.ts, src/routes/tasks.ts, src/middleware/auth.ts
Created tests/tasks.test.ts, docker-compose.yml, fly.toml
Run pnpm install && pnpm test — 4 passed
11 prompts, ~15 minutes:
You need a rate limiter. You know the requirements. You just don't want to spend the afternoon writing it.
> Add rate limiting middleware to the Express API. Use a sliding window algorithm, 100 req/min per API key, stored in Redis. Return 429 with retry-after header when exceeded.
Created src/middleware/rateLimiter.ts
Sliding window with Redis MULTI/EXEC, configurable limits per route group
Updated src/app.ts
Registered middleware before routes
Created tests/rateLimiter.test.ts
12 tests: normal flow, exceeded, reset, multi-key, Redis failure
Run npm test — 12 passed
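As a rough sketch of the core check described above (assuming ioredis; allowRequest and the key naming are illustrative, not the generated code):

// Sliding-window rate check, illustrative sketch only
// (assumes ioredis; not the file Claude generated above)
import Redis from 'ioredis';

const redis = new Redis();
const WINDOW_MS = 60_000; // 1 minute
const LIMIT = 100;        // requests per window per API key

export async function allowRequest(apiKey: string): Promise<boolean> {
  const now = Date.now();
  const key = `rate:${apiKey}`;
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, now - WINDOW_MS) // drop timestamps outside the window
    .zadd(key, now, `${now}:${Math.random()}`) // record this request
    .zcard(key)                                // count requests now in the window
    .pexpire(key, WINDOW_MS)                   // let idle keys expire
    .exec();
  const count = Number(results?.[2]?.[1] ?? 0);
  return count <= LIMIT; // caller returns 429 + retry-after when false
}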
The new auth module needs a thorough review. The more eyes the better — and different models find different things. Run the same review through Claude, Copilot, and GPT. Their findings rarely overlap completely.
> Review src/auth/ for security issues. Check for injection, broken access control, token handling, and timing attacks. Be thorough and harsh.
CRITICAL tokenService.ts:47
JWT secret loaded from env without a fallback check — starts as an empty string in dev, so tokens are unsigned
WARNING login.ts:23
Password comparison uses === not crypto.timingSafeEqual — vulnerable to timing attacks
WARNING session.ts:91
Session ID generated with Math.random, not crypto.randomUUID
You're designing a notification system and want to think through the tradeoffs before committing.
> I need to add real-time notifications. Considering WebSockets vs SSE vs polling. We have ~5K concurrent users, Node backend, Redis already in the stack. What would you recommend and why?
For your setup I'd recommend SSE:
Why SSE over WebSockets: You're doing server→client only. SSE works through proxies/CDNs. Built-in reconnection & event IDs. Half the complexity of WS.
Architecture: Redis Pub/Sub for cross-instance fan-out. SSE endpoint per user session. Fallback to polling for legacy clients.
When to reconsider: If you later need client→server (chat, collaborative editing) then upgrade to WebSockets.
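A minimal sketch of that shape, assuming Express and ioredis (the route, channel naming, and omitted auth are illustrative):

// SSE endpoint fanned out via Redis Pub/Sub, illustrative sketch only
import express from 'express';
import Redis from 'ioredis';

const app = express();
const sub = new Redis(); // dedicated subscriber connection

app.get('/notifications/:userId', async (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const channel = `notify:${req.params.userId}`; // real code authenticates first
  await sub.subscribe(channel);
  const onMessage = (ch: string, msg: string) => {
    if (ch === channel) res.write(`data: ${msg}\n\n`);
  };
  sub.on('message', onMessage);

  req.on('close', () => {
    sub.off('message', onMessage);
    void sub.unsubscribe(channel);
  });
});

app.listen(3000);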
You've inherited a billing module with no docs. 4,000 lines across 12 files. You need to understand it before making changes.
> Read everything in src/billing/ and explain how the invoicing pipeline works end-to-end. Include the data flow, key decision points, and anything that looks like a bug.
Invoice pipeline:
1. Cron triggers generateInvoices() at midnight UTC
2. Queries unbilled usage from usage_events table
3. Groups by customer + plan tier, applies discounts from coupons
4. Writes to invoices table, queues email via notifyQueue
Possible bug (line 847): Discount calc uses Math.floor before currency rounding — customers losing fractional cents on every invoice
A single file has grown out of control and nobody wants to touch it.
Tip: Smaller, well-structured code also means more re-use and fewer tokens spent on similar features. Schedule refactoring reviews regularly — your AI budget will thank you.
> Invoicing.cs is too big at 1,400 lines. How can we break it down? Just propose a plan.
I've analysed Invoicing.cs and identified 5 logical groupings. Here's how I'd split it:
InvoiceGenerator.cs · ~320 lines · Core pipeline: usage aggregation, line item creation, tax calculation
InvoiceValidator.cs · ~180 lines · Validation rules, duplicate detection, date range checks
DiscountEngine.cs · ~240 lines · Coupon application, tiered pricing, promotional rules
InvoiceNotifier.cs · ~160 lines · Email queueing, PDF generation, webhook dispatch
InvoiceRepository.cs · ~280 lines · All database access: queries, inserts, status updates
InvoicingHelpers.cs · ~120 lines · Currency rounding, date utilities, formatting
The remaining ~100 lines are using statements and shared constants — those move to a shared file. Want me to proceed with this breakdown?
You've shipped a feature but test coverage is thin. Or you're about to refactor and want a safety net first.
> Read src/billing/DiscountEngine.cs and write comprehensive tests. Cover happy paths, edge cases, and anything that looks risky. Use the existing test conventions in tests/.
Created tests/DiscountEngineTests.cs — 34 tests
Covered:
Percentage & fixed discounts, stacking rules, expiry edge cases
Zero & negative amounts, currency rounding, missing coupon codes
Tiered pricing boundaries, promotional cap enforcement
34 passed — also caught a rounding bug on line 214
Tip: Works the other way too — write the tests first, then ask Claude to implement until they pass.
Before writing a line of code — is this worth building? What already exists?
> I want to build a local-first markdown note app with backlinks and graph view. What's already out there? Is there room for something new?
Established players: Obsidian (dominant, plugin ecosystem, free for personal), Logseq (open source, outliner-based), Notion (cloud-first, different audience)
Gaps I see:
• Obsidian plugins are fragile — no official API stability guarantee
• None handle structured data well (tables, databases, schemas)
• Collaboration is bolted on, not native
• Mobile experience is weak across all of them
Where there's room: A local-first app focused on structured knowledge (not just freeform notes) with real-time collaboration could carve out a niche. Think "Obsidian meets Notion meets Linear."
Worth considering: The note app market is crowded. Your differentiator needs to be sharp. What's the specific pain point you're solving?
A README for Claude, loaded every session. Initialise with /init. Three levels:
Your personal global preferences. Applies to every project. Style, tone, preferred patterns.
Project root. Committed to git. Shared with the team. Build commands, conventions, architecture.
Personal project overrides. Gitignored. Your local paths, debug preferences, workflow quirks.
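A minimal project-level CLAUDE.md, purely illustrative (the commands and conventions are examples, not prescriptions):

# CLAUDE.md

## Build & test
- Build: pnpm build
- Test: pnpm test (Vitest; tests live in tests/)

## Conventions
- TypeScript strict mode; no default exports
- Commit messages follow Conventional Commits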
Tip: Ask Claude to review your CLAUDE.md — tell you what it finds useful, what's just noise, and what's duplicated or unhelpful. It will tell you honestly.
A classifier evaluates each action for risk. Safe ones proceed automatically. Risky ones pause for you.
--permission-mode auto
Human in the loop: anything irreversible, customer-facing, or legally sensitive.
Let Claude run: generation, refactoring, tests, analysis, boilerplate — anything you can review and roll back.
Claude maps out the full plan before touching anything. You review, adjust, approve. Best for unfamiliar codebases or big changes.
Before giving Claude autonomy, ask: what's the worst it could do? Read-only analysis of a test repo — let it run. Modifying production config — use Plan mode. The bigger the blast radius, the more human oversight you want.
Claude Code expects a git repo. Without one, you have no undo button. Working without version control is setting yourself up for disaster.
Want to run multiple agents in parallel on different features — without them stepping on each other? Git worktrees let you have multiple working copies of the same repo, each on its own branch, sharing a single .git directory. Perfect for concurrent experimentation.
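A sketch of the flow (paths and branch names are illustrative):

# Two working copies of one repo, each on its own branch
git worktree add -b feature/auth    ../myapp-auth
git worktree add -b feature/billing ../myapp-billing

# Run a separate Claude session in each directory;
# once the branches are merged, clean up:
git worktree remove ../myapp-auth
git worktree list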
A skill is a .md file that gives Claude Code a structured recipe for a task. It lives in your repo and gets loaded into context when invoked.
No server. No runtime. No deployment. Just a well-written document.
---
name: test-all
description: Build and run tests against all three EF version targets
argument-hint: "[optional: test filter] [--model haiku|sonnet]"
allowed-tools: Bash(dotnet *), Bash(docker *), Read, Glob, Grep, Agent
---
# Then structured markdown with phases, rules, and examples
Frontmatter — metadata that controls discovery, invocation, and tool permissions
Body — the actual instructions Claude follows, written in plain markdown with phases and rules
Check .NET 10 SDK installed. Verify Docker or MONGODB_URI. Fail fast with clear messages.
Spawn 3 sub-agents (EF8, EF9, EF10) via the Agent tool. Each builds and tests independently. Defaults to Haiku for cost.
Collect results into a single table. List any failures grouped by version.
## Phase 2: Parallel Build & Test

Spawn three sub-agents in parallel (one per EF version: EF8, EF9, EF10) using the Agent tool. Set the model parameter on each agent to the parsed model from args.

Each agent runs:
1. dotnet build $SLN -c "Debug EF{v}"
2. dotnet test $SLN -c "Debug EF{v}"

Report: build status, pass/fail/skip counts, any failed test names + error messages.
---
name: test-all
description: Build and run tests against all three EF version targets (EF8, EF9, EF10)
argument-hint: "[optional: test filter or project name] [--model haiku|sonnet|opus]"
allowed-tools: Bash(dotnet *), Bash(docker *), Read, Glob, Grep, Agent
---

# Test All EF Versions

`$SLN` refers to the solution file path: `{working directory}/MongoDB.EFCoreProvider.sln`

## Arguments

Parse `$ARGUMENTS` for:
1. `--model <model>` — If present, extract and remove it. Use as the `model` parameter when spawning agents. If absent, use `haiku`.
2. Remaining text — Optional filter:
   - If empty, run all tests across all three versions.
   - If it looks like a project name (e.g. "UnitTests"), run only that project.
   - If it looks like a test filter, pass it via `--filter` to `dotnet test`.

## Phase 1: Pre-flight Checks (MUST pass before anything else)

1. net10.0 SDK — Run `dotnet --list-sdks` and verify a 10.x SDK is installed. If missing, stop with: "net10.0 SDK is not installed."
2. Database connectivity — Check that either:
   - Docker is available (`docker info` succeeds), OR
   - `MONGODB_URI` environment variable is set.
   If neither: stop with: "No database available."

## Phase 2: Parallel Build & Test via Sub-agents

Spawn three sub-agents in parallel (one per EF version: EF8, EF9, EF10) using the Agent tool. Set the `model` parameter on each agent.

Each agent's prompt must include the full build and test commands:
1. Build: dotnet build {sln} -c "Debug EF{version}" -v quiet
2. Test: dotnet test {sln} -c "Debug EF{version}" --no-build --logger "console;verbosity=normal" -v quiet

## Important Rules

- Always use absolute paths — never `cd` into directories.
- Quote configuration names — they contain spaces ("Debug EF8").
- Parallel is safe — each EF version builds into a separate output dir.
- Continue on failure — if one version fails, still report the others.

## Phase 3: Console Summary

| Version | Build | Passed | Failed | Skipped |
|---------|-------|--------|--------|---------|
| EF8     | OK    | 142    | 0      | 3       |
| EF9     | OK    | 145    | 2      | 1       |
| EF10    | OK    | 148    | 0      | 0       |

If any version had failures, list failing test names grouped by version.
Have Claude write its own tools instead of chaining shell commands. Solves two problems at once:
# Instead of approving each:
grep -Pboa ...   # allow?
xxd | sed ...    # allow?
awk | xxd -r     # allow?

# One tool, one permission:
python tools/patch_binary.py
# curl/wget get blocked by
# AI-sniffing CDNs & WAFs

# Have Claude write:
python tools/get_url.py $URL
# Normal user-agent, no blocks
# Handles redirects, encoding
# .claude/settings.json
"allow": [
  "Bash(python *)",
  "Bash(dotnet *)",
  "Bash(node *)"
]

# All Python tools run freely
# Permission fatigue: gone
CLAUDE.md can steer Claude toward this pattern: "write Python tools, not bash chains"
A static analysis tool that catches Z80 register clobbering bugs. Claude wrote it, Claude runs it.
Scans .asm files for ; Clobbers: AF, BC, DE comments. At each CALL site, flags reads of clobbered registers without a restore.

> Write the disk read routine
Created src/bios/disk.asm
Run python3 tools/check_clobber.py
WARNING disk.asm:47
call ReadSector clobbers A, but line 49 reads it: or a
Updated src/bios/disk.asm:47
Added push af / pop af around the call
Run python3 tools/check_clobber.py
No clobber issues found.
When Claude needs to interact with a running system — an emulator, an API, hardware — you build an MCP server. It's a process that exposes tools over JSON-RPC.
Each tool is a named command Claude can call — run_cpu, peek_memory, screenshot. Claude picks the right one for each step.
Unlike a skill, the server stays alive. State persists across calls — Claude talks to a living process, not a fresh one each time.
Each tool is a function with a name, description, schema, and handler. Claude reads the description to decide when to use it.
// Server setup
const server = new McpServer({
  name: 'zx84',
  version: '1.0.0',
});
// -- find --
server.tool(
  'find',
  'Search all 64KB of memory for a byte sequence. Returns up to 64 matches.',
  { hex_bytes: z.string().describe('Hex byte string to search for, e.g. "CD0050"') },
  async ({ hex_bytes }) => {
    const hex = hex_bytes.replace(/\s/g, '');
    if (hex.length % 2 !== 0) return text('Hex string must have even length');
    const needle = new Uint8Array(hex.length / 2);
    for (let i = 0; i < needle.length; i++)
      needle[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    return text(doFindBytes(addr => spec.memory.readByte(addr), needle));
  },
);
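doFindBytes and text() are zx84 helpers. As an assumed sketch (not the actual source), text() just wraps a string in the content shape MCP tool handlers return:

// Assumed shape of the text() helper, for illustration only
const text = (s: string) => ({
  content: [{ type: 'text' as const, text: s }],
});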
~20–35 lines per tool. The whole zx84 server is ~1,700 lines for 30+ tools.
Every panel you see here has a corresponding MCP tool. The debugger, the drives, the display — Claude controls all of it.
run (N frames) · step (N instructions) · continue (until breakpoint) · step_frame (one frame exactly)
breakpoint (set/list) · port_watchpoint (I/O traps) · disassemble (Z80 mnemonics) · registers (full CPU state) · trace (full/portio/zxtl)
memory (hex dump) · peek/poke (read/write bytes) · find (byte sequence search) · port_in (read I/O port) · port_out (write I/O port)
key (press a key for N frames) · type (type a string; handles symbols, shift combos, enter, etc.)
load (TAP, TZX, SNA, Z80, SZX, DSK) · save (SZX snapshots) · model (switch 48k/128k/+3) · eject (tape or disk)
ocr (read the screen): Claude can "see" what the Spectrum is showing. Debug tool → feature.
30+ tools in ~700 lines of TypeScript. Claude gets full control of a running Spectrum.
A look. A feel. A vibe. Claude can't judge these — but you can make it efficient. Expose the knobs: tweak sliders, not prompts.
Up next: Spectrum Analyzer →
Give Claude the tools to measure the result itself. Does the output match the reference? Do the tests pass? This is how these models were trained — attempt, measure, improve. Extend that loop into your domain.
Up next: Sound Chip Emulation →
FFT, frequency binning, stereo separation — all correct first time
Claude can't judge aesthetics. "Make it look better" is a dead-end prompt
Glow, HF Boost, Input Gain, band count, colour schemes — all tuneable by a human
The controls shipped to users. They love tweaking them too
Spec test suites exist — binary pass/fail. Claude iterates quickly
No test suite. How does Claude know if it sounds right?
Claude can load songs, trigger playback, control the emulator
Claude analyses WAV output against reference material. Now it can measure the delta
// The iteration loop:
// 1. Load reference song via MCP
// 2. Emulate → record to WAV
// 3. Analyse WAV (frequency content, envelope shapes, timing)
// 4. Compare against reference WAV
// 5. Identify delta → adjust emulation
// 6. Repeat

// Without the MCP + WAV recording,
// this loop doesn't exist.
// Claude is flying blind.
Key insight: CPUs have test suites. Sound doesn't. So you build the test suite by giving Claude the tools to measure.
What happens when the AI isn't available?
The daily risks of AI-assisted development.
You stop understanding your own codebase. Claude wrote it, you approved it, but you can't explain it.
Fix: Always review. Use plan mode. If you can't explain it, don't ship it.
Plausible code that's subtly wrong. APIs that don't exist. Most likely when information is obscure, or when there's simply no way to do what you've asked.
Fix: Tests, type checking, manual review. This is why deterministic tools matter.
Long sessions where Claude builds on its own earlier mistakes. Compounds silently.
Fix: /compact regularly. Fresh sessions for fresh tasks. Commit before big changes.
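That pre-change checkpoint costs one line (messages illustrative):

# Commit a checkpoint before letting Claude loose on a big change
git add -A && git commit -m "checkpoint before refactor"

# If the session has compounded its mistakes, discard everything since:
git reset --hard HEAD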
The risks that keep lawyers and security teams up at night.
Malicious instructions hidden in code comments, docs, or dependencies that hijack Claude's behaviour.
Claude may generate code resembling copyrighted or GPL-licensed work. You're legally responsible for what ships.
Claude may embed API keys, secrets, and tokens directly into code that ends up in your repository or shipped to the client side.
Low stakes, high learning. Pick something you've always wanted to build. Let Claude help you figure out the stack and get it running.
Pick your most painful repeated workflow. Write a SKILL.md. Use it tomorrow. Zero setup, instant payoff.
anthropic.skilljar.com/claude-code-in-action — Anthropic's official hands-on course. Covers everything from basics to advanced workflows.
/powerup — Interactive lessons right inside Claude Code. Animated demos teach context management, permissions, hooks, MCP and more — without leaving the terminal.
I'll be around after the talk if you'd like to chat, ask questions, or see a demo.
Skills & MCP servers are portable across most of these.
Anthropic. 1M context. Deepest IDE integration. Auto mode.
Moonshot AI. Open source. 100 parallel agents. 5–6x cheaper.
Fully open source. Provider-agnostic. Any model, no lock-in.
Google. Massive context. Free tier. Strong on search.
OpenAI. GPT-powered. Sandboxed execution.
Fork of VS Code. AI-native editor. Inline edits, chat, composer.
Codeium. AI-first IDE. Cascade flow for multi-file edits.
JetBrains. Built into IntelliJ, WebStorm, PyCharm. Native experience.
GitHub/OpenAI. Autocomplete + chat. VS Code, JetBrains, Neovim.
Most CLI tools now have official plugins for VS Code & JetBrains.
Opus (deep reasoning), Sonnet (balanced), Haiku (fast & cheap). Anthropic.
1T params MoE. Open weights. Agent Swarm. Moonshot AI.
Zhipu AI. Open source. Strong on code. Z.AI platform.
Google. Huge context window. Native multi-modal.
OpenAI. Strong all-rounder. Large ecosystem.
Everything Claude reads and writes consumes tokens from a finite window. CLAUDE.md costs context on every session.
200K tokens default context. 1M tokens with extended thinking (Max plans). Files, commands, responses — it all counts.
As the window fills, Claude forgets earlier context. Signs: re-reading files, forgetting decisions, contradicting earlier work.
/compact — summarise & compress
/clear — wipe & start fresh
Long session? /compact regularly. New task? /clear.
Claude Code doesn't have a pre-built index of your codebase. It searches actively — like a developer exploring unfamiliar code.
No setup required. Works on any codebase immediately. Claude chooses the right tool for each search — regex for patterns, glob for file discovery, full reads for understanding.
Every search consumes context tokens. Large codebases can eat through the window fast. This is why /compact and CLAUDE.md matter — they reduce how much searching Claude needs to do.
Local RAG-based code servers and indexing tools let Claude query your codebase without reading entire files into context. Faster, cheaper — but still community-driven, nothing official yet.
During the sound chip work, Claude was recording songs by playing them in real-time to a WAV file. A 3-minute song took 3 minutes to test.
Without being asked, Claude wrote a new MCP tool that rendered the song directly to WAV in milliseconds — bypassing the audio output entirely.
It made its own iteration loop faster. Nobody asked for this.
// Before (real-time):
play_song("commando.sid")
record_to_wav()    // ⏱ 3 minutes...
stop_recording()

// After (Claude's addition):
render_to_wav("commando.sid", {
  duration_ms: 180000,
  sample_rate: 44100
})                 // ⏱ ~200ms
Instead of one agent working sequentially, multiple agents work in parallel on the same problem.
Agent tool
When: Claude gets stuck in a cycle of failed attempts — each one confidently wrong.
Be specific: dotnet test $SLN -c "Debug EF8", not "run the tests".
Least privilege: Bash(dotnet *), not Bash(*).
Topics I'm exploring. Let me know what interests you.
Claude Code vs Gemini CLI vs OpenCode vs Codex vs Cursor. Models compared: Claude, GPT, Gemini, Z.AI. What's best for what.
Case study: Intracia CMS and its AI-powered JSON schema agent. How to go from "we should add AI" to shipping a real feature.
Embeddings, semantic search, retrieval-augmented generation. Making AI work with your data without fine-tuning.
Claude Desktop, automation workflows, document analysis. What AI can do for people who don't write code.
Prompt injection, supply chain risks, sandboxing, data privacy. The stuff you need to think about before going to production.
What would be most useful to you? Come talk to me afterwards.