  • tamtam — apps that write themselves, and the part of me that's fine with that

    I have twenty projects in my workspace. At any given moment, roughly a third of them have uncommitted changes I’ve forgotten about, two are red on CI because of something trivial, one has a scheduled daily review I keep meaning to run, and at least one wants a dependency bump that would take Claude forty seconds to do. The problem isn’t Claude Code — Claude Code is great. The problem is that I have to cd into each repo, check the state, decide what to do, and then type the prompt. Doing that across twenty projects is a loop I was losing every single day. So I built tamtam — a web dashboard that sits in front of Claude CLI and drives it for me across the whole workspace.
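
    To make the "check the state of every repo" step concrete, here is a minimal sketch of a workspace scan. It is not tamtam's actual code: the function name `dirty_repos` and the assumption that every project is a direct child directory of one workspace folder are mine, and it only relies on `git status --porcelain` printing nothing for a clean tree.

```python
import subprocess
from pathlib import Path


def dirty_repos(workspace):
    """Return names of git repos directly under `workspace` with uncommitted changes.

    Hypothetical helper for illustration; assumes `git` is on PATH.
    """
    dirty = []
    for repo in sorted(Path(workspace).iterdir()):
        if not (repo / ".git").exists():
            continue  # not a git checkout, skip it
        # `git status --porcelain` prints one line per changed path, nothing if clean.
        result = subprocess.run(
            ["git", "-C", str(repo), "status", "--porcelain"],
            capture_output=True,
            text=True,
            check=True,
        )
        if result.stdout.strip():
            dirty.append(repo.name)
    return dirty
```

    Calling `dirty_repos("/path/to/workspace")` returns a sorted list of directory names, which is the kind of per-repo state a dashboard could show before deciding what to prompt for.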

    Six days in, tamtam has been reviewing and improving itself on a 24-hour cron. It edits its own code, makes its own commits, updates its own prompts when they don’t work, and pings me when something needs a human look. My role has quietly narrowed to two things: handing over tokens and pointing a direction. Some will call this amazing. Some will call it app slop. Honestly, both are correct. This post is about what that feels like in practice, and the dashboard I built so I can live inside it without losing the thread.

  • Opus 4.6 vs 4.7: I ran my own benchmark through the Claude CLI

    Opus 4.7 dropped yesterday. I don’t have a real opinion yet — a day isn’t enough — but I did have a few hours and some curiosity, so I wrote a tiny benchmark and ran both models through it. Instead of citing Anthropic’s launch numbers, I wanted to see what I’d get on my own laptop.

  • seo-tools — how I keep analytics and SEO across multiple sites from becoming a second job

    Running more than one site creates a specific kind of friction that sneaks up on you. You deploy something, fix a meta tag, add a page — and then a week later you’re opening four GA4 tabs, clicking through Search Console for three different properties, and manually checking whether robots.txt still exists after that last deploy. None of it is hard. All of it adds up. I built seo-tools to collapse that down to one place, and I’ve been running it daily ever since.

  • Claude Mythos: the AI that hacked every OS and emailed a researcher about it

    Anthropic has a new model. You can’t have it. Neither can I. And after reading what it did during testing, I’m not sure that’s a bad call.

  • Gemma 4: testing the hype locally

    Google dropped Gemma 4 on April 2nd to a lot of noise. I loaded it in LM Studio and ran it against two other 4B-class edge models to see if the hype holds up. One thing upfront: this is not a test of Google’s headline benchmarks — those are for the 31B dense model. Everything here is the E4B edge variant, which is what fits on consumer hardware.

  • Agentic workflows for DevOps: what actually works and what will burn you

    Everyone is talking about AI agents doing infrastructure work. Most of the discourse is either pure hype (“agents will replace DevOps engineers!”) or pure fear (“never let AI touch production!”). After six months of actually building agentic workflows — using Claude Code as my daily driver, wiring up automated issue resolution, building MCP tools to give agents access to real systems — I have a more boring and more useful take: agents are great at reading and terrible at writing. The boundary between those two is where you put your guardrails.

  • The four golden signals — what I actually monitor and why

    Got asked about golden metrics in an interview recently. Named three out of four on the spot — latency, errors, saturation — and completely blanked on traffic. The one signal I look at every single day, and my brain just decided it wasn’t worth mentioning under pressure. So here’s the post I’m writing partly out of spite at my own memory. The four golden signals from Google’s SRE book are a solid framework, but how you implement them — and what you learn the hard way about each one — is where it gets interesting.
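
    As a rough illustration of the four signals side by side, the sketch below reduces one window of request samples to latency, traffic, errors, and saturation. Everything here is an assumption made for the example: the `Request` record, the `golden_signals` name, using nearest-rank p95 for latency, counting only 5xx responses as errors, and using CPU used over CPU limit as the saturation measure.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int


def golden_signals(requests, window_s, cpu_used, cpu_limit):
    """Summarise one window of request samples into the four signals (illustrative)."""
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = max(1, -(-95 * len(durations) // 100))  # integer ceil(0.95 * n)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": cpu_used / cpu_limit,
    }
```

    In a real setup these numbers would normally come from a metrics system rather than raw request lists; the point of the sketch is only that traffic is the denominator for the error rate, which is part of why it is hard to reason about the other three without it.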

  • filmpick — a local movie recommendation engine, renamed

    I’ve been running a project called movies-organizer for a while. Bad name. It sounds like a tool for renaming files. Today I renamed it to filmpick — because what it actually does is help you pick your next film.

  • Overhauling a Jekyll blog — dark mode, code blocks, and all the small things

    This blog has been running on Jekyll since 2015. The content changed, the stack around it changed, but the blog itself? Same minima theme, same default code blocks, same flat archive page. It was time to fix that.

  • qubitcoin — a post-quantum Bitcoin rewrite, and why silent RPC failures matter

    There’s a particular class of bug I hate more than crashes: the API that quietly returns nothing when you give it garbage input. No error, no 400, just an empty result that looks exactly like a valid-but-empty result. This surfaced while working on qubitcoin, a post-quantum Bitcoin rewrite I’ve been building — so let me introduce that first, then get to the bug.
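
    The failure mode is easy to show in miniature. The sketch below is not qubitcoin's RPC code: the `BLOCKS` store, both function names, and the `InvalidParams` error type are hypothetical, chosen only to contrast a handler that swallows bad input with one that rejects it.

```python
class InvalidParams(ValueError):
    """Raised when an RPC argument is malformed (hypothetical error type)."""


# Hypothetical in-memory store standing in for a real chain index.
BLOCKS = {0: ["genesis-tx"], 1: []}


def get_block_txs_silent(height):
    # The anti-pattern: garbage input and a genuinely empty block both yield [].
    return BLOCKS.get(height, []) if isinstance(height, int) else []


def get_block_txs_strict(height):
    # Reject malformed input explicitly so callers can tell it apart from "empty".
    if isinstance(height, bool) or not isinstance(height, int) or height < 0:
        raise InvalidParams(f"height must be a non-negative integer, got {height!r}")
    if height not in BLOCKS:
        raise KeyError(f"no block at height {height}")
    return BLOCKS[height]
```

    With the silent version, `get_block_txs_silent("abc")` and `get_block_txs_silent(1)` are indistinguishable to the caller. With the strict version, only the second returns an empty list; the first raises.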