-
Cedar vs OPA: which policy engine, where it fits, and who owns you afterward
In the Bedrock lock-in post I called Cedar “the redeeming detail” — the one layer of AWS’s agent stack whose language you keep even when the engine enforcing it is proprietary. A few people pushed back: why reach for Cedar at all when OPA has been the policy-as-code default for years and runs in basically every Kubernetes cluster on earth?
Fair. So I did the usual: stood both up, wrote the same authorization policy in each, ran them, and read enough of the internals to answer the only three questions that matter for an infra decision — where does each one actually fit, how fast is it, and what am I signing up to never get back out of?
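Cedar has a fixed decision rule. Access is denied by default, any matching forbid beats any matching permit, and a request is allowed only if at least one permit matches and no forbid does. OPA has no equivalent built in, because a Rego policy returns whatever your rules compute. Below is a minimal Python sketch of that Cedar-style combining rule. Plain predicates stand in for real Cedar policy conditions. The `Policy` class, the request fields, and the two example rules are hypothetical and are not Cedar's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical request shape: principal, action, resource, context flattened into one dict.
Request = Dict[str, object]


@dataclass
class Policy:
    effect: str                       # "permit" or "forbid"
    when: Callable[[Request], bool]   # stand-in for a Cedar policy's scope + conditions


def authorize(policies: List[Policy], request: Request) -> str:
    """Cedar-style combining: forbid overrides permit, and no match means deny."""
    matched = [p for p in policies if p.when(request)]
    if any(p.effect == "forbid" for p in matched):
        return "Deny"
    if any(p.effect == "permit" for p in matched):
        return "Allow"
    return "Deny"  # default deny: no policy matched


# Hypothetical policies: engineers may read; nobody reads confidential docs without MFA.
POLICIES = [
    Policy("permit", lambda r: r["principal_role"] == "engineer" and r["action"] == "read"),
    Policy("forbid", lambda r: r["resource_label"] == "confidential" and not r["mfa"]),
]

if __name__ == "__main__":
    base = {"principal_role": "engineer", "action": "read", "resource_label": "public", "mfa": False}
    print(authorize(POLICIES, base))
    print(authorize(POLICIES, {**base, "resource_label": "confidential"}))  # forbid wins
    print(authorize(POLICIES, {**base, "action": "delete"}))                # no permit matches
```

In OPA you would have to encode that precedence yourself in Rego. In Cedar it is part of the language's evaluation semantics.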
-
Bedrock in 2026: where AWS owns your agents, and where the open protocols save you
I took six months off the infra treadmill to live inside the agentic stack — Claude Code as a daily driver, building automated issue-resolution workflows, reading MCP and A2A specs at 1am like they were RFCs. Then I came back to AWS and realized Bedrock isn’t “a model API behind IAM” anymore. Somewhere between re:Invent 2024 and re:Invent 2025 it quietly turned into a full agent platform, and nobody on my old DevOps channels seemed to have clocked how much of it is a one-way door.
So I did what I always do: I stood up the thing, read the control-plane APIs, and asked the only question an infra person should ask about a managed platform — what’s portable, and what am I never getting back out?
-
DiffusionGemma: Google ships a text diffusion model and bets the future of local LLMs isn't autoregressive
Google DeepMind dropped DiffusionGemma yesterday — a 26B MoE open-weights model that doesn’t decode tokens one at a time. It generates 256-token blocks in parallel via text diffusion, and on an H100 they’re quoting 1000+ tokens per second. The download is Apache 2.0, the architecture is built on the Gemma 4 backbone (the same 26B-A4B that scored well on the Gemma 4 12B benchmarks — though with a very different decoding head), and quantized, it fits in 18 GB of VRAM. So a 5090 runs it locally at 700+ tok/s. That’s the speed pitch.
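You can sanity-check the 18 GB figure with back-of-envelope weight math: parameters × bits per weight ÷ 8. Even though only a fraction of a MoE's parameters are active per token, all 26B still have to be resident in memory. The sketch below is rough. The 4.5 bits/weight value is an assumption (about what a Q4_0-style 4-bit format costs), not the format Google actually used. The estimate also covers weights only, not activations or runtime overhead. The 32 GB constant is the RTX 5090's VRAM.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


RTX_5090_VRAM_GB = 32

if __name__ == "__main__":
    # 4.5 bits/weight is an assumed ~4-bit format, not DiffusionGemma's actual quantization.
    for label, bits in [("bf16", 16), ("~4-bit (assumed)", 4.5)]:
        gb = weight_gb(26, bits)  # all 26B MoE parameters stay resident
        print(f"{label}: {gb:.2f} GB of weights, under 32 GB: {gb < RTX_5090_VRAM_GB}")
```

At bf16 the weights alone (about 52 GB) overflow a 32 GB card. At roughly 4 bits they land near 15 GB, which leaves some headroom inside the quoted 18 GB.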
-
Claude Fable 5 went 30/30 on a benchmark built to break it. Opus 4.8 didn't.
There’s a new name in my Claude Code model picker: Fable 5. Not Opus 4.9 — Fable. Anthropic is naming things again — Mythos was the invite-only security model nobody outside the Glasswing consortium can touch; Fable is the new top tier you actually can. It sits above Opus, at double the price: $10 per million input tokens and $50 per million output, versus $5/$25 for Opus 4.8. Back in April I benchmarked Opus 4.6 against 4.7 with deliberately easy tasks and promised a follow-up suite “that can fail” — failing tests the model has to make pass, regex on adversarial inputs, reasoning with traps. Fable seems like the right occasion to deliver it.
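Fable's rate is exactly double Opus 4.8's on both input and output. So any run costs exactly twice as much on Fable, whatever the input/output mix. Below is a small sketch using the list prices quoted above. The dictionary keys are labels for this example, not API model IDs, and the token counts are a made-up workload.

```python
# USD per million tokens (input, output), as quoted in the post.
PRICES = {
    "fable-5": (10.0, 50.0),   # label only, not an API model ID
    "opus-4.8": (5.0, 25.0),   # label only, not an API model ID
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run at list price (no caching or batch discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic run: 200k tokens in, 20k tokens out.
    for model in PRICES:
        print(f"{model}: ${run_cost(model, 200_000, 20_000):.2f}")
```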
-
Gemma 4 QAT vs non-QAT: is quantization-aware training actually better?
I’ve got two Gemma 4 12B files sitting on my disk: the official QAT Q4_0 checkpoint Google shipped on June 5th, and a plain non-QAT GGUF. Same model, same parameter count, roughly the same size on disk. One says “QAT” in the name and one doesn’t. The obvious question — which one do I keep, and is the QAT label worth anything or is it marketing — turns out to be more interesting than I expected, because “QAT 12B vs non-QAT 12B” isn’t actually one comparison. It’s three.
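Here is what the "Q4_0" in the QAT checkpoint's name refers to. In llama.cpp's GGML formats, Q4_0 splits weights into blocks of 32 and stores one scale per block. Each weight is then rounded to one of 16 levels, which works out to about 4.5 bits per weight. QAT doesn't change that storage format. The idea is that training with this rounding simulated in the loop leaves weights that lose less when the rounding is applied for real. The pure-Python sketch below follows the shape of GGML's reference Q4_0 quantizer. It is simplified: it keeps the scale as a Python float instead of fp16 and skips packing two codes per byte. The function names are hypothetical.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # Q4_0 quantizes weights in blocks of 32


def quantize_block(block: List[float]) -> Tuple[float, List[int]]:
    """Return (scale, codes) with each code in 0..15; code 8 represents zero."""
    assert len(block) == BLOCK_SIZE
    peak = max(block, key=abs)               # largest-magnitude weight, sign kept
    scale = peak / -8 if peak != 0 else 0.0  # peak maps exactly to code 0
    inv = 1.0 / scale if scale != 0 else 0.0
    # x * inv lies in [-8, 8]. Adding 8.5 and truncating rounds to the nearest
    # level (all values are >= 0.5 here); min() clips the top end to 15.
    codes = [min(15, int(x * inv + 8.5)) for x in block]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    return [(q - 8) * scale for q in codes]


def bits_per_weight() -> float:
    # One 16-bit scale plus 32 four-bit codes, spread over 32 weights.
    return (16 + 4 * BLOCK_SIZE) / BLOCK_SIZE


if __name__ == "__main__":
    weights = [math.sin(i * 0.7) * 0.05 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits_per_weight()} bits/weight, worst abs error {worst:.5f}, scale {scale:.5f}")
```

Every weight in a block shares one scale. One outlier therefore stretches the step size for the other 31, and weights that tolerate that coarse rounding are exactly what QAT aims to produce.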
-
Opus 4.8 vs 4.7: does it actually push back more — and what does that cost in tokens?
I switched my default to Opus 4.8 a little while ago and mostly forgot about it — same $5/$25 per million tokens as 4.7, same 1M context window, same API surface. Nothing to migrate, nothing to tune. But two lines in Anthropic’s own 4.7→4.8 notes stuck with me. One: 4.8 “narrates more” — more text between tool calls, longer wrap-ups. Two: it’s “more willing to push back” and “a stronger thought partner.” The first one costs money. The second one is the kind of thing everybody claims and nobody measures. So I measured both.
-
Nemotron 3 Nano Omni: one model that sees, hears, reads, and clicks
Nvidia dropped Nemotron 3 Nano Omni yesterday — a 30B-A3B mixture-of-experts model that takes text, images, audio, video, documents, charts, and screenshots of GUIs as input and emits text. It’s the multimodal sibling of the Nemotron 3 Nano 4B I tested against Gemma 4 a couple of weeks back. The 4B was text-only with a reasoning mode. This one is the perception layer of the family.
-
Checkmarx KICS got compromised — the irony writes itself
A security scanner you pull into your CI pipeline to find vulnerabilities got turned into the vulnerability. On April 22, 2026 at 12:31 UTC, someone with valid Checkmarx publisher credentials pushed malicious images to the official checkmarx/kics Docker Hub repo. Tags affected: latest, v2.1.20-debian, v2.1.21-debian, alpine, debian (Checkmarx’s own writeup stresses that “known safe versions” were not overwritten — the malicious v2.1.21-debian tag is a fresh one that doesn’t correspond to a real release). If your pipeline ran docker pull checkmarx/kics:latest during that window, you shipped a credential stealer into your own runner.
And KICS wasn’t alone. The Checkmarx security update on April 22 confirms the blast radius spanned three separate artifact types: the KICS Docker image, the ast-github-action GitHub Action (malicious tag 2.3.35, fixed in 2.3.36), and two VS Code extensions — ast-results (versions 2.63, 2.66) and cx-dev-assist (versions 1.17, 1.19), both patched in 2.67.0+. The IDE extensions are the scary part: they auto-update in the background on your laptop, not just in CI.
-
tamtam, five days later — self-improving agents, a CTO in a textarea, and a release pipeline that closes its own PRs
Five days ago I posted about tamtam — a dashboard that drives Claude CLI across my workspace. Since then I’ve shipped 79 commits on top of it. The original post described a tool that could run the loop. What’s in master tonight is a tool that treats the loop as a first-class feature: agents that rewrite their own prompts on a schedule, a release pipeline that opens and merges its own PRs, a CTO skill that stops me from shipping busywork, a stats page that tells me when I’m burning tokens on nothing, and a pile of smaller things that make the whole thing feel less like a demo and more like an appliance.
This post is the delta. Same tool, five days older, behaving like a different tool in a few places that matter.
-
terraformer is archived — what now
Terraformer — the tool that reverse-engineered your existing cloud infrastructure into Terraform HCL — was archived in March 2026. Read-only, no new releases, no maintainer. If you’ve been using it for bulk imports of existing AWS or GCP accounts, you need a plan.