April 1, 2026 · 12 min read
In previous posts, I've described LLMs as "helpful liars." This is a useful mental model, but it doesn't really help answer the question, "how do you get them to lie less?" After using LLMs 15+ hours a day (plus a bunch of my own independent research on exactly this), I've developed a new mental model...
March 29, 2026 · 5 min read
I've been exploring OpenClaw, but giving so much access to an autonomous AI agent that can impersonate me by design, with 50+ connectors, persistent memory, arbitrary code execution, and 512 reported vulnerabilities, has made me uneasy. So I wanted to explore a safer version. What "safer" means to me...
March 18, 2026 · 5 min read
This post is the story behind the research. If you want the full paper with methodology, statistics, and raw data: Receipt-Gated Pipelines on GitHub. I caught three AI models fabricating security reports. Complete with CVE numbers. Severity ratings. Remediation advice. For vulnerabilities they...
March 11, 2026 · 11 min read
This post is the story behind the research. If you want the full paper with methodology, statistics, and raw data: Structure Beats Scale on GitHub. This shouldn't have worked. I took a model that costs a tenth of a cent per call — Mercury 2, a diffusion-based reasoning model that nobody was talking...
February 20, 2026 · 12 min read
Last time I wrote about Flowstate, I...
February 18, 2026 · 11 min read
I built an encrypted task manager with AI agent teams. I wrote about it. People seemed interested in the workflow — the skills, the waves, the retrospectives. What I didn’t tell you was that the workflow was held together with duct tape and copy-pasta. Every sprint, I was the bottleneck. I designed...
February 17, 2026 · 11 min read
I’ve been a Product Manager for 20 years. Design Within Reach, MyFitnessPal, Tonal, Habitry, a cybersecurity startup that Mimecast acquired, a VR fitness app that Meta acquired. On paper, wildly successful. In practice, I’ve been pretty bored for the last six years. Not unhappy. Not ungrateful....