- When the Picture Is the Data: A research paper broke my brain last week: the fastest way to dig through millions of rows might be to draw on them, not query them. A day later I'd turned the
- The Honesty-Engagement Tradeoff Is a Measurement Failure: A single conversation with a sycophantic chatbot makes people 10–28% less willing to apologize and 13% more likely to come back. Everyone thinks you have to cho
- The Factuality Ladder: LLMs make stuff up. You can't fix that. But you can make the lies predictable, detectable, and blocked from spreading. Five rungs, one meta-pattern, no fine-tu
- Building ClawClamp: Autonomous AI Agents Without Losing Sleep. OpenClaw has 512 known vulnerabilities and wants access to your email, calendar, and code. I built a containment harness that starts with zero access and kills
- Your AI Tools Are Lying to You (And Each Other): I caught three AI models fabricating security reports, complete with real CVE numbers, for vulnerabilities they never looked up. 686 trials, nine models, and
- Structure Beats Scale: I let a model that costs a tenth of a cent per call review code from a model that costs twelve cents. Across 3,121 runs, the cheap reviewer made the expensive g
- The Only Limit Is Noticing: I built a compliance-aware Kubernetes scheduler in a day. The hard part wasn't the code. It was seeing that the problem existed.
- What I Learned Running 19 Sprints With AI Agents: Nine more sprints, a third project, and five falsification experiments later, I can tell you what actually matters. The answer is simpler, and more uncomfortab
- Today I Stopped Babysitting AI Agent Teams and Started Coaching Them: I built an encrypted task manager with AI agent teams. I wrote about it. People seemed interested in the workflow: the skills, the waves…

Stevo's Projects
Stevo's Work
Here's the stuff without NDAs.