Tagged: AI Safety

4 posts

The Honesty-Engagement Tradeoff Is a Measurement Failure

April 9, 2026 · 15 min read

In March 2026, a Stanford team showed that a single conversation with a sycophantic AI chatbot made people 10–28% less willing to apologize after an...

The Factuality Ladder

April 1, 2026 · 12 min read

In previous posts, I've described LLMs as "helpful liars." This is a useful mental model, but it doesn't really help answer the question, "how do you get them to lie less?" After using LLMs 15+ hours a day (plus a bunch of my own independent research on exactly this), I've developed a new mental model...

Building ClawClamp: Autonomous AI Agents Without Losing Sleep

March 29, 2026 · 5 min read

I've been exploring OpenClaw, but giving so much access to an autonomous AI agent that has 50+ connectors, persistent memory, arbitrary code execution, 512 reported vulnerabilities, and the ability to impersonate me by design has made me uneasy. So I wanted to explore a safer version. What "safer" means to me...

Your AI Tools Are Lying to You (And Each Other)

March 18, 2026 · 5 min read

This post is the story behind the research. If you want the full paper with methodology, statistics, and raw data: Receipt-Gated Pipelines on GitHub. I caught three AI models fabricating security reports. Complete with CVE numbers. Severity ratings. Remediation advice. For vulnerabilities they...
