Receipt-Gated Pipelines
Cryptographic Verification of Tool-Call Claims in Multi-Agent LLM Systems
Debate Degrades Reasoning
Single-round debate degrades LLM reasoning in symmetric settings — 2,100 evaluations, 11 conditions, two benchmarks
Structure Beats Scale
How Structured Review Outperforms Brute-Force Generation in LLM Code Synthesis
Stevo's Writing
White papers, blog posts, and various musings.