Building ClawClamp: Autonomous AI Agents Without Losing Sleep

What's it take to operate an agentic personal assistant safely?

I’ve been exploring OpenClaw, but I was uneasy about handing that much access to an autonomous agent: 50+ connectors, persistent memory, arbitrary code execution, 512 reported vulnerabilities, and the ability to impersonate me by design. So I wanted to explore a safer version.

What “safer” means to me is that if someone hijacks the agent at 3 AM, it has access to a disposable machine with nothing on it — not my laptop, not my email, not my bank. The machine gets killed automatically before I wake up. I start the agent with zero access and only connect the APIs it actually needs. And the layers that enforce all of this are plain old code — firewall rules, file hashes, authenticated check-ins — not LLM-based trust. LLMs add depth. Code holds the line.

ClawClamp is how I tried to do that.

The System

ClawClamp is a deployment harness for OpenClaw. Two principles drive the architecture. First: the agent runs on a machine I don’t care about, monitored from a machine I do care about. Second: start with nothing allowed, then add only what I need.

[My Phone / Laptop]
    ^
    | (chat + notifications -- all one-way, agent to me)
    |
[Agent's Machine -- DISPOSABLE]
    The agent + firewall + Docker sandbox
    |
    | (monitored from outside)
    v
[My Server -- TRUSTED, on my private network]
    Checks in every 30 seconds to 15 minutes
    Keeps its own record of what the agent did
    Kills the agent's machine if anything changes

The agent’s machine starts with zero outbound access. Three independent layers add back only what’s needed:

Layer 1: The agent’s own settings. Tool restrictions, message allowlists, sandbox mode. These are the agent’s built-in controls — useful as a first line, but they’re software settings. If the agent is compromised, these are the first things that the agent itself can change.

Layer 2: A firewall that only allows traffic to 5 specific servers. This is the first layer that doesn’t care what the agent thinks it’s doing. The firewall resolves each allowed domain (the AI API, a chat server, a notification server, a calendar, a search API) to its actual IP addresses and blocks everything else. Code running inside the Docker sandbox gets no network access at all. It doesn’t matter what instructions the agent receives — the packets either go to one of those 5 servers or they go nowhere.
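The resolve-then-allowlist step can be sketched in a few lines of Python. This is a minimal sketch, not ClawClamp's actual code: the domain names are placeholders for your real five, and `build_rules` only generates iptables commands rather than running them.

```python
import socket

# Placeholder allowlist -- substitute the five servers you actually use.
ALLOWED_DOMAINS = [
    "api.example-ai.com",     # the AI API
    "chat.example.com",       # chat server
    "notify.example.com",     # notification server
    "calendar.example.com",   # calendar
    "search.example.com",     # search API
]

def resolve(domain):
    """Resolve a domain to its current set of IPv4 addresses."""
    return sorted({info[4][0] for info in socket.getaddrinfo(domain, 443, socket.AF_INET)})

def build_rules(domains, resolver=resolve):
    """Emit iptables commands: default-deny outbound, then allow each resolved IP."""
    rules = ["iptables -P OUTPUT DROP"]  # everything not explicitly allowed goes nowhere
    for domain in domains:
        for ip in resolver(domain):
            rules.append(f"iptables -A OUTPUT -d {ip} -j ACCEPT")
    return rules
```

A real deployment also has to permit DNS lookups and re-run the resolution periodically, since the providers' IPs rotate; the point here is just that the enforcement is dumb packet filtering, not anything that reads instructions.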

Layer 3: A monitor running on a completely separate machine. My server reaches into the agent’s machine every 30 seconds to 15 minutes and checks that nothing has changed: Are the config files the same ones I deployed? Is the agent’s binary the original, or has someone swapped it out? Are there unexpected processes running? Is the agent claiming to have done things that the logs don’t show? If anything looks wrong, my server kills the agent’s machine through the cloud provider’s API — it doesn’t even need the agent’s machine to cooperate. The rule is blunt: suspicious? Kill it with fire.

What If Things Go Wrong?

Architecture diagrams are abstract. Let’s walk through what actually happens.

What if the agent starts sending my data somewhere it shouldn’t? The agent tries to reach some server that isn’t on the list. The firewall blocks it. The data never leaves the machine. The agent can be convinced to try anything — the firewall doesn’t read instructions, it reads IP addresses.

What if a malicious plugin tries to phone home? Plugins run inside a Docker container with no network access at all. They can’t connect to anything. Even if one breaks out of the container to the host operating system, the firewall still only allows those 5 servers. Two independent layers would have to fail at the same time.
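The no-network sandbox comes down to a single Docker flag. A sketch of how a harness might launch sandboxed code; the function name and the hardening flags beyond `--network none` are my own choices, not necessarily what ClawClamp does:

```python
def sandbox_cmd(image, command):
    """Build a docker invocation with networking disabled.

    '--network none' gives the container no interfaces besides loopback,
    so code inside it has nothing to phone home with.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",   # no network inside the sandbox
        "--read-only",         # extra hardening: immutable root filesystem
        image, *command,
    ]
```

The resulting argument list would be handed to something like `subprocess.run`; even if the sandboxed code escapes the container, it lands on a host whose firewall still only permits the five allowed servers.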

What if someone takes over the agent completely — full control of the machine? They now own a machine with no credentials for my email, my Slack, my bank, or anything else I care about. It has no keys to connect to other machines. It can talk to 5 servers. It can’t spread to my laptop, my server, or my network. Meanwhile, my server notices within 5 minutes that files on the agent’s machine have changed, kills the droplet, exports the agent’s memory, and automatically rebuilds a fresh one with new credentials. My phone buzzes with a notification. By the time I read it, the compromised machine is gone and a clean one is already running.

What if the attacker is good enough to keep everything looking normal? They’d need a secret key (generated when the machine was first created, stored on both machines) to fake the check-ins. They’d need to keep every monitored file byte-identical while still doing something useful. Every 15 minutes, the full audit checks CPU usage, running processes, open ports, and whether the agent’s program has been tampered with. Staying hidden is hard, the window between checks is short, and the prize is a machine that can talk to 5 servers and has nothing worth stealing.
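The authenticated check-in that the shared secret enables can be sketched with Python's standard-library `hmac`. The payload shape and field names here are my invention; the idea is only that the server recomputes the signature and rejects anything stale or unsigned:

```python
import hashlib
import hmac
import json
import time

def sign_checkin(secret: bytes, payload: dict) -> dict:
    """Agent side: stamp the payload and attach an HMAC-SHA256 tag."""
    body = dict(payload, ts=int(time.time()))
    msg = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return body

def verify_checkin(secret: bytes, body: dict, max_age: int = 900) -> bool:
    """Server side: constant-time compare; reject missing, wrong, or stale signatures."""
    body = dict(body)
    sig = body.pop("sig", "")
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    fresh = time.time() - body.get("ts", 0) <= max_age
    return hmac.compare_digest(sig, expected) and fresh
```

Without the secret, an attacker can't forge a passing check-in; with it, they'd still have to keep every monitored file byte-identical between audits.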

What if the agent plants bad instructions in its own memory that survive a rebuild? When the agent’s memory gets carried over to a fresh machine, it goes through an LLM that strips out anything that looks like injected instructions. That’s a soft check — a clever enough payload might get through. But my instructions are stored separately on my server and the agent can’t touch them. And a separate log records what the agent actually did by parsing the real server logs. If the agent’s memory says one thing and the logs say another, the mismatch gets flagged. The code-based check catches what the LLM check might miss.
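That code-based cross-check is essentially a set difference between what memory claims and what the server logs record. A sketch, assuming a hypothetical `action=` log format (the real log format will differ):

```python
import re

# Hypothetical log line format: "2025-01-01T03:00:00 action=search q=weather"
ACTION_RE = re.compile(r"action=(\w+)")

def actions_from_logs(lines):
    """Extract the set of action names actually recorded in the server logs."""
    return {m.group(1) for line in lines for m in ACTION_RE.finditer(line)}

def flag_mismatches(claimed, log_lines):
    """Return actions the agent's memory claims but the logs never recorded."""
    return sorted(set(claimed) - actions_from_logs(log_lines))
```

No LLM in the loop: a claimed action either appears in the real logs or it gets flagged.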

The pattern across all five: the code-based layers (firewall rules, file checks, authenticated heartbeats) do the blocking. The LLM layers (memory filtering, log analysis) add depth. And because the machine started with almost nothing on it, even a successful takeover doesn’t give the attacker much to work with.

Where the Layers Broke

I tested my own code adversarially and found 10 vulnerabilities — 2 critical, 4 high, 4 medium — in a codebase with 329 tests and 92% coverage. The worst one: the check-in listener, the component responsible for killing a compromised machine, was accepting requests from anyone. No authentication. Any computer on the internet could send a fake check-in and keep a compromised machine alive forever. The one component the entire system depends on when everything else fails — and it was wide open. I built 16 sprints of layered enforcement and the damn failsafe had no lock on the door.

I should be honest about the limits. I built it and I tested it against myself. That’s better than not testing at all, but the person who wrote the code has blind spots shaped exactly like the code. The check-in vulnerability is a good example — authentication wasn’t part of my mental model when I wrote it, so it never occurred to me to test for it. The adversarial pass found what I could see from a different angle. It didn’t find what I can’t see at all.

ClawClamp is 1,100 lines of code, 329 tests, 92% coverage, live deployment, and 7 red team passes. It took about 125 minutes of active build time across 24 sprints of Flowstate with Claude Code.

That code is at github.com/smledbetter/clawclamp. IMHO, any autonomous agent with broad capabilities and persistent memory needs this kind of containment. The specific tools — DigitalOcean, Linux firewalls, Python — are incidental. The principles aren’t: start with zero access. Make every boundary code you can read and verify. Use LLMs to add depth, not to hold the line. Monitor from a machine the agent can’t touch. And when you’ve built the thing you’re proud of, ask what happens when it fails — then start there.