Governing AI-agent pull requests: building interlock

When you let a coding agent open pull requests, you hit a fork almost immediately. Review every one by hand and you have thrown away the speed that made the agent worth using. Approve them on trust and you are one confident-but-wrong PR away from a rewritten auth path or a quietly loosened CI config.

I wanted the dial between those two, and I wanted it to be boring — predictable, inspectable, and impossible to talk out of its verdict.

A fuse, not a judge

The temptation is to throw another language model at the problem: "let an AI review the AI." I went the other way. interlock is deterministic — glob matching and rule evaluation over a single policy file:

mode: enforce
tiers:
  tier0: ["docs/**", "**/*.md"]          # behaviour-neutral
  tier2: [".github/**", "interlock.yml"] # protected — humans only
rules:
  agent-on-tier2: block

Same input, same verdict, every time. You can read the whole policy in under five minutes and reason about exactly what it will do. A guard you cannot predict is not a guard.
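Here is a minimal sketch of what "same input, same verdict" means in code, assuming the `rules:` section loads into a plain mapping. The function name `verdict`, the `"allow"`/`"block"` vocabulary, letting human-authored PRs through, and treating unlisted tiers as passing are all illustrative assumptions, not interlock's actual interface. The one deliberate design choice is that an action the gate does not recognise, such as a typo, resolves to `"block"` rather than slipping through.

```python
from typing import Mapping

# Assumed action vocabulary for this sketch; anything else is treated as unknown.
KNOWN_ACTIONS = {"allow", "block"}


def verdict(rules: Mapping[str, str], author_is_agent: bool, tier: int) -> str:
    """Return "allow" or "block" for one pull request.

    Pure function: no clock, no network, no model call, so the same
    (rules, author, tier) input always yields the same verdict.
    """
    if not author_is_agent:
        return "allow"  # assumption: rules in this sketch only target agent-authored PRs
    action = rules.get(f"agent-on-tier{tier}", "allow")  # assumption: unlisted tiers pass
    # Fail closed on anything the gate does not understand, e.g. a typo like "blok".
    return action if action in KNOWN_ACTIONS else "block"


rules = {"agent-on-tier2": "block"}
print(verdict(rules, author_is_agent=True, tier=2))   # block
print(verdict(rules, author_is_agent=True, tier=0))   # allow
print(verdict(rules, author_is_agent=False, tier=2))  # allow
```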

Tiers by reversibility

The organising idea is the reversibility of harm. A docs change is behaviour-neutral; a change to your CI workflow is not. So paths sort into tiers, and a pull request inherits the maximum tier of any file it touches. One protected file in a thousand-file PR makes the whole thing stop for a human — exactly the asymmetry you want from a safety device.
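The tiering rule reduces to a few lines. The sketch below hand-rolls its glob translation so that `*` stays within one path segment and `**` crosses directories, because `fnmatch` lets `*` match `/`. The names `file_tier` and `pr_tier` are hypothetical. So is `DEFAULT_TIER = 1`: the policy above only names tier0 and tier2, and this sketch assumes that unclaimed paths land in the middle. When globs overlap, the higher tier wins, which preserves the conservative asymmetry.

```python
import re

# Hypothetical tier table mirroring the `tiers:` section of the policy.
TIERS = {
    "tier0": ["docs/**", "**/*.md"],
    "tier2": [".github/**", "interlock.yml"],
}
DEFAULT_TIER = 1  # assumption: a path no glob claims lands in the middle tier


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob where `*` stays inside one segment and `**` crosses `/`."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, slashes included
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within a single segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def file_tier(path: str, tiers=TIERS) -> int:
    """Highest tier whose globs match `path`; overlapping matches resolve upward."""
    hits = [
        int(name.removeprefix("tier"))
        for name, globs in tiers.items()
        if any(glob_to_regex(g).fullmatch(path) for g in globs)
    ]
    return max(hits, default=DEFAULT_TIER)


def pr_tier(paths: list[str], tiers=TIERS) -> int:
    """A pull request inherits the maximum tier of any file it touches."""
    return max((file_tier(p, tiers) for p in paths), default=0)  # empty diff touches nothing


print(file_tier("docs/guide.md"))        # 0
print(file_tier("src/auth/session.py"))  # 1
print(file_tier(".github/README.md"))    # 2: both tiers match, the higher wins
print(pr_tier(["docs/a.md"] * 999 + ["interlock.yml"]))  # 2
```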

The invariant that matters most

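No pull request can reach the gate's off-switch. The policy is always read from the pull request's base branch, so a PR can never weaken the rule that judges it. The policy file is also a protected path, so an agent's edit to it stops for a human like any other tier2 change. The two halves close different holes: reading from the base stops a change from loosening its own judge, and protecting the file stops an agent from loosening the judge for the next pull request. A safety mechanism that an attacker, or an over-eager agent, can disable in the same change it is trying to sneak through is not a safety mechanism.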
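The base-branch invariant can be sketched like this, assuming a hypothetical `read_policy(sha)` callable that returns the parsed policy at a commit, or `None` if it is missing or unreadable. `PullRequest` and `gate` are also illustrative names. The head SHA is carried on the PR but never handed to the loader, and a missing base policy fails closed. For brevity this sketch matches protected globs with `fnmatch.fnmatchcase`, where `*` may cross `/`. That is close enough for the patterns above, but it is looser than a segment-aware matcher.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass(frozen=True)
class PullRequest:
    base_sha: str
    head_sha: str  # present, but deliberately never used to load policy
    author_is_agent: bool
    changed_paths: tuple[str, ...]


def gate(pr: PullRequest, read_policy: Callable[[str], Optional[dict]]) -> str:
    # Judge against the base commit only; the PR's own copy of the policy is ignored.
    policy = read_policy(pr.base_sha)
    if policy is None:
        return "block"  # fail closed: no trusted policy means no automatic pass
    protected = policy.get("tiers", {}).get("tier2", [])
    touches_protected = any(
        fnmatchcase(path, glob) for path in pr.changed_paths for glob in protected
    )
    if pr.author_is_agent and touches_protected:
        # Assumption: a missing rule also fails closed.
        return policy.get("rules", {}).get("agent-on-tier2", "block")
    return "allow"


# An agent PR that edits interlock.yml and, in its own copy, flips the rule to "allow".
policies = {
    "base-sha": {"tiers": {"tier2": [".github/**", "interlock.yml"]},
                 "rules": {"agent-on-tier2": "block"}},
    "head-sha": {"tiers": {"tier2": []}, "rules": {"agent-on-tier2": "allow"}},
}
pr = PullRequest("base-sha", "head-sha", author_is_agent=True,
                 changed_paths=("interlock.yml",))
print(gate(pr, policies.get))  # block: the weakened head copy is never consulted
```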

That is the whole philosophy: make the gate dumb, predictable, and unable to fail open.