An agentic AI that can send email, edit a CRM, move money, or call a shell is a security surface, not a chatbot. The blast radius of a jailbreak or prompt injection is multiplied by whatever tools you wired up. This is the four-layer pattern we apply on every agent build.
Layer 1 · Capability scoping
The agent gets the minimum set of tools it needs. Not 'send_email' in general · 'send_email_to_customer', where 'customer' is scoped to the current user's account. Design each tool as narrowly as possible; any tool that could 'do anything' is a design error.
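A minimal sketch of a narrowly scoped tool, assuming a hypothetical `UserCtx` that carries the customers belonging to the current user's account · the model supplies a customer ID, never a raw email address:

```python
from dataclasses import dataclass

@dataclass
class UserCtx:
    user_id: str
    account_customers: dict[str, str]  # customer_id -> email, from the user's own account

def send_email_to_customer(customer_id: str, body: str, ctx: UserCtx) -> str:
    """Narrow tool: the recipient is resolved server-side from the
    caller's account, so the model cannot target arbitrary addresses."""
    if customer_id not in ctx.account_customers:
        raise PermissionError(f"customer {customer_id} not in this account")
    recipient = ctx.account_customers[customer_id]
    # deliver(...) would be the real mail transport; stubbed here
    return f"queued mail to {recipient}"
```

The scoping lives in the lookup: even a fully hijacked prompt can only reach recipients already attached to the account.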
Layer 2 · User-backed authorization
Every tool wrapper re-authorizes the current user. The LLM can call the tool; the tool checks 'is this user allowed to do X to Y'. Treat the LLM as an untrusted caller · always.
def safe_send_email(to: str, body: str, ctx: UserCtx):
    if to not in ctx.allowed_recipients:
        raise PermissionError(f"recipient {to} not authorized")
    if len(body) > MAX_BODY:
        raise ValueError("body too long")
    audit_log(ctx.user_id, "send_email", to, hash(body))
    # 'from' is a reserved word in Python, so the sender goes in as 'sender'
    return email.send(to=to, body=body, sender=ctx.user_email)

Layer 3 · Per-call audit logging
Log every tool call · who, what, when, why (the prompt that triggered it), how much (tokens, dollars). Keep at least 90-day retention for compliance and debugging. You will need this log during your first incident; discovering then that it doesn't exist is the worst-case day.
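A sketch of what one audit record could look like · the field names and the `audit_log` signature are assumptions, not a fixed schema. Hashing the prompt instead of storing it raw is one option when prompts may contain PII:

```python
import hashlib
import json
import time

def audit_log(user_id: str, tool: str, target: str, prompt: str,
              tokens: int, cost_usd: float) -> dict:
    """One record per tool call: who, what, when, why, how much.
    The prompt is stored as a SHA-256 digest in this sketch."""
    entry = {
        "ts": time.time(),                    # when
        "user_id": user_id,                   # who
        "tool": tool,                         # what
        "target": target,                     # what it acted on
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),  # why
        "tokens": tokens,                     # how much (compute)
        "cost_usd": cost_usd,                 # how much (money)
    }
    # a real system would ship this to an append-only, durable sink
    print(json.dumps(entry))
    return entry
```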
Layer 4 · Circuit breakers
Rate-limit per user. Rate-limit per tool. Add a global kill switch for misbehaviour. When the MTTR (mean time to revoke) of a compromised agent is under 10 minutes, the damage budget is bounded. When it's an hour, it isn't.
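One way to sketch both pieces together · a sliding-window rate limit keyed by (user, tool), plus a kill flag that denies everything at once. The class and its fields are illustrative, not a prescribed API:

```python
import time
from collections import defaultdict, deque

class CircuitBreaker:
    """Per-user, per-tool sliding-window rate limit plus a global kill switch."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = defaultdict(deque)  # (user_id, tool) -> call timestamps
        self.killed = False              # flip True to revoke every agent at once

    def allow(self, user_id: str, tool: str) -> bool:
        if self.killed:
            return False
        now = time.monotonic()
        q = self.calls[(user_id, tool)]
        # drop timestamps that have aged out of the window
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.max_calls:
            return False
        q.append(now)
        return True
```

Every tool wrapper calls `allow(...)` before doing anything; the kill switch is a single boolean, which is what keeps time-to-revoke in minutes rather than hours.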
If you can't answer 'what can the agent do that would materially hurt the business?' in under 60 seconds, you haven't scoped the tools enough. Start by reducing the surface, not by adding guardrails to a broad one.

By
Dezső Mező
Founder, DField Solutions
I've shipped production products from fintech to creator-tooling · for startups and enterprises, from Budapest to San Francisco.