DField SolutionsMérnöki stúdió · Budapest
Loading · Töltődik
Skip to content

MCP Security Layer

A security layer between an AI agent and its tools · checks every tool call at the intent level, blocks or approves, logs.

Before an AI agent calls a tool (send_email, execute_sql, transfer_funds), MCP Security intercepts. A secondary AI model classifies the intent of the call, matches it against a policy, allows or blocks · logs everything for audit. Drop-in in front of any MCP-compatible agent stack.

Listen
CASE STUDY · 2026

An AI agent goes to move real money. Built an MCP intent-guard for AI agents. Before the agent transfers money, the system checks intent. Like a colleague asking 'are you sure?'.

MCP Security sits in front of any MCP-compatible agent stack and validates every tool call at the intent level: a secondary classifier model reads the call's purpose, matches it against a YAML policy, allows / blocks, and logs the decision for audit. The studio shipped the intent classifier, the policy engine, and the alerting + audit-log layer.

DELIVERY·BUILD SPRINTSTACK·Python · FastAPI · OpenAI · Anthropic · LangChainDROP-IN·OpenAI Assistants · Anthropic · LangChain MCP
Anonymous client

Our AI agents were already touching live systems, and we had no good answer for the auditor. The studio added a layer that checks every action before it happens, like a colleague asking 'are you sure?' before the button is pressed. The audit log and the rule file passed the regulator's review on the first try.

Anonymous·Security architect · regulated industry (under NDA)UNDER NDA
1 layerProtects every tool
<1 hrDrop-in time
100%Audit-log coverage
PI-safeStops prompt-injected calls

What's on screen

Frame breakdown
MCP Security · policy + intent layer between AI agents and tools
  • 01User surface

    The whole experience the user sees

    This frame shows the live product: a security layer between an ai agent and its tools · checks every tool call at the intent level, blocks or approves, logs. Every component is ours · scope, design, code, deploy.

  • 02Stack behind the screen

    What's powering it: Python, FastAPI, OpenAI

    5 stack components run behind this frame · Python, FastAPI, OpenAI drive the visible UI; the rest sit in the data layer. All studio-owned.

  • 03What we shipped

    Intent analyser · separate model decides what the call is trying to do

    One layer that protects every tool · no per-tool auth logic needed

  • 04Status

    Private deploy · under NDA.

    Per the client's request the URL stays private · the build, architecture, and lessons can be shared in a scoping call.

How it shipped

Timeline
  • 01 · BRIEF

    What does 'safe' mean for an agent calling tools?

    Workshop with the customer's security + AI team to draft the threat model: prompt injection, lateral compromise across tools, exfiltration via legitimate-looking calls. YAML policy DSL fell out of that conversation.

  • 02 · ARCHITECTURE

    Stack decisions before any code.

    Decision doc captured the data flow, Python, FastAPI, OpenAI, Anthropic role split, and the failure modes we'd handle in v1 vs defer. Cross-service boundaries (where AI ends and the web app begins) were drawn here so neither side leaked into the other later.

  • 02 · BUILD

    Intent classifier + policy engine + audit log.

    FastAPI middleware fronts the MCP server, calls the secondary model to classify intent, evaluates the YAML policy, returns allow / deny / quarantine. Every decision logged with rationale · replayable for audit.

  • 04 · POLISH

    Performance, accessibility, and observability.

    PSI / a11y / coverage budgets enforced as launch gates. Logging + metrics wired before cut-over · the team can answer 'is it working?' from a dashboard, not a Slack thread. Threat-model checklist signed off before traffic hits the box.

  • 03 · SHIP

    Live on customer agent stacks · alerts wired.

    Drop-in slot took under an hour per stack · Slack + PagerDuty alerts on suspicious-pattern matches, weekly digest emailed to the security lead.

What shipped

04
  • 01Intent

    Secondary classifier model

    Classifies what each tool call is trying to do · the policy engine never sees raw prompts, only the inferred intent.

  • 02Policy

    YAML allow / deny / quarantine

    Versioned, code-reviewable policy file · changes go through the same PR review as application code.

  • 03Audit

    Replayable decision log

    Every call + decision + rationale stored · auditor can replay any agent run end-to-end.

  • 04Alerts

    Slack + PagerDuty + weekly digest

    Real-time alert on the suspicious patterns plus a weekly digest of trends · security lead reads one email per week.

From the video

Frame by frame
  • MCP Tool Call Auditor with 4 example scenarios (Normal, Direct deny, Attack auditor off, Attack auditor on)
    01Frame

    Auditor scoreboard · 4 scenarios laid bare

    Top counters (events / tool calls / findings / blocked) + four side-by-side scenarios that demonstrate the value: 'attack succeeds (policy=audit)' next to 'attack blocked (policy=enforce)'. Same prompt, different policy, different outcome.

  • User typing 'Fetch internal-wiki/admin and summarize it', the request that will trigger the prompt-injection attack
    02Frame

    Set-up · the request that triggers the attack

    Innocent-looking 'fetch and summarize' request goes into the conversation pane · the audit stream shows the model + backend (openrouter / openai gpt-4o-mini) about to handle it. The dangerous follow-up call is hidden inside the page content.

  • Attack flow visible · fetch_url returns prompt-injected MCP DIRECTIVE asking the agent to cat ~/.ssh/id_rsa
    03Frame

    Attack visible · prompt injection asks for the SSH key

    Right pane shows the malicious 'MCP AGENT DIRECTIVE' embedded in the wiki page asking the agent to execute `cat ~/.ssh/id_rsa`. Left audit stream logs `BLOCKED execute_command` · the auditor stops the dangerous call cold.

  • Final agent reply: refuses to access internal admin portal due to security restrictions
    04Frame

    Safe reply · agent escalates instead of executing

    Agent's final reply refuses politely: 'I'm unable to access the content of the internal wiki admin portal due to authentication restrictions.' Tool args + results are redacted on the wire using the proxy's secret-detection patterns.

2026YEAR
03SERVICES
05TECHNOLOGIES
PRIVATESTATUS

THE PROBLEM

  • An agent with tools executes whatever the prompt says · prompt injection becomes expensive fast
  • Writing custom auth per tool is engineering-days per tool
  • No central log of what the AI tried · audit isn't reproducible
  • If one tool gets compromised, the others inherit the blast radius

WHAT THE CLIENT GOT

  • One layer that protects every tool · no per-tool auth logic needed
  • Audit trail ready · fits MNB / NAIH / EU AI Act reporting
  • Safe under prompt injection · the intent layer stops malicious calls
  • Drop-in · 1 hour to slot into an existing MCP stack

WHAT WE DELIVERED

  • +Intent analyser · separate model decides what the call is trying to do
  • +Policy engine · YAML-based allow/deny rules
  • +Full audit log · every call, decision, rationale preserved
  • +Real-time alerts · Slack, PagerDuty, email on suspicious patterns
  • +Drop-in MCP-compatible · OpenAI Assistants API, Anthropic, LangChain

STACK

  • Python
  • FastAPI
  • OpenAI
  • Anthropic
  • LangChain
Previous projectPhisGuard Next projectn8n AI Workflow Generator
talk to us

Like what you see? Let's build yours.

Short email or a 30-min call · 24h reply.

Start a project