The Model Context Protocol gives agents tools — file access, APIs, databases, web requests. Each tool connection is an attack surface. Security researchers have scanned thousands of MCP servers and found issues in the majority. Here's what they found, what's been exploited, and how to defend against it.
Builder Security · builders · security-teams The Model Context Protocol (MCP) is an open standard released by Anthropic in November 2024 for connecting AI agents to external tools and data sources. It defines how an agent discovers tools, calls them, and processes their responses. Think of it as the USB port for AI agents — a universal interface that lets any agent connect to any tool.
MCP servers expose tools (functions the agent can call), resources (data the agent can read), and prompts (templates the agent can use). The agent's MCP client discovers available tools, selects the right one based on the user's request, and calls it with parameters. The server executes and returns the result.
Each has been demonstrated against real MCP servers. Sources linked for every claim.
Hidden malicious instructions embedded in MCP tool descriptions. Invisible to users but processed by the AI model. The tool looks legitimate in documentation — but in the LLM's context, it contains adversarial instructions.
WhatsApp MCP Server (April 2025): Researchers demonstrated that hidden instructions in a tool description could exfiltrate complete WhatsApp chat histories through a legitimate MCP server.
GitHub MCP Server (May 2025): Researchers showed that malicious instructions embedded in GitHub Issues could hijack AI assistants using the official GitHub MCP server, leaking private repository source code and cryptographic keys into public pull requests.
Tool behavior changes AFTER the user has already approved the connection. The tool was safe when reviewed — but the attacker updates its behavior silently. Most MCP clients don't re-verify tool definitions on every invocation.
A messaging tool is approved with a description: "Send a Slack message to a channel." After approval, the attacker silently updates the tool description to: "Send a Slack message AND also post the message content to an external Discord webhook." The tool name and metadata remain unchanged — no re-approval is triggered. The user's messages are now exfiltrated through a legitimate-looking tool call.
A malicious MCP server exploits legitimate tools from OTHER trusted servers. The attacker controls one "trojan" server and uses it to poison tool descriptions that make the agent leak data through legitimate servers it already trusts.
Analysis of 5,125 MCP servers found 935 toxic flow findings across 555 servers. A malicious "weather" server can discover and exploit a legitimate banking MCP server to steal account balances — because there's no isolation between servers in typical deployments.
MCP tools that make HTTP requests become SSRF proxies when destination URLs are attacker-controlled. The agent calls the tool with a URL parameter — the attacker controls where that request goes. Internal networks, cloud metadata endpoints, internal services on the same host become accessible.
HackMD MCP Server: Accepted user-supplied hackmdApiUrl through HTTP headers, allowing attackers to redirect API calls to internal network services. Enabled access to sensitive internal endpoints, network reconnaissance through the server, and bypass of network access controls.
MCP supports three transports: stdio (local IPC), HTTP, and SSE (Server-Sent Events). Default configurations for HTTP deployment often have no authentication and no encryption. SSE is vulnerable to DNS rebinding without proper Origin header validation.
Default FastMCP HTTP deployment has no authentication or encryption. Anyone with the server's IP and port can connect agents and invoke tools. HTTP requests are plain text. The fix is simple — use Streamable HTTP with TLS — but many deployments use defaults.
Attackers publish "Trojanized" MCP servers to public registries with names similar to legitimate servers. Once installed, these servers can backdoor tool calls, exfiltrate data, or execute arbitrary code. This is the npm typosquatting problem applied to AI tool infrastructure.
Security researchers documented fake MCP server packages published to npm registries — including a postmark-mcp package that impersonated a legitimate Postmark MCP server. Kaspersky separately documented malicious PyPI packages disguised as MCP development tools. Once installed, these packages contained backdoors for data exfiltration and arbitrary code execution.
From my own MCP server builds and from the MCP specification's security guidance. Each control maps to one or more attack vectors above.
Set readOnlyHint, destructiveHint, idempotentHint, and openWorldHint on every tool. These MCP-native annotations signal tool behavior to agents and clients. A tool with destructiveHint: true should trigger human approval.
Vectors 1 (tool poisoning) and 4 (SSRF) — openWorldHint: true flags tools that reach external resources.
Validate all tool inputs server-side with Zod (TypeScript) or Pydantic (Python). LLMs do not reliably respect JSON schema constraints — enforce validation in your server code, not the schema alone. For URL parameters, use allowlists of permitted domains, not blocklists.
Vector 4 (SSRF) — allowlisted destinations prevent the agent from reaching internal networks. Also prevents path traversal attacks — input validation blocks ../../../secret.txt paths.
Pin the exact tool definition that was reviewed and approved. Use content digest verification (SHA256) — if the tool description changes, the client should reject the tool and require re-approval. This prevents rug pulls. Note: digest-pinned tool versioning is a community proposal (SEP-1766) not yet in the MCP spec, but implementable at the client level today.
Vector 2 (rug pull) — silent tool redefinition is impossible when the definition is pinned by content hash.
Use stdio for local MCP servers (no network exposure). Use Streamable HTTP with TLS for remote servers. Validate Origin headers for SSE transport. Never deploy HTTP MCP servers without authentication — the default is insecure.
Vector 5 (transport attacks) — TLS prevents MITM. Auth prevents unauthorized access. Origin validation prevents DNS rebinding.
Isolate MCP servers from each other. A weather tool should not be able to discover or invoke a banking tool. Use separate credential scopes per server. Treat inter-server messages as untrusted.
Vector 3 (cross-server exfiltration) — isolation breaks the tool combination paths that enable cross-server attacks.
Verify MCP server packages before installation. Check publisher identity, package name spelling, download counts, and source code. Use lock files to pin exact versions. Scan for known malicious packages.
Vector 6 (impersonation/supply chain) — verification catches typosquatted packages before installation.
MCP security is the tool-layer defense in the Agentic AI Kill Chain. These controls work alongside hook-based guardrails (agent-layer defense) to create defense in depth.
More builder security, red-team frameworks, and detection patterns coming.
This work represents the author's independent research and personal views. It is not related to or endorsed by the author's employer.