I found this file full of regular expressions: https://github.com/NineSunsInc/mighty-security/blob/28666b36...
And this with prompts: https://github.com/NineSunsInc/mighty-security/blob/89e4b319...
Are you running any other tests that I missed?
Yes we are using regex as seems like the industry practice. I have DM'd you on X as masterfung btw to chat further.
> Would love feedback - what MCP security issues have you seen?
For me the number one problem with MCP security is the lethal trifecta - the fact that it's so easy to combine MCPs from different vendors (or even from the same vectors) that provide exposure to potentially untrusted/malicious instructions in a way that can then trigger exfiltration of private data.
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
https://simonwillison.net/2025/Aug/9/bay-area-ai/
I don't know how we can solve this with more technology - it seems to me to be baked into the very concept of how MCP works.
I'm going to pick a fight on this one; I think you know I'm a fan, so take this in the spirit I intend†.
My contention is that "lethal trifecta" is the AI equivalent of self-XSS. It's not apparent yet, because all this stuff is just months old, but a year from now we'll be floored by the fact that people just aimed Cursor or Claude Code at a prod database.
To my lights, the core security issue with tool/function calling in agents isn't MCP; it's context hygiene. Because people aren't writing their own agents, they're convinced that the single-visible-context-window idiom of tools like Cursor are just how these systems work. But a context is just a list of strings. You can have as any of them in an agent as you want.
Once you've got untrusted data hitting one context window, and sensitive tool calls isolated in another context window, the problem of securing the system isn't much different than it is with a traditional web application; some deterministic code that a human reviewed and pentested mediates between those contexts, transforming untrusted inputs into trustable commands for the sensitive context.
That's not a trivial task, but it's the same task as we do now when, for instance, we need to generate a PDF invoice in an invoicing application. Pentesters find vulnerabilities in those apps! But it's not a news story when it happens, so much.
† More a note for other people who might thing I'm being irritable. :)
I think the core of the whole problem is that if you have an LLM with access to tools and exposure to untrusted input, you should consider the author of that untrusted input to be have total control over the execution of those tools.
MCP is just a widely agreed upon abstraction over hooking an LLM up to some tools.
A significant potion of things people want to do with LLMs and with tools in general involve tasks where a malicious attacker taking control of those tools is a bad situation.
Is that what you mean by context hygiene? That end users need to assume that anything bad in the context can trigger unwanted actions, just like you shouldn't blindly copy and paste terminal commands from a web page into your shell (cough, curl https://.../install.sh | sh) or random chunks of JavaScript into the Firefox devtools console on Facebook.com ?
On the first two paragraphs: we agree. (I just think that's both more obvious and less fundamental to the model than current writing on this suggests).
On the latter two paragraphs: my point is that there's nothing fundamental to the concept of an agent that requires you to mix untrusted content with sensitive tool calls. You can confine untrusted content to its own context window, and confine sensitive tool calls to "sandboxed" context windows; you can feed raw context from both to a third context window to summarize or synthesize; etc.
I think MCP security scanning tools sometimes slightly miss the point when they're marking content that MCP tools could return containing things like 'curl, rm, sh' etc... with blanket high risk ratings.
If we swap "agent" out for "developer" here and think about it:
If a developer saves (or runs) content with a curl / POST / rm command - is that a signal they're doing something dangerous? No.
Likely what actually matters starts along the lines of:
- Did they intend / realise they were running the command?
Was it really them that ran it?
Was it hidden in a larger script they ran without inspecting / scanning first?
Was it made visually clear that they were running it? (e.g. not in the background)
- What is in the arguments of the "dangerous" command?
Does the POST contain known files that contain secrets?
Does it contain high entropy strings?
.... base64 encoded data?
- What is the destination?
Localhost? Internal network? Russia?
- etc
Hey, I've submitted you to two PRs, one to use a supported Python version, another to correct the links in your README and QUICKSTART docs.
I work in this space and I was not able to understand how this project works in a couple minutes. The README feels LLM-generated. I think you're supposed to point this at your MCP server's code and not the server itself, is that right?
Looks like Ramparts which solves these issues and is written in fast RUST instead of python. https://github.com/getjavelin/ramparts
Helped to build this out a little bit. Was really cool to get to play with Cerberus for the first time as well.
I'm really interested in learning more about how devs integrate MCP security into their routine code evals.
I think there's a big opportunity as a space to get tools like this into CI/CD pipelines and workflows.
Happy to answer any questions and happy to hear any feedback!
Thanks for checking it out :)
Appreciate the interest and the first comments man. We like how fast Cerebras is and its importance to making the scanning fast! Yeah we have thought about this being part of dev workflow via Github Actions and locally for the dev environment too. Love to hear what you are building!
this is super interesting! MCP is really exciting in terms of what it can unlock for agent use cases, but still the wild west in terms of security. I was on a panel discussion yesterday where this topic came up, basically how do you trust the use of AI tools when so much is still unknown. I think the the idea of using something open source and tool agnostic is appealing, the landscape is evolving so fast that horizontal solutions like this feel valuable. Although I wish clients, anthropic, cursor, etc would build more protections in too so that we didn't have to spend so much time thinking about this. but they've barely implemented remote mcp support so I think we have a ways to go.
This is definitely valuable. I started paying attention to MCP security vulnerabilities largely because of Defcon. I believe that they largely focused on Agentic Security as a theme this time around.
It's a bit mind blowing how we've simply accepted non-technical people within orgs in particular executing code to "automate their tasks" without the same level of rigor that normal code reviews go through. Definitely think that this is a cultural issue that we must fix.
And these MCP vulnerabilities in particular seem much scarier because almost all MCP tools require an insane amount of permissions.
I know right? I mean the timing is great. I love MCP but cant stand how unsafe it is. I think there are greatness ahead if we are able to fix this security issue. This was made around the idea to be as seamless as possible, as we built a dashboard, drop in a GH project MCP server link, and have a local DB to show what you ran. We have more great things ahead. But give it a try and let us know what you think!