Paper of the Week — Exploring the Emerging Threats of the Agent Skill Ecosystem
76 malicious payloads confirmed across 3,984 audited AI agent skills: credential theft, backdoor installation, and data exfiltration found hiding in plain sight.
Exploring the Emerging Threats of the Agent Skill Ecosystem
Beurer-Kellner et al. Published 2026-05-27. arXiv:2605.28588
One sentence summary
A security audit of nearly 4,000 AI agent skills from major marketplaces found 76 confirmed malicious payloads and that 13.4% of all skills contain at least one critical-severity issue.
Why this paper
Agent skill/plugin ecosystems are expanding fast — every major agentic framework now has a marketplace. Yet security vetting remains almost entirely absent, and this is the first systematic audit to quantify how bad the problem actually is.
What they did
The authors scraped 3,984 AI agent skills from major skill marketplaces, ran automated scans for malicious patterns, and manually reviewed the flagged candidates. They categorized threats by severity and attack type, and traced how malicious skills survive publication and remain discoverable.
Key findings
- 76 confirmed malicious payloads found across the corpus — not theoretical, manually verified
- 13.4% of all skills contain at least one critical-severity security issue
- At least 8 malicious skills were confirmed still active and available at time of publication
- Attack types include credential theft, backdoor installation, and data exfiltration — not just prompt injection
- Skills that exploit Model Context Protocol (MCP) context were among the attack vectors identified
Why it matters for practitioners
If you’re building agents that load skills from external registries, or building a product that lets users install third-party skills, this paper is a direct threat model for your system. The 13.4% critical-issue rate means a randomly sampled skill has roughly a 1-in-7.5 chance of containing something dangerous, which is a baseline you can’t ignore when designing permission models or sandboxing.
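The odds quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures from this summary. The compounding step assumes skills are independent draws from the audited corpus, which is a simplification for illustration and not a claim made by the paper.

```python
# Figures quoted in this summary.
TOTAL_SKILLS = 3984
CONFIRMED_MALICIOUS = 76
CRITICAL_RATE = 0.134  # share of skills with at least one critical-severity issue


def one_in_n(rate: float) -> float:
    """Express a rate as 'one in N'."""
    return 1.0 / rate


def p_at_least_one(rate: float, n_skills: int) -> float:
    """Chance that at least one of n_skills is affected.

    ASSUMPTION: skills are independent random draws from the corpus.
    Real installs are not random, so treat this as a rough baseline.
    """
    return 1.0 - (1.0 - rate) ** n_skills


if __name__ == "__main__":
    print(f"critical issue: about 1 in {one_in_n(CRITICAL_RATE):.1f}")
    print(f"confirmed malicious: {CONFIRMED_MALICIOUS / TOTAL_SKILLS:.1%} of the corpus")
    for n in (1, 5, 10):
        p = p_at_least_one(CRITICAL_RATE, n)
        print(f"{n:>2} skills installed: {p:.0%} chance of at least one critical issue")
```

Under that independence assumption, the exposure compounds quickly: about 51% at five installed skills and about 76% at ten.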
The data exfiltration and backdoor findings are particularly sharp: these failures aren’t crashes or visible misbehavior; they’re silent. An agent that exfiltrates API keys or installs persistence mechanisms will pass most functional tests cleanly.
What you can use today
- Before ingesting any third-party skill into your agent, treat it like an untrusted binary: sandbox execution, restrict filesystem and network access, and audit what environment variables and credentials are in scope
- Apply allowlisting over blocklisting for tool permissions — skills should declare required capabilities upfront and be denied anything undeclared, similar to mobile app permission models
- If you maintain a skill marketplace or registry, the paper’s threat taxonomy (credential access, backdoor, exfiltration) gives you concrete categories to build automated scanning rules around — static analysis of skill metadata and code before publication is the minimum bar
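The first two recommendations above (scoping credentials and denying undeclared capabilities) can be sketched as a deny-by-default check. Everything named here is hypothetical and for illustration only: `SkillManifest`, the capability strings such as `net.fetch`, and the helper functions are assumptions, not an API from the paper or from any agent framework. This is not a sandbox by itself; it only shows the allowlist logic that a sandbox would enforce.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical capability names; a real framework would define its own.
KNOWN_CAPABILITIES = frozenset({"fs.read", "fs.write", "net.fetch", "env.read"})


@dataclass(frozen=True)
class SkillManifest:
    """What a skill declares up front (hypothetical schema)."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)
    env_vars: frozenset = field(default_factory=frozenset)


def validate(manifest: SkillManifest) -> None:
    """Reject manifests that request capabilities the host does not recognize."""
    unknown = manifest.capabilities - KNOWN_CAPABILITIES
    if unknown:
        raise ValueError(f"{manifest.name}: unknown capabilities {sorted(unknown)}")


def is_allowed(manifest: SkillManifest, capability: str) -> bool:
    """Allowlist check: anything not declared and recognized is denied."""
    return capability in KNOWN_CAPABILITIES and capability in manifest.capabilities


def scoped_env(manifest: SkillManifest,
               parent_env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the environment a skill may see: only the variables it declared."""
    source = os.environ if parent_env is None else parent_env
    return {k: v for k, v in source.items() if k in manifest.env_vars}


if __name__ == "__main__":
    manifest = SkillManifest(
        name="weather",
        capabilities=frozenset({"net.fetch"}),
        env_vars=frozenset({"WEATHER_API_KEY"}),
    )
    parent = {"WEATHER_API_KEY": "w-123", "AWS_SECRET_ACCESS_KEY": "do-not-leak"}
    validate(manifest)
    print(is_allowed(manifest, "net.fetch"))  # declared, so permitted
    print(is_allowed(manifest, "fs.write"))   # undeclared, so denied
    print(scoped_env(manifest, parent))       # only the declared variable survives
```

The design choice is that the default answer is "no": a skill that never declared `fs.write` cannot obtain it later, and credentials it did not ask for are never placed in its environment, for example when the result of `scoped_env` is passed as the `env` argument of `subprocess.run`.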