
AppSec Santa Weekly tracks new AppSec tools and the latest releases from 290+ existing ones. Each issue covers what shipped, what changed, and why it matters.
This week, the AI tooling layer cracked in three places at once. Anthropic’s MCP got a 10-CVE protocol-level RCE. Endor published a benchmark showing AI coders ship vulnerable code 83% of the time. And NIST quietly stepped out of the CVE-enrichment business.
Busy week on the vendor side too, with Nuclei shipping two patch-now security fixes, Trivy rotating GPG keys, and Ostorlab renaming its own project mid-release.
This Week at a Glance
45 releases across 9 active categories.
- AI Security (15) — Arize Phoenix 5 releases, Akto 5, Arthur AI 2 versions, Promptfoo 0.121.5, Vectara
- IaC Security (10) — Trivy v0.70.0, Checkov 3 patches, Conftest v0.68.1/2, Kubescape v4.0.4/5, KubeArmor v1.6.17/18
- SCA (8) — Renovate 5 releases, Dependabot v0.370, SCANOSS v1.52.1, Chainguard apko v1.2.4
- SAST (5) — Semgrep v1.160.0, PHPStan 3 patches in 3 days, CodeQL v2.25.2
- Mobile (3) — Ostorlab v2.0.0 (project rename), Apktool v3.0.2, Ostorlab v1.21
- DAST (1) — Nuclei v3.8.0 with two CVE fixes
- IAST (1) — Datadog Code Security v1.61.1
- ASPM (1) — DefectDojo 2.57.2
- Container Security (1) — Calico v3.31.5
Quiet this week: RASP, API Security
But the real story this week was vulnerabilities — in the tools meant to stop them, and in the code AI agents ship. Read on.
New on the Radar
MCP ships an architectural RCE across every official SDK. On April 15, OX Security disclosed a protocol-level command-execution vulnerability in Anthropic’s Model Context Protocol. It spans the official Python, TypeScript, Java, and Rust SDKs.
Ten CVEs were issued. The biggest three:
- CVE-2026-30623 — LiteLLM
- CVE-2026-30624 — Agent Zero
- CVE-2026-30615 — Windsurf, zero-click
The flaw sits in how STDIO-based MCP configurations pipe user-influenced input into shell-like command construction. It’s not a coding error in one library — it’s the execution model. OX’s scan counts 150M+ cumulative downloads, 7,000+ publicly reachable servers, up to 200,000 vulnerable instances, and 200+ open-source projects.
The client list reads like an inventory of the modern AI coding stack. Cursor, Windsurf, Claude Code, Gemini-CLI, and VS Code are all affected via prompt-injection into MCP server configs. Framework-side, LiteLLM, LangChain, IBM’s LangFlow, GPT Researcher, Agent Zero, and Bisheng have confirmed RCE paths.
Anthropic’s response, per OX’s disclosure, is that the STDIO execution model is a secure default and sanitization is the developer’s responsibility. No protocol-level fix is planned.
My take: Log4Shell was a library bug. This is a protocol decision. If you run an MCP server that takes inputs from anywhere a prompt touches, treat it as exploitable until you’ve audited the STDIO path yourself. Every AI tooling vendor downstream of the SDKs now has to ship their own sanitization — which is the fragmentation pattern that made shell injection a 30-year problem.
Endor’s Agent Security League — top AI coder passes 17% of security tests. On the same day, Endor Labs launched a public benchmark of AI coding agents. It’s built on Carnegie Mellon’s SusVibes framework — 200 tasks drawn from 108 real-world Python projects, covering 77 CWE categories.
The headline numbers:
- Best security: Codex with GPT-5.4 — 17.3% security pass rate, 62.6% functional
- Best functional: Cursor with Gemini 3.1 Pro — 73.7% functional, 13.4% security
- Across all agents: 83% of AI-generated code contained at least one security vulnerability
Claude Code, Cursor, Codex, and SWE-Agent were all benchmarked.
My take: The number that matters is the gap, not the absolute. AI coders are crossing 70% on functional correctness while still missing 80%+ of security checks. If your org is standardizing on AI-first development, either the security layer is not AI or you’re accepting a known defect rate. Endor’s framing is right — SAST, SCA, and secret scanning need to run inside the agent loop, not after it.
CISA escalates axios into a federal advisory; NIST stops enriching most CVEs. Two things happened to the traditional vuln-management pipeline this week, and they fit together.
On April 20, CISA issued a formal supply-chain alert for the axios npm compromise I’ve tracked since issue #3. Elastic Security Labs published a technical writeup of the second-stage RAT. Google’s Threat Intelligence Group has attributed it to UNC1069 — a financially-motivated North Korea-nexus cluster Mandiant has tracked since 2018, distinct from Lazarus.
OpenAI’s May 8 macOS signing-certificate revocation is still on the clock.
On April 17, Endor Labs and Socket both covered NIST’s decision to stop enriching most incoming CVEs in the NVD. Enrichment — CWE mappings, CVSS vectors, CPE assignments — is what downstream scanners need to match CVEs to packages.
Without it, SCA tools fall back on vendor advisories and GHSAs. The coverage gap becomes the tool’s problem.
My take: AI-assisted vuln research is producing more CVEs than the infrastructure built to classify them can absorb. That’s the thread connecting Mythos Preview two weeks ago to the NIST climbdown this week. Tools that run their own vuln databases (Grype, Snyk, Endor Labs, Socket) just became more important. Tools that lean entirely on NVD just got a problem.
Vercel confirms a breach — via a Vercel employee’s OAuth grant to an AI-productivity tool. Quick version, since TechCrunch has the full timeline:
- February 2026 — a Context.ai employee’s laptop got Lumma Stealer after searching for Roblox cheats
- 10 weeks later — attacker used the stolen OAuth token to reach Vercel’s Google Workspace
- The pivot worked because a Vercel employee had granted Context.ai “Allow All” scopes signing up for its AI office product
- Non-sensitive env vars on a limited customer subset leaked; sensitive-marked vars were encrypted and appear untouched
Vercel changed the default — env vars are now sensitive by default. Wiz’s writeup has the cleanest technical read of the OAuth pivot.
My take: This is the OAuth-app-as-supply-chain pattern that hit Salesloft/Drift last year, except the third-party app is now an AI productivity tool. Every “sign in with Google to try our AI X” flow is this risk if someone clicks Allow All. Audit third-party app grants in your Workspace admin console this week.
Notable Updates
- Nuclei v3.8.0 — Patches GHSA-29rg-wmcw-hpf4 (JS template sandbox escape — read local files outside
allow-local-file-access) and GHSA-jm34-66cf-qpvr (expression evaluation from non-template sources, RCE from malicious templates). If you run Nuclei against untrusted template collections, upgrade today. - Trivy v0.70.0 — GPG signing keys rotated for the deb/rpm repositories. CI pipelines pinning the old key will fail signature verification on next update. Also: ELF-symbol Go binary detection for
-trimpathbuilds, PEP 751 pylock.toml parser, Azure misconfig parity for ARM-based AKS. - Ostorlab v2.0.0 — Project rename. The GitHub repo 301-redirects
Ostorlab/ostorlabtoOstorlab/oxo. Major version bump is the rename plus Python 3.14 floor. If youpip install ostorlabin CI, rebase to the new name. - Semgrep v1.160.0 — Scala parsing moved to tree-sitter with pfff fallback. Pro-engine taint tracking now handles variadic functions — fewer missed sinks in Python and Go code that routes through
log.Printf-style helpers. - PHPStan 2.1.48 / 2.1.49 / 2.1.50 — Three patch releases in three days. The bisect command from 2.1.47 is already paying off — all three were regression fixes (type narrowing, promoted property handling, enum scope typing).
- Checkov 3.2.521 / 3.2.522 / 3.2.524 — Lazy version-cache init (cold-start latency win for short-lived CI containers), a CLI control-byte strip, and a
GoogleKMSKeyIsPubliccrash fix on unhashable types. - Arthur AI 2.1.516 / 2.1.529 — UI redesign on 4/14, compliance-reporting strengthening on 4/17.
The AI Defense Race Caught Up to the AI Offense Race
Three vendor moves this week point the same way. The AI-attacker story the Anthropic Mythos announcement kicked off two weeks ago now has AI-defender launches pacing it.
- OpenAI GPT-5.4-Cyber — A cyber-specialized model available only to vetted members of OpenAI’s Trusted Access for Cyber Defense program. Fewer research restrictions than public models. OpenAI’s answer to Mythos.
- IBM Autonomous Security — Multi-agent, vendor-agnostic AI service that coordinates across SOC, AppSec, and vuln-management stacks. Pitched as a response to agentic-attack velocity.
- Artemis — $70M Series A — Felicis-led, with First Round and Brightmind. Cofounded by Shachar Hirshberg (CEO, ex-AWS) and Dan Shiebler (CTO, ex-head of AI at Abnormal). Customers include Mercury, Wix, and Lemonade.
Pattern to watch: each launch assumes the defender lacks tooling at AI-attacker speed. If Mythos proved attackers can move at model speed, this week is the first wave of vendor answers. Expect Semgrep, Snyk, and Endor to ship their own versions over the next two quarters.
AI Security
15 releases. Three of the fastest-moving vendors in the category are all shipping at arms-race cadence.
Arize AI shipped five Phoenix releases (v14.6.0 through v14.9.1). v14.8 added Mastra instrumentation. v14.9 changed evaluation defaults. With Galileo getting acquired into Splunk last week, Phoenix is the one to watch in AI observability now.
Akto shipped five more releases (v1.98.3 through v1.98.5 plus two guardrails-service patches). Empty changelogs, same pattern as last week — they ship fast, document later.
Promptfoo 0.121.5 is maintenance-only, but stars jumped 321 this week (20,043 → 20,364). Biggest weekly absolute gain in my tracked set. AI red-teaming has momentum.
IaC Security
10 releases, one big one. Trivy v0.70.0 is the release the category got this week — GPG-key rotation is the CI-breaker, but the real tech jump is Go binary version detection through ELF symbol tables for -trimpath builds. That closes a long-standing false-negative hole on stripped binaries in Grafana, HashiCorp, and Calico images.
PEP 751 pylock.toml parsing is early — pip 25.x is starting to emit them. Scanners that stay on requirements.txt / poetry.lock will miss dependencies starting this summer.
Kubescape shipped v4.0.4 and v4.0.5 on the same day. KubeArmor shipped v1.6.17/v1.6.18. Kubernetes-native posture is the category’s busiest quadrant.
SAST
Five releases. Weekly cadence on the leaders, an unusually fast patch run on PHPStan.
Semgrep v1.160.0 moved Scala parsing to tree-sitter with a pfff fallback — the last major language that was still pfff-only. Pro-engine taint now tracks through variadic functions, an under-advertised win for real false negatives in Python and Go.
PHPStan shipped three patch releases in three days (2.1.48 → 2.1.50). Each fixed a specific 2.1-series regression. The bisect command that landed in 2.1.47 is clearly in active use.
CodeQL v2.25.2 updated the Eclipse Temurin OpenJDK build. Nothing category-shifting.
SCA
Eight releases. Renovate pushed five (43.135.0 → 43.136.3) at normal pace. Dependabot core v0.370.0 bumped OpenTelemetry gems. SCANOSS v1.52.1 patched on top of last week’s component-status query. Chainguard apko v1.2.4 is a routine Kubernetes dep bump.
Worth reading if you track axios: Aikido’s debunk of CVE-2026-40175 — tagged critical in the NVD, not actually exploitable in real Node runtimes. With NIST stepping back from enrichment, vendor analyses like this will carry more weight.
Mobile
Three releases. Ostorlab v2.0.0 was the oxo rename. The 1.21 release from 4/14 is the one with actual features (iOS TestFlight asset support). Apktool v3.0.2 was a parsing fix for incompatible tag formats.
DAST
Nuclei v3.8.0 broke the three-week DAST silence I flagged last week. Past the two security fixes covered above, the release added PDF export (long-standing enterprise ask), an XSS reflection-context analyzer that distinguishes HTML / attribute / JS context to cut SPA false positives, and a honeypot detector that deprioritizes honeypot-shaped targets.
Sixteen new contributors landed in this window — a community release as much as a product one.
IAST
Datadog Code Security (IAST) v1.61.1 is a maintenance release for dd-trace-java. Preload fix for Spring Boot JARs, narrow AppSec event-stream change. If you’re running 1.60/1.61 in prod, it’s a safe patch-level bump.
ASPM
DefectDojo 2.57.2 patches on top of last week’s 2.57.1 Wazuh-parser fix. The 2.57 line is stabilizing.
Container Security
Calico v3.31.5 patch release. No security-only content in the notes.
Quiet This Week
No GitHub releases from RASP or API Security.
Deals & Funding
- Artemis — $70M Series A — AI-native SOC, Felicis-led with First Round and Brightmind. Cofounded by Shachar Hirshberg (CEO, ex-AWS) and Dan Shiebler (CTO, ex-head of AI at Abnormal). Customers: Mercury, Wix, Lemonade. Direct competition to Splunk/CrowdStrike on the premise that AI-augmented attackers outrun SOAR-era workflows.
No other AppSec-specific M&A or funding in the window worth flagging.
Star Watch
- mitmproxy 43,204 — +134 WoW. Still the most-starred tool I track in any category.
- Trivy 34,637 — +156 WoW. Biggest absolute add among mature repos this week.
- Nuclei 28,015 — new to my tracked set after v3.8.0 moved it into front-of-mind.
- Harbor 28,322 — +64.
- Promptfoo 20,364 — +321, biggest weekly % gain in my set (+1.6%). Momentum held after last week’s 20K crossing.
- Semgrep 14,870 — new baseline.
- Grype 12,061 — +44.
- Arize Phoenix 9,369 — +101. On pace for 10K in six to seven weeks.
- OpenGrep 2,410 — +35. Steady.
Quick Hits
- Arize Phoenix v14.6 → v14.9.1 (5 releases). Mastra instrumentation in v14.8.
- Akto v1.98.3 → v1.98.5 plus guardrails-service 1.88.8 and 1.89.0. Empty changelogs.
- Renovate 43.135 → 43.136.3 (5 releases). Normal cadence.
- Kubescape v4.0.4 and v4.0.5 on the same day.
- KubeArmor v1.6.17 / v1.6.18.
- SCANOSS v1.52.1.
- Chainguard apko v1.2.4.
- DefectDojo 2.57.2 Wazuh parser follow-up.
- Vectara 2.11.6 retry-with-split fix for 400s.
On AppSec Santa This Week
Updated this week:
- Trivy — Added v0.70.0 with GPG rotation, ELF-symbol Go binary detection, and PEP 751 pylock.toml parsing
- Nuclei — Added v3.8.0 including the two GHSA security fixes
- Semgrep — Added v1.160.0 with tree-sitter Scala and variadic-function taint support
- Ostorlab — Added v2.0.0 and the rename to
oxo - PHPStan — Added 2.1.48 / 2.1.49 / 2.1.50 regression fixes
Most read this week: Semgrep, Snyk, Trivy, SonarQube, Nuclei
If you spot anything outdated or wrong on a tool page, reply to this email — I update pages based on reader feedback.
Worth Reading
OX Security — The Mother of All AI Supply Chains: MCP Systemic RCE — The full disclosure, CVE list, and sandboxed reproductions on LiteLLM, LangChain, and LangFlow. Read it with a copy of your MCP server inventory open.
Endor Labs — Agent Security League — Full methodology and per-agent scores. Paired with the ebook and SusVibes paper. Cite this when someone claims AI-generated code is production-ready by default.
Datadog — The Case for Dependency Cooldowns Post-axios — Kennedy Toomey walks through why a 12-hour publish-to-install cooldown would have blocked the axios compromise. Concrete, practical.
Endor Labs — Critical RCE in protobuf.js (GHSA-xq3m-2v4x-88gg) — CVSS 9.4 RCE where type names in a
.protofile compile through theFunctionconstructor. If you load.protoschemas from untrusted input, that’s an RCE surface.Sysdig — Marimo Weaponized via Hugging Face — 662 exploit events from 11 IPs in 10 countries over three days. Attackers staged payloads on a typosquat Hugging Face Space to drop an NKAbuse variant. First public case of Hugging Face as malware CDN.
Wiz — Context.ai OAuth Token Compromise — Cleanest technical read of the OAuth pivot into Vercel. Start here if you audit third-party app grants in Google Workspace.
Sysdig — 2026 Cloud-Native Security Report — Headlines: 70% of surveyed orgs run behavior-based detection, 140% YoY increase in auto-killing suspicious processes, 25x growth in AI-specific package adoption.
Wrapping Up
That’s issue #6. 45 releases tracked, one protocol-level vulnerability Anthropic won’t patch, one benchmark that puts numbers on how far AI coders still are from secure code, and a federal advisory on a compromise I’ve been writing about for a month.
The through-line is governance. NIST stepped back from enrichment. Anthropic declined to patch MCP. Vercel patched a customer default instead of an upstream OAuth scope.
Each decision makes sense in isolation. Together they add up to a security layer thinner than it was a month ago.
I track 114 GitHub repos and 97 RSS feeds every week. If a tool ships a release, it shows up here, with context on what it means.
If you found this useful, forward it to a colleague who’d benefit.
And if I missed something or got something wrong, just reply — I read every response.
See you next Tuesday.
AppSec Santa Weekly — changelog analysis and category trends from 290+ AppSec tools. Browse all tools or subscribe for weekly updates.