Promptfoo Alternatives: 8 LLM Security & Testing Tools in 2026
Since OpenAI acquired Promptfoo in March 2026, teams have been re-evaluating their LLM security stack. I compared 8 alternatives across red teaming depth, runtime protection, and testing workflow to help you pick the right fit.
Top Promptfoo Alternatives
View all 35 alternatives →AI SOC Agents with Dynamic Reasoning
Security Scanner for LLM Agentic Workflows
AI Agent & MCP Security Platform
Zero-Trust Access Control for AI Agents (YC S25)
OpenTelemetry-based AI observability with open-source Phoenix
IBM's ML security library for adversarial attacks and defenses
- Promptfoo is a CLI-first LLM evaluation and red-teaming framework with over 17,000 GitHub stars, acquired by OpenAI in March 2026 — still free under MIT for the core.
- For pure red teaming, Garak (NVIDIA, ~7k stars, 120+ probe modules) and PyRIT (Microsoft, 3.4k stars, multi-modal) are the strongest open-source alternatives.
- For runtime protection, Lakera Guard (sub-50ms, acquired by Check Point 2025) and LLM Guard (35 scanners, MIT) cover the firewall use case that Promptfoo does not.
- Giskard and DeepTeam are closest to Promptfoo's workflow — Python-first testing frameworks with 40+ vulnerability probes each.
- NeMo Guardrails uses NVIDIA's Colang DSL for programmable dialog policies, a use case Promptfoo's eval-focused design does not cover.
Eight tools cover the ground Promptfoo covers — Garak, PyRIT, Giskard, DeepTeam, Lakera Guard, LLM Guard, NeMo Guardrails, and Prompt Security — but each does it better in a different direction. The right pick depends on which part of Promptfoo’s stack actually matters to you.
Why Look for Promptfoo Alternatives?
Promptfoo is a CLI-first framework for LLM evaluation and red teaming used by more than 300,000 developers and, per the vendor, 127 Fortune 500 companies. It sits in the AI security category alongside tools that scan, test, or block LLM traffic.
In March 2026, OpenAI acquired Promptfoo. The MIT-licensed core remains open-source, with over 17,000 GitHub stars on promptfoo/promptfoo.
For teams running promptfoo eval and promptfoo redteam in CI, very little changed in the short term.

Note: The acquisition shifted the roadmap conversation. Some teams want a vendor-neutral red teaming stack that is not owned by any model provider.
Promptfoo’s core strength is breadth in one binary. The tradeoff is depth in any one dimension.
Key Insight
Promptfoo's core strength is breadth in one binary. The tradeoff is depth in any one dimension — multi-modal attack depth, sub-50ms firewall latency, or programmable dialog flow control each belong to a different specialist.
PyRIT has more research behind multi-modal attack orchestration. Lakera Guard’s sub-50ms inline firewall is purpose-built for production traffic in ways Promptfoo’s guardrail endpoint is not. NeMo Guardrails’ Colang DSL covers programmable dialog flow control, a problem Promptfoo does not even try to solve.
The other common reason teams look around is scope. Promptfoo covers four jobs: evals, red teaming, guardrails, and AI code scanning.
Some teams only want one of those and prefer a specialized tool. A pure red teaming shop picks Garak. A runtime protection buyer picks Lakera or LLM Guard. A testing-as-QA team picks Giskard.
Below I walk through the eight strongest alternatives, grouped by the problem they solve rather than raw feature count. (If you specifically want a Garak vs Promptfoo head-to-head, I have a dedicated comparison for that.)
Top Promptfoo Alternatives
1. Garak
Garak is NVIDIA’s open-source LLM vulnerability scanner, released under Apache 2.0 with around 7k GitHub stars. It is the red teaming tool most often compared directly against Promptfoo — I wrote a dedicated Garak vs Promptfoo breakdown for that matchup.
Where Promptfoo bundles evaluation, red teaming, and guardrails into a single CLI, Garak stays narrow. It runs 120+ probe modules against your model, scores responses with 28 detector types, and outputs JSONL plus HTML reports.

That focus is the selling point. Every probe exists because a specific attack class — prompt injection, jailbreak, data leakage, toxicity, hallucination — needed coverage.
Garak supports 23 generator backends, including OpenAI, Anthropic, Hugging Face, and local models via a plugin architecture. Installation is a single pip install garak and the first scan runs in minutes.
For CI integration, Garak is less polished than Promptfoo. There is no hosted dashboard, no side-by-side prompt comparison, and no eval runner.
But as a dedicated vulnerability scanner maintained by NVIDIA’s security team, Garak is the tool I reach for when I need adversarial depth without the other Promptfoo features.
Best for: Teams that want a focused, vendor-maintained LLM vulnerability scanner. License: Open-source (Apache 2.0). Key difference: Pure red teaming depth with 120+ probe modules and 28 detectors. No evals, no guardrails.
2. PyRIT
PyRIT is Microsoft’s open-source Python Risk Identification Toolkit, built out of the Azure team’s experience red-teaming Bing Chat and Copilot. It has around 3.4k GitHub stars, 117 contributors, and ships under MIT.
PyRIT’s differentiator is multi-modal attacks. Where Promptfoo and Garak focus almost entirely on text prompts, PyRIT runs text, image, audio, and video attacks through a shared orchestrator pipeline.
That matters more every quarter as frontier models add multi-modal inputs that the text-only probe sets do not cover.

The orchestrator design is the other standout. PyRIT ships with single-turn, multi-turn, crescendo, and Tree of Attacks with Pruning (TAP) strategies out of the box.
Crescendo is the attack technique Microsoft’s Red Team published in 2024 where small escalations eventually jailbreak models that refuse single-shot prompts. TAP is the state-of-the-art automated jailbreaking paper — having a production implementation in a supported framework is rare.
Converters transform prompts through Base64, ROT13, leetspeak, homoglyph substitution, and cross-modal conversion to bypass safety filters. Targets include OpenAI, Azure OpenAI, HuggingFace, custom HTTP and WebSocket endpoints, and browser-based targets via Playwright.
The tradeoff is that PyRIT is more of a research framework than a turnkey CLI. You write Python to compose orchestrators, converters, and scorers.
Teams that enjoy the Promptfoo promptfoo redteam one-liner will find PyRIT heavier. Teams whose red team writes real attacks against real targets tend to prefer it.
Best for: AI red teams running multi-modal attacks or advanced orchestration strategies. License: Open-source (MIT). Key difference: Multi-modal attacks (text, image, audio, video) and research-grade orchestrators like crescendo and TAP.
3. Giskard
Giskard is an open-source Python library for testing LLMs, RAG applications, and traditional ML models. It is Apache 2.0, has around 5.2k GitHub stars, and is maintained by a Paris-based team with a commercial Hub on top.
Where Promptfoo approaches LLM testing from a prompt-engineering lens, Giskard comes at it from ML testing. It detects hallucinations, prompt injection, bias, data leakage, and performance issues in one scan.
The security and fairness probes live in the same test suite, which fits teams that already think of LLMs as ML systems rather than as chat interfaces.

Giskard’s autonomous red teaming agents run dynamic multi-turn attacks across 40+ probes, adapting strategies in real time. RAGET, the RAG Evaluation Toolkit, auto-generates test questions from your knowledge base to evaluate retrieval accuracy, context relevance, and hallucination rates.
That RAG focus is hard to find elsewhere.
The open-source SDK is free. Giskard Hub is a commercial add-on with team collaboration, continuous testing, and scheduled scans.
For teams that view Promptfoo primarily as “the LLM testing tool” rather than a red teaming tool, Giskard is the closest apples-to-apples replacement.
Best for: ML and data science teams testing LLMs, RAG, and traditional models in the same framework. License: Freemium (Apache 2.0 core + Giskard Hub commercial). Key difference: Combines security, bias, and performance testing with dedicated RAG evaluation through RAGET.
4. DeepTeam
DeepTeam is an open-source LLM red teaming framework from Confident AI, the team behind DeepEval. It ships under Apache 2.0 with around 1.3k GitHub stars.
DeepTeam’s design is closer to Promptfoo’s redteam command than most tools on this list. You declare your target LLM and vulnerability categories in Python, then deepteam.red_team() simulates attacks.
It covers 40+ vulnerability types and 10+ adversarial attack methods including linear, tree, and crescendo jailbreaking.

Mapping is the practical differentiator. DeepTeam’s vulnerabilities map explicitly to the OWASP Top 10 for LLM Applications and align with the NIST AI Risk Management Framework.
If your compliance or audit team asks which vulnerabilities you tested, DeepTeam gives you the framework-aligned answer out of the box.
Pro tip: DeepTeam needs an attacker LLM to generate adversarial prompts — typically OpenAI or another hosted model. That means API costs for every red team run. See the DeepTeam docs for cost-saving strategies like batching attacks and using smaller attacker models.
Promptfoo has the same dynamic in its red team mode, so it is a wash for teams comparing them. For teams choosing between DeepTeam and a pure-scan tool like Garak, Garak’s self-contained probes avoid the attacker-model spend.
Best for: Teams that want OWASP LLM Top 10 coverage with a simple Python API. License: Open-source (Apache 2.0). Key difference: OWASP Top 10 for LLMs and NIST AI RMF alignment baked into the probe set.
5. Lakera Guard
Lakera Guard is a commercial runtime firewall for LLM traffic. It was acquired by Check Point in 2025 as the anchor of Check Point’s Global Center of Excellence for AI Security, which now includes Lakera Guard, Lakera Red, and the Gandalf prompt-injection game.
Lakera Guard is the answer to a use case Promptfoo does not really own. It sits inline between your application and the LLM provider and blocks prompt injection in real time.
The vendor publishes 98%+ detection and sub-50ms latency across 100+ languages.

You integrate Lakera via a single API endpoint. Input goes through the Guard endpoint first, gets scored, and either passes to the LLM or gets rejected.
The same pattern works on outputs for PII redaction and content filtering. Lakera Red, a separate service, handles pre-production red teaming with custom attack suites, covering the workflow that Promptfoo’s redteam command does on the OSS side.
Lakera is clearly a production protection play, not an eval tool. Teams that need both usually pair Promptfoo or Garak for CI red teaming with Lakera Guard for runtime blocking.
Best for: Enterprises that need production LLM traffic protection with sub-50ms latency. License: Commercial (free tier available). Key difference: Runtime firewall with 98%+ prompt injection detection, sub-50ms latency, 100+ language support.
6. LLM Guard
LLM Guard is the open-source answer to Lakera Guard. It is MIT-licensed, maintained by Protect AI, and has around 2.5k GitHub stars.
The architecture is straightforward: 15 input scanners and 20 output scanners that you wire into your application. Input scanners include a prompt injection detector, token limit enforcement, anonymization, and invisible-text stripping.
Output scanners cover PII leakage, toxic content, factual consistency, and data leakage.

LLM Guard ships as a Python library and as a standalone API server. The API server makes it deployable in front of any LLM provider without pinning you to one.
Protect AI publishes the scanners as individually composable pieces, so you only run the checks you care about.
Compared to Promptfoo, LLM Guard replaces the guardrail side of the stack, not the evaluation side. A common pairing is Promptfoo for CI evals plus LLM Guard inline for runtime protection.
For cost-sensitive teams that want a self-hosted runtime filter without the Lakera subscription, LLM Guard is the default open-source choice.
Best for: Self-hosted teams that want open-source runtime guardrails without vendor lock-in. License: Open-source (MIT). Key difference: 35 composable scanners and a standalone API server that works with any LLM provider.
7. NeMo Guardrails
NeMo Guardrails is NVIDIA’s open-source toolkit for adding programmable guardrails to LLM applications. It is Apache 2.0 with around 5.6k GitHub stars.
NeMo’s differentiator is the Colang domain-specific language. Instead of stacking input and output filters, you declare dialog flows as Colang rules — what the bot is allowed to discuss, how it responds to policy violations, and what fallback flows kick in when the user probes boundaries.
That level of programmable control is unique among the tools on this list.

NeMo supports 5 rail types: input, dialog, retrieval, execution, and output. Dialog and execution rails are the rare ones — most other tools only filter individual requests and responses.
Dialog rails let you guarantee conversational behavior across turns. Execution rails let you gate which tools the LLM is allowed to call.
Model support spans OpenAI, Azure, Anthropic, HuggingFace, and NVIDIA NIM. Integrations cover LangChain, LangGraph, and custom chains.
NeMo also includes jailbreak detection, prompt injection protection, fact-checking against knowledge bases, and hallucination detection with OpenTelemetry tracing built in.
The tradeoff is complexity. Colang is a new thing to learn, and simple use cases feel heavy compared to LLM Guard’s scanner-based API.
For teams whose LLM apps have real dialog requirements — support bots, agentic workflows, regulated industries — the expressive policy layer pays back the learning cost.
Best for: Teams building complex LLM apps with multi-turn dialog policies and tool-use guardrails. License: Open-source (Apache 2.0). Key difference: Colang DSL for declarative dialog flow control, plus 5 rail types including dialog and execution.
8. Prompt Security
Prompt Security was acquired by SentinelOne in August 2025 for approximately $180 million (initially announced at ~$250 million before the transaction closed) and is now part of the Singularity platform for GenAI and agentic AI security. The product itself still ships as a standalone GenAI firewall.

Prompt Security covers a wider surface area than the other runtime tools on this list. Beyond blocking prompt injection and data leakage across 250+ LLM models, it ships a Chrome browser extension that detects shadow AI usage in real time via DOM analysis.
Security teams worried about employees pasting source code into ChatGPT deploy the extension via Intune or MDM and see every GenAI interaction across the organization.

Detection latency is sub-200ms, slower than Lakera Guard’s sub-50ms but still production-safe. The semantic data leakage prevention engine redacts PII, PHI, financial data, and source code before requests leave the enterprise.
Red teaming with custom LLMs is included, covering the pre-production workflow that Promptfoo handles on the OSS side.
Prompt Security is the right alternative when the buying center is a security team rather than an engineering team. If shadow AI discovery and enterprise DLP are on the requirements list, Promptfoo’s developer-centric design does not compete.
For teams that just want engineers to run promptfoo eval in CI, this is overkill.
Best for: Security teams that need shadow AI discovery, DLP, and runtime protection in one platform. License: Commercial (part of SentinelOne Singularity). Key difference: Chrome extension for shadow AI detection and semantic DLP across 250+ LLM models.
Feature Comparison
| Feature | Promptfoo | Garak | PyRIT | Giskard | DeepTeam | Lakera | LLM Guard | NeMo GR | Prompt Sec |
|---|---|---|---|---|---|---|---|---|---|
| License | MIT + commercial | Apache 2.0 | MIT | Apache 2.0 + Hub | Apache 2.0 | Commercial | MIT | Apache 2.0 | Commercial |
| Primary use | Eval + red team | Red team | Red team | Test + red team | Red team | Runtime firewall | Runtime filter | Policy engine | Runtime firewall |
| Red teaming | Yes (50+ vulns) | Yes (50+ probes) | Yes (multi-modal) | Yes (40+ probes) | Yes (40+ vulns) | Lakera Red | No | No | Yes |
| Runtime protection | Guardrails endpoint | No | No | No | No | Core feature | Core feature | Core feature | Core feature |
| Multi-modal attacks | No | No | Yes | No | Limited | No | No | No | No |
| RAG evaluation | Partial | No | No | Yes (RAGET) | No | No | No | Retrieval rail | No |
| OWASP LLM Top 10 | Mapped | Mapped | Mapped | Mapped | Explicit mapping | N/A | Partial | Partial | Mapped |
| Dialog flow control | No | No | No | No | No | No | No | Yes (Colang) | No |
| Shadow AI discovery | No | No | No | No | No | No | No | No | Yes (extension) |
| Maintainer | OpenAI (2026) | NVIDIA | Microsoft | Giskard | Confident AI | Check Point | Protect AI | NVIDIA | SentinelOne |
When to Stay with Promptfoo
Promptfoo is still the right pick in several scenarios:
- You want evals and red teaming in one CLI. No other tool on this list covers both in the same binary with the same developer experience. If
promptfoo evalandpromptfoo redteamare both already in your CI, switching means splitting those workflows across two tools. - You are comparing LLM providers or prompts. Promptfoo’s side-by-side prompt comparison is a first-class workflow. Garak, PyRIT, DeepTeam, and Giskard do not treat prompt A/B evaluation as a core feature the way Promptfoo does.
- Your budget is zero. The Promptfoo core is MIT-licensed and remains free after the OpenAI acquisition. For small teams or individual researchers, the 300,000+ developer user base means community support is strong and tutorials are plentiful.
- You pair it with a runtime firewall. Promptfoo’s weakest spot is inline protection at production traffic volume. Many teams run Promptfoo for pre-production red teaming plus LLM Guard or Lakera Guard for runtime blocking — that combination covers the full lifecycle without ripping out Promptfoo.
- Your stack is already Promptfoo-native. Writing
promptfooconfig.yamlfiles, custom providers, and shared assertions is an investment. If you have meaningful configuration tied to the CLI, the switching cost is real.
For more AppSec Santa AI security comparisons, see the AI security tools category, the LLM red teaming guide, and the what is AI security overview.
Frequently Asked Questions
What is the best open-source alternative to Promptfoo?
Does Promptfoo have a runtime firewall for LLM traffic?
Is Garak better than Promptfoo for red teaming?
Which Promptfoo alternative handles prompt injection at runtime?
Can I replace Promptfoo with a single commercial product?

Founder, AppSec Santa
Years in application security. Reviews and compares 209 AppSec tools across 11 categories to help teams pick the right solution. More about me →