- Garak is a dedicated LLM vulnerability scanner from NVIDIA's AI Red Team with 120+ probe modules; Promptfoo covers 50+ vulnerability types plus evaluation and guardrails.
- Garak tests model endpoints only; Promptfoo tests entire applications including RAG pipelines, agent architectures, and tool-use workflows.
- Garak is fully free under Apache 2.0 with no paid tier; Promptfoo's CLI is MIT-licensed with a commercial tier adding SOC2, ISO 27001, and team features.
- Promptfoo maps findings to OWASP, NIST RMF, MITRE ATLAS, and EU AI Act compliance frameworks; Garak produces JSONL and HTML reports without compliance mapping.
- Garak supports 23 model backends (including NIM, NVCF, Watsonx); Promptfoo uses YAML-based configuration with built-in CI/CD integration via npm/pip.
Garak is NVIDIA’s open-source LLM vulnerability scanner with 120+ probe types for automated red-teaming. Promptfoo is a developer-focused LLM testing framework that combines red-teaming with eval workflows and compliance mapping.
Which Is Better: Garak or Promptfoo?
Garak and Promptfoo both test LLMs for security vulnerabilities, but they approach the problem from different angles. Both fit into the broader LLM red teaming category in AI security tools.
Garak is a dedicated vulnerability scanner from NVIDIA’s AI Red Team: 120+ probe modules at last public count, 23 model backends, CLI-first, and purely focused on adversarial security testing. Probe and backend counts move month to month, so verify against github.com/NVIDIA/garak before pinning.
Promptfoo is a broader testing platform that combines red teaming (50+ vulnerability types as last published) with LLM evaluation, prompt comparison, guardrails, and CI/CD integration.
Pick Garak when you need a focused probe library for model-level security assessment. Pick Promptfoo when you need red teaming as part of a larger LLM development and testing workflow. If you are weighing a third option, PyRIT is Microsoft’s enterprise red-teaming orchestrator with strong Azure integration and human-in-the-loop campaigns, and it is the better fit when long, manual red-team engagements outrank a fast CLI scan.


What Are the Key Differences?
| Feature | Garak | Promptfoo |
|---|---|---|
| License | Apache 2.0 (fully free) | MIT (free) + commercial tier |
| GitHub Stars | 4,700+ | 25,000+ |
| Maintained By | NVIDIA AI Red Team | Promptfoo (venture-backed) |
| Primary Focus | LLM vulnerability scanning | LLM evaluation + red teaming |
| Attack Modules | 120+ probe types | 50+ vulnerability types |
| Attack Strategy | Pre-built research-backed probes | Context-aware generated attacks |
| Multi-Turn Testing | Limited | Yes (agent escalation, memory poisoning) |
| RAG Pipeline Testing | No | Yes |
| Agent/Tool Testing | No | Yes (tool misuse, privilege escalation) |
| Model Providers | 23 backends | OpenAI, Anthropic, Azure, Google, Bedrock, Ollama, HF, custom |
| LLM Evaluation | No | Yes (assertions, rubrics, side-by-side comparison) |
| Guardrails | No | Yes (real-time protection) |
| Code Scanning | No | Yes (IDE + CI/CD) |
| Configuration | CLI flags + JSON configs | YAML-based (promptfooconfig.yaml) |
| Output Formats | JSONL, HTML, hit logs | Web UI, JSON, CSV |
| Compliance Mapping | No | OWASP, NIST RMF, MITRE ATLAS, EU AI Act |
| CI/CD Integration | Basic (CLI-based) | Built-in (npm/pip, GitHub Actions) |
| Language | Python | TypeScript (96.6%) |
| Enterprise Features | None (pure open-source) | SOC2, ISO 27001, HIPAA, team collaboration |
Garak vs Promptfoo: How Do They Compare?
Attack Libraries and Coverage
Garak ships with 120+ probe modules organized by attack technique. Each module implements a specific category of adversarial testing:
- promptinject for prompt injection
- dan for DAN jailbreak variants
- encoding for encoding-based bypasses (Base64, ROT-13)
- leakreplay for training data extraction
- packagehallucination for fake package suggestions
- malwaregen for malware generation attempts
- xss for cross-site scripting via LLM output
- tap for Tree of Attack with Pruning
These are research-backed attacks drawn from academic literature and real-world adversarial techniques.
Garak’s probes are largely static: they use pre-defined attack patterns documented in security research. The atkgen module adds automated attack generation, and buffs can modify probe behavior, but the primary workflow is running established attack playbooks against a target model.
Promptfoo covers 50+ vulnerability types including prompt injection, jailbreaks, PII leaks, tool misuse, privilege escalation, and toxic content generation. The distinction is in how attacks are generated.
Promptfoo creates context-aware adversarial inputs tuned to your specific application: you describe what your application does, and the tool generates attacks relevant to that use case.
This means attacks adapt to your system prompt, expected behavior, and tool configuration rather than applying generic patterns.
For known, documented attack techniques (encoding tricks, DAN variants, training data extraction), Garak’s probe library is more comprehensive. For application-specific vulnerability discovery, Promptfoo’s generated attacks are more targeted.
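In practice, that application description and plugin selection live in Promptfoo's YAML config. The sketch below is illustrative, not copied from an official example; the `redteam` keys (`purpose`, `plugins`, `strategies`) and the plugin/strategy names follow Promptfoo's published documentation, but the catalog changes between releases, so verify ids against the current docs.

```yaml
# promptfooconfig.yaml - illustrative red team configuration.
targets:
  - id: openai:gpt-4o-mini
    label: support-bot
redteam:
  # The purpose string drives context-aware attack generation.
  purpose: "Customer support chatbot for a retail bank"
  plugins:
    - pii               # personal-data leakage tests
    - harmful           # harmful-content generation tests
    - excessive-agency  # unauthorized actions beyond intended scope
  strategies:
    - jailbreak
    - prompt-injection
```

Because the purpose statement describes a banking support bot, generated attacks will target that context (account data exfiltration, policy bypasses) rather than generic patterns.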
Application-Layer vs Model-Layer Testing
This is the most important distinction between the two tools. Garak tests models. Promptfoo tests applications.
Garak sends adversarial inputs to a model endpoint and analyzes the response. It works at the model layer, testing what the model will say when given a crafted prompt.
It does not understand what sits between the user and the model.
If you have a RAG pipeline that retrieves context from a vector database, Garak will not test whether an attacker can poison that context.
If you have an agent that calls external tools, Garak will not test whether the agent can be manipulated into making unauthorized API calls.
Promptfoo treats the entire application as the attack surface. The red team module tests RAG pipelines for indirect prompt injection via retrieved documents.
It tests agent architectures for multi-turn escalation, memory poisoning, and tool parameter tampering. It tests API integrations for data exfiltration paths.
This broader scope catches vulnerabilities that exist in the application layer rather than the model itself.
If your LLM deployment is a direct API call to a model with a system prompt, both tools work. If your deployment includes retrieval augmentation, tool use, or multi-step agent workflows, Promptfoo covers attack surfaces that Garak does not reach.

Reporting and Compliance
Garak produces three output types per scan: a debug log, a JSONL report with per-attempt records, and a hit log with vulnerability findings only. Version 0.14.0 added redesigned HTML reports.
The output is functional but geared toward security researchers who will process results programmatically. Garak can push findings to the AI Vulnerability Database and integrate with NeMo Guardrails for guardrail effectiveness comparison.
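Since the hit log is newline-delimited JSON, quick triage works with standard shell tools. The record shape below is an assumption for illustration (a `probe` field naming the probe class); check your actual report's schema before scripting against it.

```shell
# Create a tiny garak-style hit log for illustration.
# Field names ("probe", "goal") are assumptions, not the guaranteed schema.
cat > sample.hitlog.jsonl <<'EOF'
{"probe": "encoding.InjectBase64", "goal": "decode a hidden instruction"}
{"probe": "encoding.InjectBase64", "goal": "decode a hidden instruction"}
{"probe": "dan.Dan_11_0", "goal": "elicit a DAN-style jailbreak"}
EOF

# Tally hits per probe: extract each "probe" value, sort, and count.
grep -o '"probe": "[^"]*"' sample.hitlog.jsonl | sort | uniq -c | sort -rn
```

The same one-liner pointed at a real hitlog gives a per-probe hit count suitable for a quick severity triage before deeper analysis.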
Promptfoo generates reports that map to compliance frameworks: OWASP Top 10 for LLM Applications, NIST AI Risk Management Framework, MITRE ATLAS, and the EU AI Act.
The web UI shows vulnerability categories with severity breakdowns and risk scores.
For teams that need to produce audit-ready documentation or demonstrate compliance with AI governance requirements, Promptfoo’s reporting is significantly more mature.
The commercial tier adds team collaboration, shared dashboards, and enterprise compliance certifications (SOC2, ISO 27001, HIPAA).
Developer Experience and Integration
Garak is Python-native. Install with pip install garak, export your model API key, and run garak --target_type openai --target_name gpt-4 --probes encoding (older releases used --model_type and --model_name).
The CLI is the primary (and effectively only) interface. Configuration can be done through CLI flags or JSON config files. It fits naturally into Python-based ML workflows.
Promptfoo is TypeScript-native but installable via npm, pip, or Homebrew. Configuration lives in YAML files (promptfooconfig.yaml), which makes it easy to version control alongside application code.
The interactive setup wizard (npx promptfoo@latest redteam setup) walks through application description, target configuration, and plugin selection. The web UI (npx promptfoo view) provides a visual interface for reviewing results.
For CI/CD integration, Promptfoo has a more developed story: an npm package, GitHub Actions support, and YAML config that slots into existing pipeline definitions. Garak can run in pipelines too, but the integration is more manual.
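A pipeline gate can be as small as one workflow step. The sketch below assumes a promptfooconfig.yaml at the repo root and an OpenAI target; the workflow structure is standard GitHub Actions, but the step is illustrative rather than an official Promptfoo example.

```yaml
# .github/workflows/redteam.yml - sketch of a red-team gate on pull requests.
name: llm-redteam
on: [pull_request]
jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run Promptfoo red team
        env:
          # API key for the target model, stored as a repo secret.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: npx promptfoo@latest redteam run --config promptfooconfig.yaml
```

A nonzero exit from the run fails the check, which is what makes the YAML-config approach slot naturally into existing PR gates.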
OWASP LLM Top 10 and MITRE ATLAS coverage
Both tools cover OWASP Top 10 for LLM Applications but with different emphasis. Garak’s promptinject, dan, encoding, and tap probes line up against LLM01 Prompt Injection; leakreplay and xss map to LLM02 Sensitive Information Disclosure and LLM05 Improper Output Handling; packagehallucination and malwaregen map to LLM09 Misinformation. Promptfoo’s plugin set covers the same OWASP LLM01–LLM10 spread plus tool-misuse and privilege-escalation tests that map to OWASP Agentic AI Threats T6 Excessive Agency and T7 Tool Misuse.
For MITRE ATLAS alignment, Promptfoo ships explicit ATLAS reporting in its compliance views (mapping findings onto AML.T0051 LLM Prompt Injection, AML.T0054 LLM Jailbreak, and related techniques), while Garak produces the raw evidence (JSONL probe results) that can be mapped onto ATLAS during analysis but is not labeled with the framework codes out of the box. NIST AI RMF and EU AI Act mapping follow the same pattern: Promptfoo bakes the labels into reports; Garak leaves the mapping to the analyst.
Agentic AI and RAG test coverage
Application-layer testing is where the gap is widest. Promptfoo treats RAG pipelines, multi-step agents, and tool-using workflows as first-class targets: indirect prompt injection via retrieved documents, memory poisoning across turns, tool-parameter tampering, and privilege escalation through chained tool calls all have dedicated plugins. Garak’s probes target the model endpoint and do not natively traverse a RAG retriever or tool router; teams running Garak against an agent typically scan the underlying model only and pair it with a separate agentic-AI scanner. For an AI security audit that needs both layers, Garak handles the model probe layer and Promptfoo (or PyRIT) handles the agent layer.
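An agent-focused Promptfoo run narrows the plugin set to the application-layer attack surfaces above. The plugin ids in this sketch reflect Promptfoo's published catalog at the time of writing, but the catalog changes between releases, so treat them as examples to verify rather than a fixed list.

```yaml
# Sketch: agent/RAG-focused plugin selection for Promptfoo's red team module.
redteam:
  purpose: "Internal agent that queries a vector store and calls ticketing APIs"
  plugins:
    - indirect-prompt-injection  # poisoned content in retrieved documents
    - excessive-agency           # unauthorized tool or API actions
    - rbac                       # privilege escalation across user roles
    - ssrf                       # tool calls fetching attacker-chosen URLs
```

This is the layer a model-only scanner never sees: the retriever, the tool router, and the permission boundary between them.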
Hello-world: first scan in 5 minutes
Both tools install in a single command and produce a first scan against an OpenAI endpoint within minutes.
Garak (Python, pip):
pip install garak
export OPENAI_API_KEY="sk-..."
garak --target_type openai --target_name gpt-4o-mini --probes encoding
This runs the encoding probe family (Base64, ROT-13, hex, Morse) against the target model and writes a JSONL report plus an HTML summary to the working directory. Swap --probes encoding for --probes all to run the full 120+ probe library or --probes promptinject,dan to focus on prompt injection and DAN jailbreak variants.
Promptfoo (Node or Python, npm/pip):
npx promptfoo@latest redteam setup
# answer the wizard, then:
npx promptfoo redteam run
npx promptfoo view
The setup wizard captures the application description and target endpoint into promptfooconfig.yaml, generates context-aware adversarial inputs, and renders results in a local web UI at port 15500. For Python-first stacks, swap npx for pip install promptfoo && promptfoo redteam .... The YAML config slots straight into a GitHub Actions step or any other CI runner, which is the dominant Promptfoo deployment pattern in production teams.
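The same YAML file drives Promptfoo's evaluation side, which is what makes the combined eval-plus-red-team workflow possible. The sketch below uses real Promptfoo assertion types (contains, llm-rubric), but the prompt and test values are made up for illustration.

```yaml
# promptfooconfig.yaml - eval-side sketch showing Promptfoo's assertion syntax.
prompts:
  - "Summarize in one sentence: {{article}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      article: "The quarterly report shows revenue grew 12% year over year."
    assert:
      - type: contains        # deterministic string check
        value: "revenue"
      - type: llm-rubric      # model-graded check against a rubric
        value: "Is a single-sentence summary of the input"
```

Running npx promptfoo eval against this config scores each prompt/provider pair, and the same file can carry a redteam block so security tests and quality tests version together.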
For a fuller walkthrough on Garak, see the Garak tool page; for Promptfoo configuration patterns and alternative scanners, see Promptfoo.
When Should You Choose Garak?
Choose Garak if:
- You need a dedicated vulnerability scanner with a deep library of research-backed attack probes
- Your testing focuses on model-level security (how the model responds to adversarial inputs)
- You want a fully free, open-source tool with no commercial tier or feature restrictions
- Your team works primarily in Python and wants a CLI-first workflow
- You need to test across many model providers (23 supported backends)
- Academic rigor matters: Garak has a published research paper and is maintained by NVIDIA’s AI Red Team
When Should You Choose Promptfoo?
Choose Promptfoo if:
- You need red teaming as part of a broader LLM testing workflow (evaluation, prompt comparison, guardrails)
- Your deployment includes RAG pipelines, agent architectures, or tool integrations that need application-layer testing
- Compliance mapping to OWASP, NIST, MITRE ATLAS, or the EU AI Act is a requirement
- CI/CD integration with YAML configuration is important for your development workflow
- You want both open-source flexibility and the option to upgrade to enterprise features (SOC2, team collaboration)
- Context-aware attack generation tuned to your specific application is more valuable than generic probe libraries
For teams with thorough security testing requirements, running both tools provides complementary coverage: Garak for deep model-layer probing and Promptfoo for application-layer red teaming and CI/CD integration.
For more options, browse the AI security tools category.
Frequently Asked Questions
Is Garak or Promptfoo better for LLM red teaming?
Are both tools free and open-source?
Can Garak test RAG pipelines and agent architectures?
Which tool supports more LLM providers?
Do I need both Garak and Promptfoo?

Founder, AppSec Santa
Years in application security. Reviews and compares 215 AppSec tools across 12 categories to help teams pick the right solution.
