Garak vs Promptfoo
Quick Verdict
Garak and Promptfoo both test LLMs for security vulnerabilities, but they approach the problem from different angles. Garak is a dedicated vulnerability scanner from NVIDIA’s AI Red Team — 37+ probe modules, 23 model backends, CLI-first, purely focused on adversarial security testing. Promptfoo is a broader testing platform that combines red teaming (50+ vulnerability types) with LLM evaluation, prompt comparison, guardrails, and CI/CD integration. Pick Garak when you need a focused probe library for model-level security assessment. Pick Promptfoo when you need red teaming as part of a larger LLM development and testing workflow.
Feature Comparison
| Feature | Garak | Promptfoo |
|---|---|---|
| License | Apache 2.0 (fully free) | MIT (free) + commercial tier |
| GitHub Stars | 6,900+ | 10,300+ |
| Maintained By | NVIDIA AI Red Team | Promptfoo (venture-backed) |
| Primary Focus | LLM vulnerability scanning | LLM evaluation + red teaming |
| Attack Modules | 37+ probe types | 50+ vulnerability types |
| Attack Strategy | Pre-built research-backed probes | Context-aware generated attacks |
| Multi-Turn Testing | Limited | Yes (agent escalation, memory poisoning) |
| RAG Pipeline Testing | No | Yes |
| Agent/Tool Testing | No | Yes (tool misuse, privilege escalation) |
| Model Providers | 23 backends | OpenAI, Anthropic, Azure, Google, Bedrock, Ollama, HF, custom |
| LLM Evaluation | No | Yes (assertions, rubrics, side-by-side comparison) |
| Guardrails | No | Yes (real-time protection) |
| Code Scanning | No | Yes (IDE + CI/CD) |
| Configuration | CLI flags + JSON configs | YAML-based (promptfooconfig.yaml) |
| Output Formats | JSONL, HTML, hit logs | Web UI, JSON, CSV |
| Compliance Mapping | No | OWASP, NIST RMF, MITRE ATLAS, EU AI Act |
| CI/CD Integration | Basic (CLI-based) | Built-in (npm/pip, GitHub Actions) |
| Language | Python | TypeScript (96.6%) |
| Enterprise Features | None (pure open-source) | SOC2, ISO 27001, HIPAA, team collaboration |
Garak vs Promptfoo: Head-to-Head
Attack Libraries and Coverage
Garak ships with 37+ probe modules organized by attack technique. Each module implements a specific category of adversarial testing: promptinject for prompt injection, dan for DAN jailbreak variants, encoding for encoding-based bypasses (Base64, ROT-13), leakreplay for training data extraction, packagehallucination for fake package suggestions, malwaregen for malware generation attempts, xss for cross-site scripting via LLM output, and tap for Tree of Attack with Pruning. These are research-backed attacks drawn from academic literature and real-world adversarial techniques.
Garak’s probes are largely static — they use pre-defined attack patterns that have been documented in security research. The atkgen module adds automated attack generation, and buffs can modify probe behavior, but the primary workflow is running established attack playbooks against a target model.
Promptfoo covers 50+ vulnerability types including prompt injection, jailbreaks, PII leaks, tool misuse, privilege escalation, and toxic content generation. The distinction is in how attacks are generated. Promptfoo creates context-aware adversarial inputs tuned to your specific application — you describe what your application does, and the tool generates attacks relevant to that use case. This means attacks adapt to your system prompt, expected behavior, and tool configuration rather than applying generic patterns.
For known, documented attack techniques (encoding tricks, DAN variants, training data extraction), Garak’s probe library is more comprehensive. For application-specific vulnerability discovery, Promptfoo’s generated attacks are more targeted.
Application-Layer vs Model-Layer Testing
This is the most important distinction between the two tools. Garak tests models. Promptfoo tests applications.
Garak sends adversarial inputs to a model endpoint and analyzes the response. It works at the model layer — testing what the model will say when given a crafted prompt. It does not understand what sits between the user and the model. If you have a RAG pipeline that retrieves context from a vector database, Garak will not test whether an attacker can poison that context. If you have an agent that calls external tools, Garak will not test whether the agent can be manipulated into making unauthorized API calls.
Promptfoo treats the entire application as the attack surface. The red team module tests RAG pipelines for indirect prompt injection via retrieved documents. It tests agent architectures for multi-turn escalation, memory poisoning, and tool parameter tampering. It tests API integrations for data exfiltration paths. This broader scope catches vulnerabilities that exist in the application layer rather than the model itself.
If your LLM deployment is a direct API call to a model with a system prompt, both tools work. If your deployment includes retrieval augmentation, tool use, or multi-step agent workflows, Promptfoo covers attack surfaces that Garak does not reach.
Reporting and Compliance
Garak produces three output types per scan: a debug log, a JSONL report with per-attempt records, and a hit log with vulnerability findings only. Version 0.14.0 added redesigned HTML reports. The output is functional but geared toward security researchers who will process results programmatically. Garak can push findings to the AI Vulnerability Database and integrate with NeMo Guardrails for guardrail effectiveness comparison.
Promptfoo generates reports that map to compliance frameworks: OWASP Top 10 for LLM Applications, NIST AI Risk Management Framework, MITRE ATLAS, and the EU AI Act. The web UI shows vulnerability categories with severity breakdowns and risk scores. For teams that need to produce audit-ready documentation or demonstrate compliance with AI governance requirements, Promptfoo’s reporting is significantly more mature.
The commercial tier adds team collaboration, shared dashboards, and enterprise compliance certifications (SOC2, ISO 27001, HIPAA).
Developer Experience and Integration
Garak is Python-native. Install with pip install garak, export your model API key, run garak --target_type openai --target_name gpt-4 --probes encoding. The CLI is the primary (and effectively only) interface. Configuration can be done through CLI flags or JSON config files. It fits naturally into Python-based ML workflows.
Promptfoo is TypeScript-native but installable via npm, pip, or Homebrew. Configuration lives in YAML files (promptfooconfig.yaml), which makes it easy to version control alongside application code. The interactive setup wizard (npx promptfoo@latest redteam setup) walks through application description, target configuration, and plugin selection. The web UI (npx promptfoo view) provides a visual interface for reviewing results.
For CI/CD integration, Promptfoo has a more developed story — npm package, GitHub Actions support, and YAML config that slots into existing pipeline definitions. Garak can run in pipelines too, but the integration is more manual.
When to Choose Garak
Choose Garak if:
- You need a dedicated vulnerability scanner with a deep library of research-backed attack probes
- Your testing focuses on model-level security (what the model responds to adversarial inputs)
- You want a fully free, open-source tool with no commercial tier or feature restrictions
- Your team works primarily in Python and wants a CLI-first workflow
- You need to test across many model providers (23 supported backends)
- Academic rigor matters — Garak has a published research paper and is maintained by NVIDIA’s AI Red Team
When to Choose Promptfoo
Choose Promptfoo if:
- You need red teaming as part of a broader LLM testing workflow (evaluation, prompt comparison, guardrails)
- Your deployment includes RAG pipelines, agent architectures, or tool integrations that need application-layer testing
- Compliance mapping to OWASP, NIST, MITRE ATLAS, or the EU AI Act is a requirement
- CI/CD integration with YAML configuration is important for your development workflow
- You want both open-source flexibility and the option to upgrade to enterprise features (SOC2, team collaboration)
- Context-aware attack generation tuned to your specific application is more valuable than generic probe libraries
For teams with thorough security testing requirements, running both tools provides complementary coverage — Garak for deep model-layer probing and Promptfoo for application-layer red teaming and CI/CD integration.
For more options, browse our AI security tools category.
Frequently Asked Questions
Is Garak or Promptfoo better for LLM red teaming?
Are both tools free and open-source?
Can Garak test RAG pipelines and agent architectures?
Which tool supports more LLM providers?
Do I need both Garak and Promptfoo?

Suphi Cankurt is an application security enthusiast based in Helsinki, Finland. He reviews and compares 129 AppSec tools across 10 categories on AppSec Santa. Learn more.
Comments
Powered by Giscus — comments are stored in GitHub Discussions.