- Garak is a dedicated LLM vulnerability scanner from NVIDIA's AI Red Team with 120+ probe modules; Promptfoo covers 50+ vulnerability types plus evaluation and guardrails.
- Garak tests model endpoints only; Promptfoo tests entire applications including RAG pipelines, agent architectures, and tool-use workflows.
- Garak is fully free under Apache 2.0 with no paid tier; Promptfoo's CLI is MIT-licensed with a commercial tier adding SOC2, ISO 27001, and team features.
- Promptfoo maps findings to OWASP, NIST RMF, MITRE ATLAS, and EU AI Act compliance frameworks; Garak produces JSONL and HTML reports without compliance mapping.
- Garak supports 23 model backends (including NIM, NVCF, Watsonx); Promptfoo uses YAML-based configuration with built-in CI/CD integration via npm/pip.
Garak is NVIDIA’s open-source LLM vulnerability scanner with 120+ probe types for automated red-teaming. Promptfoo is a developer-focused LLM testing framework that combines red-teaming with eval workflows and compliance mapping.
Which Is Better: Garak or Promptfoo?
Garak and Promptfoo both test LLMs for security vulnerabilities, but they approach the problem from different angles. Both fit into the broader LLM red teaming category in AI security tools.
Garak is a dedicated vulnerability scanner from NVIDIA’s AI Red Team: 120+ probe modules at last public count, 23 model backends, CLI-first, and purely focused on adversarial security testing. Probe and backend counts move month to month, so verify against github.com/NVIDIA/garak before pinning.
Promptfoo is a broader testing platform that combines red teaming (50+ vulnerability types as last published) with LLM evaluation, prompt comparison, guardrails, and CI/CD integration.
Pick Garak when you need a focused probe library for model-level security assessment. Pick Promptfoo when you need red teaming as part of a larger LLM development and testing workflow. If you are weighing a third option, PyRIT is Microsoft’s enterprise red-teaming orchestrator with strong Azure integration and human-in-the-loop campaigns, and it is the better fit when long, manual red-team engagements outrank a fast CLI scan.


What Are the Key Differences?
| Feature | Garak | Promptfoo |
|---|---|---|
| License | Apache 2.0 (fully free) | MIT (free) + commercial tier |
| GitHub Stars | 4,700+ | 25,000+ |
| Maintained By | NVIDIA AI Red Team | Promptfoo (venture-backed) |
| Primary Focus | LLM vulnerability scanning | LLM evaluation + red teaming |
| Attack Modules | 120+ probe types | 50+ vulnerability types |
| Attack Strategy | Pre-built research-backed probes | Context-aware generated attacks |
| Multi-Turn Testing | Limited | Yes (agent escalation, memory poisoning) |
| RAG Pipeline Testing | No | Yes |
| Agent/Tool Testing | No | Yes (tool misuse, privilege escalation) |
| Model Providers | 23 backends | OpenAI, Anthropic, Azure, Google, Bedrock, Ollama, HF, custom |
| LLM Evaluation | No | Yes (assertions, rubrics, side-by-side comparison) |
| Guardrails | No | Yes (real-time protection) |
| Code Scanning | No | Yes (IDE + CI/CD) |
| Configuration | CLI flags + JSON configs | YAML-based (promptfooconfig.yaml) |
| Output Formats | JSONL, HTML, hit logs | Web UI, JSON, CSV |
| Compliance Mapping | No | OWASP, NIST RMF, MITRE ATLAS, EU AI Act |
| CI/CD Integration | Basic (CLI-based) | Built-in (npm/pip, GitHub Actions) |
| Language | Python | TypeScript (96.6%) |
| Enterprise Features | None (pure open-source) | SOC2, ISO 27001, HIPAA, team collaboration |
Garak vs Promptfoo: How Do They Compare?
Attack Libraries and Coverage
Garak ships with 120+ probe modules organized by attack technique. Each module implements a specific category of adversarial testing:
- promptinject for prompt injection
- dan for DAN jailbreak variants
- encoding for encoding-based bypasses (Base64, ROT-13)
- leakreplay for training data extraction
- packagehallucination for fake package suggestions
- malwaregen for malware generation attempts
- xss for cross-site scripting via LLM output
- tap for Tree of Attack with Pruning
These are research-backed attacks drawn from academic literature and real-world adversarial techniques.
Garak’s probes are largely static: they use pre-defined attack patterns documented in security research. The atkgen module adds automated attack generation, and buffs can modify probe behavior, but the primary workflow is running established attack playbooks against a target model.
Promptfoo covers 50+ vulnerability types including prompt injection, jailbreaks, PII leaks, tool misuse, privilege escalation, and toxic content generation. The distinction is in how attacks are generated.
Promptfoo creates context-aware adversarial inputs tuned to your specific application: you describe what your application does, and the tool generates attacks relevant to that use case.
This means attacks adapt to your system prompt, expected behavior, and tool configuration rather than applying generic patterns.
For known, documented attack techniques (encoding tricks, DAN variants, training data extraction), Garak’s probe library is more comprehensive. For application-specific vulnerability discovery, Promptfoo’s generated attacks are more targeted.
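In practice, that application description and plugin selection live in Promptfoo's YAML config. The sketch below is illustrative, not copied from an official example; the `redteam` keys (`purpose`, `plugins`, `strategies`) and the plugin/strategy names follow Promptfoo's published documentation, but the catalog changes between releases, so verify ids against the current docs.

```yaml
# promptfooconfig.yaml - illustrative red team configuration.
targets:
  - id: openai:gpt-4o-mini
    label: support-bot
redteam:
  # The purpose string drives context-aware attack generation.
  purpose: "Customer support chatbot for a retail bank"
  plugins:
    - pii               # personal-data leakage tests
    - harmful           # harmful-content generation tests
    - excessive-agency  # unauthorized actions beyond intended scope
  strategies:
    - jailbreak
    - prompt-injection
```

Because the purpose statement describes a banking support bot, generated attacks will target that context (account data exfiltration, policy bypasses) rather than generic patterns.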
Application-Layer vs Model-Layer Testing
This is the most important distinction between the two tools. Garak tests models. Promptfoo tests applications.
Garak sends adversarial inputs to a model endpoint and analyzes the response. It works at the model layer, testing what the model will say when given a crafted prompt.
It does not understand what sits between the user and the model.
If you have a RAG pipeline that retrieves context from a vector database, Garak will not test whether an attacker can poison that context.
If you have an agent that calls external tools, Garak will not test whether the agent can be manipulated into making unauthorized API calls.
Promptfoo treats the entire application as the attack surface. The red team module tests RAG pipelines for indirect prompt injection via retrieved documents.
It tests agent architectures for multi-turn escalation, memory poisoning, and tool parameter tampering. It tests API integrations for data exfiltration paths.
This broader scope catches vulnerabilities that exist in the application layer rather than the model itself.
If your LLM deployment is a direct API call to a model with a system prompt, both tools work. If your deployment includes retrieval augmentation, tool use, or multi-step agent workflows, Promptfoo covers attack surfaces that Garak does not reach.

Reporting and Compliance
Garak produces three output types per scan: a debug log, a JSONL report with per-attempt records, and a hit log with vulnerability findings only. Version 0.14.0 added redesigned HTML reports.
The output is functional but geared toward security researchers who will process results programmatically. Garak can push findings to the AI Vulnerability Database and integrate with NeMo Guardrails for guardrail effectiveness comparison.
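Since the hit log is newline-delimited JSON, quick triage works with standard shell tools. The record shape below is an assumption for illustration (a `probe` field naming the probe class); check your actual report's schema before scripting against it.

```shell
# Create a tiny garak-style hit log for illustration.
# Field names ("probe", "goal") are assumptions, not the guaranteed schema.
cat > sample.hitlog.jsonl <<'EOF'
{"probe": "encoding.InjectBase64", "goal": "decode a hidden instruction"}
{"probe": "encoding.InjectBase64", "goal": "decode a hidden instruction"}
{"probe": "dan.Dan_11_0", "goal": "elicit a DAN-style jailbreak"}
EOF

# Tally hits per probe: extract each "probe" value, sort, and count.
grep -o '"probe": "[^"]*"' sample.hitlog.jsonl | sort | uniq -c | sort -rn
```

The same one-liner pointed at a real hitlog gives a per-probe hit count suitable for a quick severity triage before deeper analysis.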
Promptfoo generates reports that map to compliance frameworks: OWASP Top 10 for LLM Applications, NIST AI Risk Management Framework, MITRE ATLAS, and the EU AI Act.
The web UI shows vulnerability categories with severity breakdowns and risk scores.
For teams that need to produce audit-ready documentation or demonstrate compliance with AI governance requirements, Promptfoo’s reporting is significantly more mature.
The commercial tier adds team collaboration, shared dashboards, and enterprise compliance certifications (SOC2, ISO 27001, HIPAA).
Developer Experience and Integration
Garak is Python-native. Install with pip install garak, export your model API key, and run garak --target_type openai --target_name gpt-4 --probes encoding (older releases used --model_type and --model_name).
The CLI is the primary (and effectively only) interface. Configuration can be done through CLI flags or JSON config files. It fits naturally into Python-based ML workflows.
Promptfoo is TypeScript-native but installable via npm, pip, or Homebrew. Configuration lives in YAML files (promptfooconfig.yaml), which makes it easy to version control alongside application code.
The interactive setup wizard (npx promptfoo@latest redteam setup) walks through application description, target configuration, and plugin selection. The web UI (npx promptfoo view) provides a visual interface for reviewing results.
For CI/CD integration, Promptfoo has a more developed story: an npm package, GitHub Actions support, and YAML config that slots into existing pipeline definitions. Garak can run in pipelines too, but the integration is more manual.
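A pipeline gate can be as small as one workflow step. The sketch below assumes a promptfooconfig.yaml at the repo root and an OpenAI target; the workflow structure is standard GitHub Actions, but the step is illustrative rather than an official Promptfoo example.

```yaml
# .github/workflows/redteam.yml - sketch of a red-team gate on pull requests.
name: llm-redteam
on: [pull_request]
jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Run Promptfoo red team
        env:
          # API key for the target model, stored as a repo secret.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: npx promptfoo@latest redteam run --config promptfooconfig.yaml
```

A nonzero exit from the run fails the check, which is what makes the YAML-config approach slot naturally into existing PR gates.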
OWASP LLM Top 10 and MITRE ATLAS coverage
Both tools cover OWASP Top 10 for LLM Applications but with different emphasis. Garak’s promptinject, dan, encoding, and tap probes line up against LLM01 Prompt Injection; leakreplay and xss map to LLM02 Sensitive Information Disclosure and LLM05 Improper Output Handling; packagehallucination and malwaregen map to LLM09 Misinformation. Promptfoo’s plugin set covers the same OWASP LLM01–LLM10 spread plus tool-misuse and privilege-escalation tests that map to OWASP Agentic AI Threats T6 Excessive Agency and T7 Tool Misuse.
For MITRE ATLAS alignment, Promptfoo ships explicit ATLAS reporting in its compliance views (mapping findings onto AML.T0051 LLM Prompt Injection, AML.T0054 LLM Jailbreak, and related techniques), while Garak produces the raw evidence (JSONL probe results) that can be mapped onto ATLAS during analysis but is not labeled with the framework codes out of the box. NIST AI RMF and EU AI Act mapping follow the same pattern: Promptfoo bakes the labels into reports; Garak leaves the mapping to the analyst.
Agentic AI and RAG test coverage
Application-layer testing is where the gap is widest. Promptfoo treats RAG pipelines, multi-step agents, and tool-using workflows as first-class targets: indirect prompt injection via retrieved documents, memory poisoning across turns, tool-parameter tampering, and privilege escalation through chained tool calls all have dedicated plugins. Garak’s probes target the model endpoint and do not natively traverse a RAG retriever or tool router; teams running Garak against an agent typically scan the underlying model only and pair it with a separate agentic-AI scanner. For an AI security audit that needs both layers, Garak handles the model probe layer and Promptfoo (or PyRIT) handles the agent layer.
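An agent-focused Promptfoo run narrows the plugin set to the application-layer attack surfaces above. The plugin ids in this sketch reflect Promptfoo's published catalog at the time of writing, but the catalog changes between releases, so treat them as examples to verify rather than a fixed list.

```yaml
# Sketch: agent/RAG-focused plugin selection for Promptfoo's red team module.
redteam:
  purpose: "Internal agent that queries a vector store and calls ticketing APIs"
  plugins:
    - indirect-prompt-injection  # poisoned content in retrieved documents
    - excessive-agency           # unauthorized tool or API actions
    - rbac                       # privilege escalation across user roles
    - ssrf                       # tool calls fetching attacker-chosen URLs
```

This is the layer a model-only scanner never sees: the retriever, the tool router, and the permission boundary between them.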
Hello-world: first scan in 5 minutes
Both tools install in a single command and produce a first scan against an OpenAI endpoint within minutes.
Garak (Python, pip):
pip install garak
export OPENAI_API_KEY="sk-..."
garak --target_type openai --target_name gpt-4o-mini --probes encoding
This runs the encoding probe family (Base64, ROT-13, hex, Morse) against the target model and writes a JSONL report plus an HTML summary to the working directory. Swap --probes encoding for --probes all to run the full 120+ probe library or --probes promptinject,dan to focus on prompt injection and DAN jailbreak variants.
Promptfoo (Node or Python, npm/pip):
npx promptfoo@latest redteam setup
# answer the wizard, then:
npx promptfoo redteam run
npx promptfoo view
The setup wizard captures the application description and target endpoint into promptfooconfig.yaml, generates context-aware adversarial inputs, and renders results in a local web UI at port 15500. For Python-first stacks, swap npx for pip install promptfoo && promptfoo redteam .... The YAML config slots straight into a GitHub Actions step or any other CI runner, which is the dominant Promptfoo deployment pattern in production teams.
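The same YAML file drives Promptfoo's evaluation side, which is what makes the combined eval-plus-red-team workflow possible. The sketch below uses real Promptfoo assertion types (contains, llm-rubric), but the prompt and test values are made up for illustration.

```yaml
# promptfooconfig.yaml - eval-side sketch showing Promptfoo's assertion syntax.
prompts:
  - "Summarize in one sentence: {{article}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      article: "The quarterly report shows revenue grew 12% year over year."
    assert:
      - type: contains        # deterministic string check
        value: "revenue"
      - type: llm-rubric      # model-graded check against a rubric
        value: "Is a single-sentence summary of the input"
```

Running npx promptfoo eval against this config scores each prompt/provider pair, and the same file can carry a redteam block so security tests and quality tests version together.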
For a fuller walkthrough on Garak, see the Garak tool page; for Promptfoo configuration patterns and alternative scanners, see Promptfoo.
When Should You Choose Garak?
Choose Garak if:
- You need a dedicated vulnerability scanner with a deep library of research-backed attack probes
- Your testing focuses on model-level security (how the model responds to adversarial inputs)
- You want a fully free, open-source tool with no commercial tier or feature restrictions
- Your team works primarily in Python and wants a CLI-first workflow
- You need to test across many model providers (23 supported backends)
- Academic rigor matters: Garak has a published research paper and is maintained by NVIDIA’s AI Red Team
When Should You Choose Promptfoo?
Choose Promptfoo if:
- You need red teaming as part of a broader LLM testing workflow (evaluation, prompt comparison, guardrails)
- Your deployment includes RAG pipelines, agent architectures, or tool integrations that need application-layer testing
- Compliance mapping to OWASP, NIST, MITRE ATLAS, or the EU AI Act is a requirement
- CI/CD integration with YAML configuration is important for your development workflow
- You want both open-source flexibility and the option to upgrade to enterprise features (SOC2, team collaboration)
- Context-aware attack generation tuned to your specific application is more valuable than generic probe libraries
For teams with thorough security testing requirements, running both tools provides complementary coverage: Garak for deep model-layer probing and Promptfoo for application-layer red teaming and CI/CD integration.
For more options, browse the AI security tools category.
Frequently Asked Questions
Is Garak or Promptfoo better for LLM red teaming?
Are both tools free and open-source?
Can Garak test RAG pipelines and agent architectures?
Which tool supports more LLM providers?
Do I need both Garak and Promptfoo?

Founder, AppSec Santa
Years in application security. Reviews and compares 215 AppSec tools across 12 categories to help teams pick the right solution.
