Garak vs Promptfoo (2026): LLM Security Testing Head-to-Head

Quick Verdict

Garak and Promptfoo both test LLMs for security vulnerabilities, but they approach the problem from different angles. Garak is a dedicated vulnerability scanner from NVIDIA’s AI Red Team — 37+ probe modules, 23 model backends, CLI-first, purely focused on adversarial security testing. Promptfoo is a broader testing platform that combines red teaming (50+ vulnerability types) with LLM evaluation, prompt comparison, guardrails, and CI/CD integration. Pick Garak when you need a focused probe library for model-level security assessment. Pick Promptfoo when you need red teaming as part of a larger LLM development and testing workflow.

Feature Comparison

Feature	Garak	Promptfoo
License	Apache 2.0 (fully free)	MIT (free) + commercial tier
GitHub Stars	6,900+	10,300+
Maintained By	NVIDIA AI Red Team	Promptfoo (venture-backed)
Primary Focus	LLM vulnerability scanning	LLM evaluation + red teaming
Attack Modules	37+ probe types	50+ vulnerability types
Attack Strategy	Pre-built research-backed probes	Context-aware generated attacks
Multi-Turn Testing	Limited	Yes (agent escalation, memory poisoning)
RAG Pipeline Testing	No	Yes
Agent/Tool Testing	No	Yes (tool misuse, privilege escalation)
Model Providers	23 backends	OpenAI, Anthropic, Azure, Google, Bedrock, Ollama, HF, custom
LLM Evaluation	No	Yes (assertions, rubrics, side-by-side comparison)
Guardrails	No	Yes (real-time protection)
Code Scanning	No	Yes (IDE + CI/CD)
Configuration	CLI flags + JSON configs	YAML-based (promptfooconfig.yaml)
Output Formats	JSONL, HTML, hit logs	Web UI, JSON, CSV
Compliance Mapping	No	OWASP, NIST RMF, MITRE ATLAS, EU AI Act
CI/CD Integration	Basic (CLI-based)	Built-in (npm/pip, GitHub Actions)
Language	Python	TypeScript (96.6%)
Enterprise Features	None (pure open-source)	SOC2, ISO 27001, HIPAA, team collaboration

Garak vs Promptfoo: Head-to-Head

Attack Libraries and Coverage

Garak ships with 37+ probe modules organized by attack technique. Each module implements a specific category of adversarial testing: promptinject for prompt injection, dan for DAN jailbreak variants, encoding for encoding-based bypasses (Base64, ROT-13), leakreplay for training data extraction, packagehallucination for fake package suggestions, malwaregen for malware generation attempts, xss for cross-site scripting via LLM output, and tap for Tree of Attack with Pruning. These are research-backed attacks drawn from academic literature and real-world adversarial techniques.

Garak’s probes are largely static — they use pre-defined attack patterns that have been documented in security research. The atkgen module adds automated attack generation, and buffs can modify probe behavior, but the primary workflow is running established attack playbooks against a target model.

Promptfoo covers 50+ vulnerability types including prompt injection, jailbreaks, PII leaks, tool misuse, privilege escalation, and toxic content generation. The distinction is in how attacks are generated. Promptfoo creates context-aware adversarial inputs tuned to your specific application — you describe what your application does, and the tool generates attacks relevant to that use case. This means attacks adapt to your system prompt, expected behavior, and tool configuration rather than applying generic patterns.

For known, documented attack techniques (encoding tricks, DAN variants, training data extraction), Garak’s probe library is more comprehensive. For application-specific vulnerability discovery, Promptfoo’s generated attacks are more targeted.

Application-Layer vs Model-Layer Testing

This is the most important distinction between the two tools. Garak tests models. Promptfoo tests applications.

Garak sends adversarial inputs to a model endpoint and analyzes the response. It works at the model layer — testing what the model will say when given a crafted prompt. It does not understand what sits between the user and the model. If you have a RAG pipeline that retrieves context from a vector database, Garak will not test whether an attacker can poison that context. If you have an agent that calls external tools, Garak will not test whether the agent can be manipulated into making unauthorized API calls.

Promptfoo treats the entire application as the attack surface. The red team module tests RAG pipelines for indirect prompt injection via retrieved documents. It tests agent architectures for multi-turn escalation, memory poisoning, and tool parameter tampering. It tests API integrations for data exfiltration paths. This broader scope catches vulnerabilities that exist in the application layer rather than the model itself.

If your LLM deployment is a direct API call to a model with a system prompt, both tools work. If your deployment includes retrieval augmentation, tool use, or multi-step agent workflows, Promptfoo covers attack surfaces that Garak does not reach.

Reporting and Compliance

Garak produces three output types per scan: a debug log, a JSONL report with per-attempt records, and a hit log with vulnerability findings only. Version 0.14.0 added redesigned HTML reports. The output is functional but geared toward security researchers who will process results programmatically. Garak can push findings to the AI Vulnerability Database and integrate with NeMo Guardrails for guardrail effectiveness comparison.

Promptfoo generates reports that map to compliance frameworks: OWASP Top 10 for LLM Applications, NIST AI Risk Management Framework, MITRE ATLAS, and the EU AI Act. The web UI shows vulnerability categories with severity breakdowns and risk scores. For teams that need to produce audit-ready documentation or demonstrate compliance with AI governance requirements, Promptfoo’s reporting is significantly more mature.

The commercial tier adds team collaboration, shared dashboards, and enterprise compliance certifications (SOC2, ISO 27001, HIPAA).

Developer Experience and Integration

Garak is Python-native. Install with pip install garak, export your model API key, run garak --target_type openai --target_name gpt-4 --probes encoding. The CLI is the primary (and effectively only) interface. Configuration can be done through CLI flags or JSON config files. It fits naturally into Python-based ML workflows.

Promptfoo is TypeScript-native but installable via npm, pip, or Homebrew. Configuration lives in YAML files (promptfooconfig.yaml), which makes it easy to version control alongside application code. The interactive setup wizard (npx promptfoo@latest redteam setup) walks through application description, target configuration, and plugin selection. The web UI (npx promptfoo view) provides a visual interface for reviewing results.

For CI/CD integration, Promptfoo has a more developed story — npm package, GitHub Actions support, and YAML config that slots into existing pipeline definitions. Garak can run in pipelines too, but the integration is more manual.

When to Choose Garak

Choose Garak if:

You need a dedicated vulnerability scanner with a deep library of research-backed attack probes
Your testing focuses on model-level security (what the model responds to adversarial inputs)
You want a fully free, open-source tool with no commercial tier or feature restrictions
Your team works primarily in Python and wants a CLI-first workflow
You need to test across many model providers (23 supported backends)
Academic rigor matters — Garak has a published research paper and is maintained by NVIDIA’s AI Red Team

When to Choose Promptfoo

Choose Promptfoo if:

You need red teaming as part of a broader LLM testing workflow (evaluation, prompt comparison, guardrails)
Your deployment includes RAG pipelines, agent architectures, or tool integrations that need application-layer testing
Compliance mapping to OWASP, NIST, MITRE ATLAS, or the EU AI Act is a requirement
CI/CD integration with YAML configuration is important for your development workflow
You want both open-source flexibility and the option to upgrade to enterprise features (SOC2, team collaboration)
Context-aware attack generation tuned to your specific application is more valuable than generic probe libraries

For teams with thorough security testing requirements, running both tools provides complementary coverage — Garak for deep model-layer probing and Promptfoo for application-layer red teaming and CI/CD integration.

For more options, browse our AI security tools category.

Frequently Asked Questions

Is Garak or Promptfoo better for LLM red teaming?

It depends on what you are testing. Garak is a dedicated vulnerability scanner with 37+ probe modules focused on known attack techniques — DAN jailbreaks, encoding bypasses, training data extraction, and more. Promptfoo covers 50+ vulnerability types but also includes evaluation, prompt comparison, and guardrails. If you only need adversarial security scanning, Garak goes deeper on attack patterns. If you need red teaming alongside broader LLM testing, Promptfoo covers more ground.

Are both tools free and open-source?

Garak is fully free and open-source under Apache 2.0 with no paid tiers. Promptfoo’s core CLI is free and open-source under MIT. Promptfoo also offers a commercial tier with enterprise features including SOC2 compliance, ISO 27001 certification, and team collaboration tools.

Can Garak test RAG pipelines and agent architectures?

Garak focuses on testing model endpoints — it sends adversarial inputs to a model API and analyzes responses. It does not natively test RAG pipelines, agent tool use, or multi-turn agent workflows. Promptfoo can test RAG pipelines and agent architectures because it treats the entire application (not just the model) as the attack surface.

Which tool supports more LLM providers?

Garak supports 23 generator backends including OpenAI, Hugging Face, AWS Bedrock, Cohere, Groq, LiteLLM, Ollama, and custom REST endpoints. Promptfoo supports OpenAI, Anthropic, Azure OpenAI, Google, AWS Bedrock, Ollama, Hugging Face, and custom endpoints. Coverage is broadly similar, though Garak has more specialized backends like NIM, NVCF, and Watsonx.

Do I need both Garak and Promptfoo?

Running both can make sense for thorough coverage. Garak’s pre-built probe library covers research-backed attack techniques that Promptfoo may not replicate exactly. Promptfoo’s context-aware attack generation and RAG/agent testing cover application-layer vulnerabilities that Garak does not reach. For maximum coverage, use Garak for model-level probing and Promptfoo for application-level red teaming.

Written by

Suphi Cankurt

Suphi Cankurt is an application security enthusiast based in Helsinki, Finland. He reviews and compares 129 AppSec tools across 10 categories on AppSec Santa. Learn more.

Resource Hubs

Learn AppSec

Tool Comparisons

Alternatives & Roundups

Tool Reviews

Free Security Tools

Garak vs Promptfoo