FuzzyAI Review 2026: CyberArk's Open-Source LLM Fuzzer

FuzzyAI is an open-source LLM fuzzing framework from CyberArk Labs that tests large language models for jailbreak vulnerabilities using 18 built-in attack techniques.

Released in December 2024 under the Apache 2.0 license, it is listed in the AI security category and has earned approximately 1,492 GitHub stars.

CyberArk Labs reports that FuzzyAI bypassed every major model in their internal testing — claims of model-by-model jailbreak success are vendor-attributed, and results vary depending on the model version, system prompt, and guardrail configuration evaluated.

The framework implements 18 attack techniques — ranging from ASCII art obfuscation (ArtPrompt) to multi-turn conversational escalation (crescendo) and genetic algorithm prompt mutation — designed to help security teams identify guardrail bypass vulnerabilities before attackers exploit them. Unlike fixed test suites, FuzzyAI uses mutation-based fuzzing to discover novel jailbreak paths.

Key Features at a Glance

Feature	Details
Attack Techniques	18 built-in methods including ArtPrompt, DAN jailbreaks, crescendo, genetic algorithm mutation, and ASCII smuggling
Target Providers	OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, Hugging Face, Ollama, and custom REST APIs
Fuzzing Approach	Mutation-based fuzzing generates novel jailbreak paths rather than relying on fixed test cases
GUI + CLI	Web-based GUI for interactive testing and CLI for automation and CI/CD integration
Extensibility	Custom attack method framework for domain-specific vulnerability testing
System Prompt Extraction	Tests whether system prompts can be leaked through adversarial queries
Guardrail Regression	Re-run attacks after updates to verify previously blocked techniques stay blocked
License	Apache 2.0 open-source, free to use

Overview

FuzzyAI takes a fuzzing approach to LLM security testing. Rather than checking for known vulnerabilities with fixed test cases, it generates mutated attack prompts using algorithmic techniques (genetic algorithms, conversational escalation, encoding tricks) to discover previously unknown jailbreak paths.

Compared to broader LLM scanners like Garak that cover 50+ probe modules across prompt injection , hallucination, and toxicity, FuzzyAI goes deeper on jailbreak-specific fuzzing.

The tool targets the guardrails that LLM providers and deployers put in place. If a model is supposed to refuse certain requests, FuzzyAI systematically probes different ways to bypass that refusal.

When it finds a technique that works, it reports the successful jailbreak along with the specific prompt that triggered it.

This approach is particularly valuable because LLM guardrails are inherently fragile. A model might correctly refuse a direct request for harmful content but comply when the same request is rephrased using ASCII art, split across multiple conversational turns, or encoded in invisible Unicode characters.

18 Attack Techniques

Built-in methods including ArtPrompt, DAN jailbreaks, crescendo attacks, genetic algorithm mutation, ASCII smuggling, many-shot jailbreaking, PAIR, ActorAttack, and more.

Multi-Provider Support

Test models from OpenAI, Anthropic, Gemini, Azure, Bedrock, Hugging Face, Ollama, and any custom REST API endpoint.

Extensible Framework

Add your own attack methods to test domain-specific vulnerabilities and guardrails unique to your LLM deployments.

What are FuzzyAI’s key features?

FuzzyAI GUI showing the execution step of an LLM jailbreak test with multiple attack prompts and model responses in a dark-themed web interface

Attack Techniques

FuzzyAI implements 18 distinct attack methods, each targeting a different weakness in LLM safety mechanisms. The following table summarizes the key techniques:

Technique	How It Works
ArtPrompt	Replaces safety-triggering words with ASCII art representations, bypassing text-based content filters while preserving meaning for the model
DAN (Do Anything Now)	Promotes the LLM to adopt an unrestricted persona that ignores standard content filters
Crescendo	Starts with innocuous queries and gradually steers the conversation toward restricted topics across multiple turns
Genetic Algorithm	Uses evolutionary algorithms to mutate prompts, selecting for variants that get closer to bypassing guardrails
ASCII Smuggling	Embeds hidden instructions using invisible Unicode Tag characters that are processed by the model but invisible to users
Many-Shot Jailbreaking	Provides many examples of the desired (harmful) behavior in the prompt context to shift the model’s behavior
Shuffle Inconsistency Attack	Rearranges harmful text in prompts to exploit inconsistencies between an LLM’s comprehension and its safety mechanisms
ActorAttack	Builds semantic networks of “actors” to subtly guide conversations toward restricted targets
PAIR	Automates adversarial prompt generation by iteratively refining prompts using two LLMs
Best-of-n Jailbreaking	Uses input variations to repeatedly elicit harmful responses from the model

Additional techniques cover system prompt extraction, ethical filter bypass, and information leakage testing.

FuzzyAI jailbreak attempt results showing a successful guardrail bypass with the prompt and model response details

Supported Providers

FuzzyAI can target models from all major LLM providers:

OpenAI — GPT models via API
Anthropic — Claude models
Google Gemini — Gemini models
Azure OpenAI — Azure-hosted OpenAI models
AWS Bedrock — Claude, Llama, Titan, and other Bedrock models
Hugging Face — Hosted model endpoints
Ollama — Local models (requires 8GB+ RAM, 16GB+ recommended)
Custom REST API — Any LLM with an HTTP endpoint

Each provider requires its own API key configuration via environment variables.

FuzzyAI bulk testing terminal output showing automated fuzzing results across multiple attack techniques with pass/fail indicators

Extensibility

FuzzyAI is built to be extended. Security teams can add custom attack methods to:

Test domain-specific guardrails unique to their LLM deployment
Implement novel attack techniques from recent research papers
Create regression tests for previously discovered jailbreaks
Build attack chains that combine multiple techniques

FuzzyAI GUI Step 2: Attack Selection screen showing the hst History Framing attack mode selected with extra parameter input fields

Note

Built by CyberArk Labs

FuzzyAI comes from CyberArk Labs, the research arm of CyberArk — a publicly traded identity security company. The tool was developed as part of CyberArk’s AI security research program and released as open source to help the broader security community test LLM deployments.

OWASP, NIST, and MITRE alignment

FuzzyAI’s 18 attack techniques map directly onto the OWASP Top 10 for LLM Applications — specifically LLM01: Prompt Injection (DAN, ArtPrompt, ASCII smuggling, many-shot, PAIR) and partially LLM07: System Prompt Leakage (system prompt extraction techniques). Mutation-based fuzzing as a methodology is the editorial wedge against fixed-probe peers like Garak and DeepTeam.

The fuzzing telemetry feeds into the NIST AI RMF Measure function — every successful bypass is structured evidence for risk reviews, and the regression test pattern (re-run after guardrail updates) supports the Manage function. For threat modeling, FuzzyAI traces map onto MITRE ATLAS techniques AML.T0051 LLM Prompt Injection and AML.T0054 LLM Jailbreak, giving red teams shared taxonomy with detection and SOC tooling.

When should you use FuzzyAI?

Pre-deployment security assessment — Before deploying an LLM-powered application, run FuzzyAI against it to discover jailbreak vulnerabilities in your guardrails.

Red team exercises — Security teams use FuzzyAI as part of AI red team assessments to systematically test how well LLM safety mechanisms hold up.

Guardrail regression testing — After updating guardrails or switching models, re-run FuzzyAI to verify that previously blocked attack techniques remain blocked.

Research and benchmarking — AI security researchers use FuzzyAI to compare the jailbreak resistance of different models and guardrail implementations.

Strengths & Limitations

Strengths:

Mutation-based fuzzing discovers novel jailbreaks that fixed test suites miss
18 attack techniques cover a wide range of bypass methods
Backed by CyberArk Labs with active research and development
Extensible design lets teams add custom attack methods
Supports all major LLM providers plus custom endpoints
Free and open-source under Apache 2.0

Limitations:

Covers only jailbreak/guardrail bypass, not broader LLM security issues like hallucination, toxicity measurement, or data leakage
Requires API keys and usage costs for cloud-hosted target models
Running local models via Ollama needs significant resources (8GB+ RAM minimum)
Younger project compared to established tools like Garak, with a smaller community and fewer contributors
Results depend on the quality of the jailbreak detection criteria you define

How do I get started with FuzzyAI?

Clone the repository — Get FuzzyAI from GitHub: git clone https://github.com/cyberark/FuzzyAI.git && cd FuzzyAI.

Install dependencies — FuzzyAI uses Poetry for dependency management. Run poetry install to set up the environment.

Configure API keys — Set environment variables for your target provider. For example: export OPENAI_API_KEY="sk-..." for OpenAI, or export ANTHROPIC_API_KEY="..." for Anthropic.

Run your first fuzz — Execute a fuzzing session against a target model. FuzzyAI will cycle through its attack techniques and report any successful jailbreaks.

Review results — Examine the output for successful bypasses. Each finding includes the specific prompt that triggered the jailbreak and the attack technique used.

FuzzyAI GUI Step 1: Model Selection screen showing target model configuration with OpenAI gpt-4o selected and environment settings sidebar

How FuzzyAI Compares

FuzzyAI fills a specific role among AI security tools: focused jailbreak fuzzing with mutation-based attack techniques. It is narrower in scope but deeper in jailbreak coverage compared to multi-purpose LLM security scanners. The mutation-based fuzzing methodology is the editorial wedge — peers ship fixed probe libraries, FuzzyAI generates novel variants on the fly.

Garak is the broadest peer with 50+ probe modules covering prompt injection, hallucination, toxicity, and data leakage. Garak is the better pick when broad scanner coverage matters more than jailbreak-specific depth, and when NVIDIA backing and a larger community influence tooling decisions.

Augustus is the closest single-binary alternative — Praetorian’s Go-based scanner with 210+ adversarial probes across 47 attack categories. Augustus is the right fit when production deployment in CI runners and air-gapped environments needs a single Go binary instead of a Python framework.

PyRIT is Microsoft’s multi-turn red teaming orchestrator, built for enterprise AI red team campaigns with strong Azure integration and human-in-the-loop workflows. PyRIT excels at orchestrating long campaigns; FuzzyAI is the right tool when generating novel jailbreak variants is the core need.

Promptfoo is an evaluation framework that combines red teaming with prompt testing and model comparison. Promptfoo fits teams that want one tool for both quality and security evals rather than a security-focused fuzzing specialist.

DeepTeam is the structured-OWASP alternative with explicit Top 10 for LLMs mapping and compliance-focused reporting. DeepTeam is the better fit when audit evidence and framework alignment outrank raw attack-surface depth.

For runtime protection rather than pre-deployment testing, consider Lakera Guard (acquired by Check Point in September 2025), LLM Guard , or NeMo Guardrails .

Tip

Best for

Security teams and researchers who need to systematically test LLM guardrails for jailbreak vulnerabilities using mutation-based fuzzing techniques. Particularly useful for organizations deploying custom LLM applications that need to verify how well their safety mechanisms hold up.

For a broader overview of AI security threats and defense tools, see the AI security tools category page.

Visit FuzzyAI View source on GitHub (1492★)

Frequently Asked Questions

What is FuzzyAI?

FuzzyAI is an open-source LLM fuzzing framework created by CyberArk Labs. It automatically tests large language models for jailbreak vulnerabilities using 18 attack techniques. Released in December 2024 under Apache 2.0, it has jailbroken every major AI model it was tested against.

Is FuzzyAI free to use?

Yes. FuzzyAI is free and open-source under the Apache 2.0 license. It is available on GitHub at github.com/cyberark/FuzzyAI. You need API keys for the target LLM providers you want to test, but the tool itself is free.

What attack techniques does FuzzyAI support?

FuzzyAI implements 18 attack methods including ArtPrompt (ASCII art obfuscation), DAN (Do Anything Now) persona jailbreaks, crescendo multi-turn escalation, genetic algorithm prompt mutation, ASCII smuggling with invisible Unicode Tag characters, many-shot jailbreaking, PAIR (iterative prompt refinement), ActorAttack, and Best-of-n jailbreaking. You can also add custom attack methods.

Which LLM providers does FuzzyAI support?

FuzzyAI can target models from OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Hugging Face, and Ollama (for local models). It also supports custom REST API endpoints, so you can test any LLM that exposes an HTTP API.

How does FuzzyAI compare to Garak?

Both are open-source LLM security testing tools, but they have different focuses. FuzzyAI specializes in jailbreak fuzzing with mutation-based attack techniques (genetic algorithms, crescendo). Garak is a broader vulnerability scanner with 50+ probe modules covering prompt injection, hallucination, toxicity, and data leakage. FuzzyAI goes deeper on jailbreak techniques; Garak covers a wider attack surface.

Can I add custom attack methods to FuzzyAI?

Yes. FuzzyAI is built as an extensible framework specifically designed for researchers and security teams to add their own attack methods. This lets you create domain-specific tests tailored to your organization’s threat model and the specific guardrails your LLM deployments use.