FuzzyAI is an open-source LLM fuzzing framework from CyberArk Labs that tests large language models for jailbreak vulnerabilities using 18 built-in attack techniques. Released in December 2024 under the Apache 2.0 license, it is listed in the AI security category and has earned approximately 1,300 GitHub stars (as of April 2026 — confirm the current count on github.com/cyberark/FuzzyAI before pinning).
CyberArk Labs reports that FuzzyAI bypassed every major model in their internal testing — claims of model-by-model jailbreak success are vendor-attributed, and results vary depending on the model version, system prompt, and guardrail configuration evaluated. The framework implements 18 attack techniques — ranging from ASCII art obfuscation (ArtPrompt) to multi-turn conversational escalation (crescendo) and genetic algorithm prompt mutation — designed to help security teams identify guardrail bypass vulnerabilities before attackers exploit them. Unlike fixed test suites, FuzzyAI uses mutation-based fuzzing to discover novel jailbreak paths.
Key Features at a Glance
| Feature | Details |
|---|---|
| Attack Techniques | 18 built-in methods including ArtPrompt, DAN jailbreaks, crescendo, genetic algorithm mutation, and ASCII smuggling |
| Target Providers | OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, Hugging Face, Ollama, and custom REST APIs |
| Fuzzing Approach | Mutation-based fuzzing generates novel jailbreak paths rather than relying on fixed test cases |
| GUI + CLI | Web-based GUI for interactive testing and CLI for automation and CI/CD integration |
| Extensibility | Custom attack method framework for domain-specific vulnerability testing |
| System Prompt Extraction | Tests whether system prompts can be leaked through adversarial queries |
| Guardrail Regression | Re-run attacks after updates to verify previously blocked techniques stay blocked |
| License | Apache 2.0 open-source, free to use |
Overview
FuzzyAI takes a fuzzing approach to LLM security testing. Rather than checking for known vulnerabilities with fixed test cases, it generates mutated attack prompts using algorithmic techniques (genetic algorithms, conversational escalation, encoding tricks) to discover previously unknown jailbreak paths. Compared to broader LLM scanners like Garak that cover 50+ probe modules across prompt injection, hallucination, and toxicity, FuzzyAI goes deeper on jailbreak-specific fuzzing.
The tool targets the guardrails that LLM providers and deployers put in place. If a model is supposed to refuse certain requests, FuzzyAI systematically probes different ways to bypass that refusal. When it finds a technique that works, it reports the successful jailbreak along with the specific prompt that triggered it.
This approach is particularly valuable because LLM guardrails are inherently fragile. A model might correctly refuse a direct request for harmful content but comply when the same request is rephrased using ASCII art, split across multiple conversational turns, or encoded in invisible Unicode characters.
Key Features

Attack Techniques
FuzzyAI implements 18 distinct attack methods, each targeting a different weakness in LLM safety mechanisms. The following table summarizes the key techniques:
| Technique | How It Works |
|---|---|
| ArtPrompt | Replaces safety-triggering words with ASCII art representations, bypassing text-based content filters while preserving meaning for the model |
| DAN (Do Anything Now) | Promotes the LLM to adopt an unrestricted persona that ignores standard content filters |
| Crescendo | Starts with innocuous queries and gradually steers the conversation toward restricted topics across multiple turns |
| Genetic Algorithm | Uses evolutionary algorithms to mutate prompts, selecting for variants that get closer to bypassing guardrails |
| ASCII Smuggling | Embeds hidden instructions using invisible Unicode Tag characters that are processed by the model but invisible to users |
| Many-Shot Jailbreaking | Provides many examples of the desired (harmful) behavior in the prompt context to shift the model’s behavior |
| Shuffle Inconsistency Attack | Rearranges harmful text in prompts to exploit inconsistencies between an LLM’s comprehension and its safety mechanisms |
| ActorAttack | Builds semantic networks of “actors” to subtly guide conversations toward restricted targets |
| PAIR | Automates adversarial prompt generation by iteratively refining prompts using two LLMs |
| Best-of-n Jailbreaking | Uses input variations to repeatedly elicit harmful responses from the model |
Additional techniques cover system prompt extraction, ethical filter bypass, and information leakage testing.

Supported Providers
FuzzyAI can target models from all major LLM providers:
- OpenAI — GPT models via API
- Anthropic — Claude models
- Google Gemini — Gemini models
- Azure OpenAI — Azure-hosted OpenAI models
- AWS Bedrock — Claude, Llama, Titan, and other Bedrock models
- Hugging Face — Hosted model endpoints
- Ollama — Local models (requires 8GB+ RAM, 16GB+ recommended)
- Custom REST API — Any LLM with an HTTP endpoint
Each provider requires its own API key configuration via environment variables.

Extensibility
FuzzyAI is built to be extended. Security teams can add custom attack methods to:
- Test domain-specific guardrails unique to their LLM deployment
- Implement novel attack techniques from recent research papers
- Create regression tests for previously discovered jailbreaks
- Build attack chains that combine multiple techniques

OWASP, NIST, and MITRE alignment
FuzzyAI’s 18 attack techniques map directly onto the OWASP Top 10 for LLM Applications — specifically LLM01: Prompt Injection (DAN, ArtPrompt, ASCII smuggling, many-shot, PAIR) and partially LLM07: System Prompt Leakage (system prompt extraction techniques). Mutation-based fuzzing as a methodology is the editorial wedge against fixed-probe peers like Garak and DeepTeam.
The fuzzing telemetry feeds into the NIST AI RMF Measure function — every successful bypass is structured evidence for risk reviews, and the regression test pattern (re-run after guardrail updates) supports the Manage function. For threat modeling, FuzzyAI traces map onto MITRE ATLAS techniques AML.T0051 LLM Prompt Injection and AML.T0054 LLM Jailbreak, giving red teams shared taxonomy with detection and SOC tooling.
Use Cases
Pre-deployment security assessment — Before deploying an LLM-powered application, run FuzzyAI against it to discover jailbreak vulnerabilities in your guardrails.
Red team exercises — Security teams use FuzzyAI as part of AI red team assessments to systematically test how well LLM safety mechanisms hold up.
Guardrail regression testing — After updating guardrails or switching models, re-run FuzzyAI to verify that previously blocked attack techniques remain blocked.
Research and benchmarking — AI security researchers use FuzzyAI to compare the jailbreak resistance of different models and guardrail implementations.
Strengths & Limitations
Strengths:
- Mutation-based fuzzing discovers novel jailbreaks that fixed test suites miss
- 18 attack techniques cover a wide range of bypass methods
- Backed by CyberArk Labs with active research and development
- Extensible design lets teams add custom attack methods
- Supports all major LLM providers plus custom endpoints
- Free and open-source under Apache 2.0
Limitations:
- Covers only jailbreak/guardrail bypass, not broader LLM security issues like hallucination, toxicity measurement, or data leakage
- Requires API keys and usage costs for cloud-hosted target models
- Running local models via Ollama needs significant resources (8GB+ RAM minimum)
- Younger project compared to established tools like Garak, with a smaller community and fewer contributors
- Results depend on the quality of the jailbreak detection criteria you define
Getting Started
git clone https://github.com/cyberark/FuzzyAI.git && cd FuzzyAI.poetry install to set up the environment.export OPENAI_API_KEY="sk-..." for OpenAI, or export ANTHROPIC_API_KEY="..." for Anthropic.
How FuzzyAI Compares
FuzzyAI fills a specific role among AI security tools: focused jailbreak fuzzing with mutation-based attack techniques. It is narrower in scope but deeper in jailbreak coverage compared to multi-purpose LLM security scanners. The mutation-based fuzzing methodology is the editorial wedge — peers ship fixed probe libraries, FuzzyAI generates novel variants on the fly.
Garak is the broadest peer with 50+ probe modules covering prompt injection, hallucination, toxicity, and data leakage. Garak is the better pick when broad scanner coverage matters more than jailbreak-specific depth, and when NVIDIA backing and a larger community influence tooling decisions.
Augustus is the closest single-binary alternative — Praetorian’s Go-based scanner with 210+ adversarial probes across 47 attack categories. Augustus is the right fit when production deployment in CI runners and air-gapped environments needs a single Go binary instead of a Python framework.
PyRIT is Microsoft’s multi-turn red teaming orchestrator, built for enterprise AI red team campaigns with strong Azure integration and human-in-the-loop workflows. PyRIT excels at orchestrating long campaigns; FuzzyAI is the right tool when generating novel jailbreak variants is the core need.
Promptfoo is an evaluation framework that combines red teaming with prompt testing and model comparison. Promptfoo fits teams that want one tool for both quality and security evals rather than a security-focused fuzzing specialist.
DeepTeam is the structured-OWASP alternative with explicit Top 10 for LLMs mapping and compliance-focused reporting. DeepTeam is the better fit when audit evidence and framework alignment outrank raw attack-surface depth.
For runtime protection rather than pre-deployment testing, consider Lakera Guard (acquired by Cisco in May 2025), LLM Guard, or NeMo Guardrails.
For a broader overview of AI security threats and defense tools, see the AI security tools category page.