FuzzyAI is an open-source LLM fuzzing framework from CyberArk Labs that tests large language models for jailbreak vulnerabilities using 18 built-in attack techniques. Released in December 2024 under the Apache 2.0 license, it is listed in the AI security category and has earned 1.3k+ GitHub stars.
FuzzyAI has successfully jailbroken every major AI model it was tested against. The framework implements 18 attack techniques — ranging from ASCII art obfuscation (ArtPrompt) to multi-turn conversational escalation (crescendo) and genetic algorithm prompt mutation — designed to help security teams identify guardrail bypass vulnerabilities before attackers exploit them. Unlike fixed test suites, FuzzyAI uses mutation-based fuzzing to discover novel jailbreak paths.
Key Features at a Glance
| Feature | Details |
|---|---|
| Attack Techniques | 18 built-in methods including ArtPrompt, DAN jailbreaks, crescendo, genetic algorithm mutation, and ASCII smuggling |
| Target Providers | OpenAI, Anthropic, Gemini, Azure OpenAI, AWS Bedrock, Hugging Face, Ollama, and custom REST APIs |
| Fuzzing Approach | Mutation-based fuzzing generates novel jailbreak paths rather than relying on fixed test cases |
| GUI + CLI | Web-based GUI for interactive testing and CLI for automation and CI/CD integration |
| Extensibility | Custom attack method framework for domain-specific vulnerability testing |
| System Prompt Extraction | Tests whether system prompts can be leaked through adversarial queries |
| Guardrail Regression | Re-run attacks after updates to verify previously blocked techniques stay blocked |
| License | Apache 2.0 open-source, free to use |
Overview
FuzzyAI takes a fuzzing approach to LLM security testing. Rather than checking for known vulnerabilities with fixed test cases, it generates mutated attack prompts using algorithmic techniques (genetic algorithms, conversational escalation, encoding tricks) to discover previously unknown jailbreak paths. Compared to broader LLM scanners like Garak that cover 50+ probe modules across prompt injection, hallucination, and toxicity, FuzzyAI goes deeper on jailbreak-specific fuzzing.
The tool targets the guardrails that LLM providers and deployers put in place. If a model is supposed to refuse certain requests, FuzzyAI systematically probes different ways to bypass that refusal. When it finds a technique that works, it reports the successful jailbreak along with the specific prompt that triggered it.
This approach is particularly valuable because LLM guardrails are inherently fragile. A model might correctly refuse a direct request for harmful content but comply when the same request is rephrased using ASCII art, split across multiple conversational turns, or encoded in invisible Unicode characters.
Key Features

Attack Techniques
FuzzyAI implements 18 distinct attack methods, each targeting a different weakness in LLM safety mechanisms. The following table summarizes the key techniques:
| Technique | How It Works |
|---|---|
| ArtPrompt | Replaces safety-triggering words with ASCII art representations, bypassing text-based content filters while preserving meaning for the model |
| DAN (Do Anything Now) | Promotes the LLM to adopt an unrestricted persona that ignores standard content filters |
| Crescendo | Starts with innocuous queries and gradually steers the conversation toward restricted topics across multiple turns |
| Genetic Algorithm | Uses evolutionary algorithms to mutate prompts, selecting for variants that get closer to bypassing guardrails |
| ASCII Smuggling | Embeds hidden instructions using invisible Unicode Tag characters that are processed by the model but invisible to users |
| Many-Shot Jailbreaking | Provides many examples of the desired (harmful) behavior in the prompt context to shift the model’s behavior |
| Shuffle Inconsistency Attack | Rearranges harmful text in prompts to exploit inconsistencies between an LLM’s comprehension and its safety mechanisms |
| ActorAttack | Builds semantic networks of “actors” to subtly guide conversations toward restricted targets |
| PAIR | Automates adversarial prompt generation by iteratively refining prompts using two LLMs |
| Best-of-n Jailbreaking | Uses input variations to repeatedly elicit harmful responses from the model |
Additional techniques cover system prompt extraction, ethical filter bypass, and information leakage testing.

Supported Providers
FuzzyAI can target models from all major LLM providers:
- OpenAI — GPT models via API
- Anthropic — Claude models
- Google Gemini — Gemini models
- Azure OpenAI — Azure-hosted OpenAI models
- AWS Bedrock — Claude, Llama, Titan, and other Bedrock models
- Hugging Face — Hosted model endpoints
- Ollama — Local models (requires 8GB+ RAM, 16GB+ recommended)
- Custom REST API — Any LLM with an HTTP endpoint
Each provider requires its own API key configuration via environment variables.

Extensibility
FuzzyAI is built to be extended. Security teams can add custom attack methods to:
- Test domain-specific guardrails unique to their LLM deployment
- Implement novel attack techniques from recent research papers
- Create regression tests for previously discovered jailbreaks
- Build attack chains that combine multiple techniques
Use Cases
Pre-deployment security assessment — Before deploying an LLM-powered application, run FuzzyAI against it to discover jailbreak vulnerabilities in your guardrails.
Red team exercises — Security teams use FuzzyAI as part of AI red team assessments to systematically test how well LLM safety mechanisms hold up.
Guardrail regression testing — After updating guardrails or switching models, re-run FuzzyAI to verify that previously blocked attack techniques remain blocked.
Research and benchmarking — AI security researchers use FuzzyAI to compare the jailbreak resistance of different models and guardrail implementations.
Strengths & Limitations
Strengths:
- Mutation-based fuzzing discovers novel jailbreaks that fixed test suites miss
- 18 attack techniques cover a wide range of bypass methods
- Backed by CyberArk Labs with active research and development
- Extensible design lets teams add custom attack methods
- Supports all major LLM providers plus custom endpoints
- Free and open-source under Apache 2.0
Limitations:
- Covers only jailbreak/guardrail bypass, not broader LLM security issues like hallucination, toxicity measurement, or data leakage
- Requires API keys and usage costs for cloud-hosted target models
- Running local models via Ollama needs significant resources (8GB+ RAM minimum)
- Younger project compared to established tools like Garak, with a smaller community and fewer contributors
- Results depend on the quality of the jailbreak detection criteria you define
Getting Started
git clone https://github.com/cyberark/FuzzyAI.git && cd FuzzyAI.poetry install to set up the environment.export OPENAI_API_KEY="sk-..." for OpenAI, or export ANTHROPIC_API_KEY="..." for Anthropic.How FuzzyAI Compares
FuzzyAI fills a specific role among AI security tools: focused jailbreak fuzzing with mutation-based attack techniques. It is narrower in scope but deeper in jailbreak coverage compared to multi-purpose LLM security scanners.
Compared to Garak, which offers 50+ probe modules covering prompt injection, hallucination, toxicity, and data leakage, FuzzyAI specializes in jailbreak discovery through mutation-based techniques like genetic algorithms and crescendo attacks. For an evaluation framework that combines red teaming with prompt testing and model comparison, look at Promptfoo. For Microsoft’s multi-turn red teaming orchestrator built for enterprise AI red teams, check PyRIT.
For structured vulnerability testing with OWASP mapping, see DeepTeam. For runtime protection rather than pre-deployment testing, consider Lakera Guard, LLM Guard, or NeMo Guardrails.
For a broader overview of AI security threats and defense tools, see the AI security tools category page.