LLM Guard


Category: AI Security
License: Free (Open-Source)
Suphi Cankurt
AppSec Enthusiast
Updated February 9, 2026
4 min read
Key Takeaways
  • Open-source LLM security toolkit with 15 input scanners and 20 output scanners
  • Detects prompt injection, PII leaks, toxic outputs, and data leakage
  • MIT licensed, 2.5k GitHub stars, deployable as a standalone API server
  • Works with any LLM provider — not locked to a specific vendor

LLM Guard is an open-source AI security toolkit by Protect AI that scans LLM inputs and outputs for security and compliance risks. It has 2.5k stars and 342 forks on GitHub.

[Image: LLM Guard architecture, with input scanners processing prompts before the LLM and output scanners validating responses]

The library is MIT licensed and requires Python 3.10+. Protect AI, the company behind LLM Guard, also develops Guardian and ModelScan for ML supply chain security. The latest release is v0.3.16.

What is LLM Guard?

LLM Guard sits between your application and its language model. It runs 15 input scanners on user prompts before they reach the model, and 20 output scanners on the model’s responses before they reach the user.

Each scanner handles a specific risk: prompt injection, PII exposure, toxic language, secrets in code, and more. Scanners are modular. You pick which ones you need and configure them independently.

The library works with any language model since it processes text, not model internals. It also ships with an API server mode for language-agnostic deployments.

Input Scanning
15 scanners that filter user prompts before they reach your LLM. Covers prompt injection, PII anonymization, secrets detection, toxicity, banned topics, invisible text, and more.
Output Scanning
20 scanners that validate model responses. Checks for bias, malicious URLs, factual consistency, sensitive data leaks, toxicity, and relevance to the original query.
API Server
Deploy LLM Guard as a standalone HTTP API. Integrates with any language or framework, not just Python. Available via Docker for production deployments.

Key Features

Input Scanners (15): Anonymize, BanCode, BanCompetitors, BanSubstrings, BanTopics, Code, Gibberish, InvisibleText, Language, PromptInjection, Regex, Secrets, Sentiment, TokenLimit, Toxicity
Output Scanners (20): BanCompetitors, BanSubstrings, BanTopics, Bias, Code, Deanonymize, FactualConsistency, Gibberish, JSON, Language, LanguageSame, MaliciousURLs, NoRefusal, ReadingTime, Regex, Relevance, Sensitive, Sentiment, Toxicity, URLReachability
PII Handling: Anonymize scanner replaces PII in prompts; Deanonymize restores it in outputs when appropriate
Prompt Injection: Dedicated scanner using ML models to detect direct and indirect injection attempts
Secrets Detection: Identifies API keys, passwords, and credentials in both inputs and outputs
Factual Consistency: Output scanner that checks whether responses are consistent with provided context
License: MIT, fully open-source
Python Support: Requires Python >=3.10, <3.13
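The Anonymize/Deanonymize round trip can be sketched in a few lines. This is an illustrative toy, not LLM Guard's implementation: the real Anonymize scanner detects many PII types (names, phone numbers, card numbers), not just emails via one regex, and the placeholder format and function names below are my own.

```python
import re

# Toy stand-in for the Anonymize/Deanonymize pattern: PII found in the
# prompt is swapped for placeholders and stored in a vault, then the
# placeholders are restored in the model's response.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(prompt: str, vault: dict) -> str:
    def repl(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_{len(vault) + 1}]"
        vault[placeholder] = match.group(0)  # remember the original value
        return placeholder
    return EMAIL_RE.sub(repl, prompt)

def deanonymize(output: str, vault: dict) -> str:
    # Swap each placeholder back for the value it replaced.
    for placeholder, original in vault.items():
        output = output.replace(placeholder, original)
    return output

vault = {}
safe = anonymize("Email alice@example.com about the invoice.", vault)
restored = deanonymize(f"Sent a reply to {safe.split()[1]}", vault)
```

The key design point survives the simplification: the vault lives outside both scanners, so the model never sees the raw PII but the end user still gets it back.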

Input Scanners

The 15 input scanners process user prompts before they reach the LLM:

  • Anonymize — replaces PII (names, emails, phone numbers, credit card numbers) with placeholders
  • BanCode — blocks prompts containing code snippets
  • BanCompetitors — filters mentions of specified competitor names
  • BanSubstrings — blocks prompts containing specific text patterns
  • BanTopics — prevents prompts about restricted subjects
  • Code — detects code content in prompts
  • Gibberish — identifies nonsensical or garbled input
  • InvisibleText — detects hidden Unicode characters used in prompt injection
  • Language — enforces language restrictions on input
  • PromptInjection — detects direct and indirect injection attacks using ML models
  • Regex — pattern-based filtering with custom regular expressions
  • Secrets — identifies API keys, passwords, and credentials
  • Sentiment — analyzes emotional tone of input
  • TokenLimit — enforces maximum token count
  • Toxicity — filters harmful, offensive, or abusive language
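To make the InvisibleText idea concrete, here is a minimal sketch of the kind of check such a scanner performs. This is not LLM Guard's code, just an assumption-labeled illustration: Unicode "format" characters (category Cf), which render as nothing, are exactly what attackers use to hide instructions inside an innocent-looking prompt.

```python
import unicodedata

# Flag invisible Unicode format characters (category "Cf"): zero-width
# spaces and joiners, the BOM, and tag characters can all smuggle
# hidden text into a prompt without the user noticing.
def find_invisible(text: str) -> list[str]:
    return [ch for ch in text if unicodedata.category(ch) == "Cf"]

clean = "Summarize this document."
sneaky = "Summarize this\u200b document.\u200d"  # two zero-width chars
```

`find_invisible(clean)` comes back empty, while the `sneaky` prompt yields two hits, which a real scanner would strip or block.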

Output Scanners

Output scanners validate and filter model responses. The most notable of the 20 are:

  • FactualConsistency — checks whether the response is consistent with the provided context
  • Bias — detects biased or discriminatory content in responses
  • Deanonymize — restores PII that was anonymized in the input stage
  • JSON — validates JSON structure and schema compliance
  • MaliciousURLs — blocks links to known malicious sites
  • NoRefusal — flags when the model refuses to answer without good reason
  • Relevance — checks whether the response matches the original query
  • URLReachability — verifies that URLs in responses actually resolve
  • ReadingTime — estimates the reading time of responses
  • LanguageSame — verifies the response is in the same language as the input
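As an example of the shape an output check takes, here is a toy JSON-style validator. It is a hedged sketch, not LLM Guard's JSON scanner (which is more featureful); the tuple it returns (text, validity flag, risk score) mirrors the general scanner pattern of sanitized output plus a verdict.

```python
import json
import re

# Toy output check: find JSON-looking blocks in a response and verify
# they actually parse. The non-greedy regex does not handle nested
# objects; it is enough to show the pattern.
def scan_json_output(response: str) -> tuple[str, bool, float]:
    blocks = re.findall(r"\{.*?\}", response, flags=re.DOTALL)
    if not blocks:
        return response, True, 0.0  # nothing claimed to be JSON
    for block in blocks:
        try:
            json.loads(block)
        except json.JSONDecodeError:
            return response, False, 1.0  # malformed JSON: flag it
    return response, True, 0.0

ok_text, ok_valid, ok_score = scan_json_output('Result: {"status": "ok"}')
bad_text, bad_valid, bad_score = scan_json_output("Result: {status: ok}")
```

The first call passes; the second fails validation because bare `status: ok` is not valid JSON, which is typical of a model hallucinating structure.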
Playground available
Protect AI hosts an interactive playground on Hugging Face Spaces where you can test LLM Guard scanners without installing anything. Visit the LLM Guard Playground to try it out.

Getting Started

1. Install the library — Run pip install llm-guard. Requires Python 3.10 or higher. For GPU-accelerated inference, install with pip install llm-guard[onnxruntime-gpu].
2. Choose your scanners — Import the input and output scanners you need. Each scanner is independent, so you only load what you use.
3. Scan inputs and outputs — Use scan_prompt() to process user input through your chosen input scanners, send the sanitized prompt to your LLM, then use scan_output() to validate the response.
4. Deploy as API (optional) — For non-Python environments, deploy LLM Guard as a standalone API server. The API server wraps all scanner functionality behind HTTP endpoints.

When to use LLM Guard

LLM Guard fits teams that want open-source, self-hosted guardrails for LLM applications. Since it runs locally, your data never leaves your infrastructure.

The modular scanner design means you can start with just prompt injection detection and add PII anonymization or toxicity filtering later. Each scanner works independently, so adding one doesn’t affect the others.

The library works with any LLM provider because it scans the text, not the model. Whether you use OpenAI, Anthropic, local models, or a mix, the same scanners apply.

Best for
Teams that need self-hosted, open-source input/output scanning for LLM applications with fine-grained control over which security checks to apply.

For a broader look at AI and LLM security, read our AI security guide. For a different approach to LLM safety that includes dialog flow control, look at NeMo Guardrails. For red teaming and adversarial testing rather than runtime protection, consider Garak or PyRIT. Lakera Guard offers similar scanner functionality as a managed cloud API.

Frequently Asked Questions

What is LLM Guard?
LLM Guard is an open-source security toolkit by Protect AI that provides input and output scanners for LLM applications. It has 2.5k GitHub stars, offers 15 input scanners and 20 output scanners, and is MIT licensed.
Is LLM Guard free to use?
Yes, LLM Guard is free and open-source under the MIT license. Install it via pip (requires Python 3.10+) and deploy it as a standalone API server or integrate it directly into your Python application.
Does LLM Guard protect against prompt injection?
Yes, LLM Guard includes a dedicated PromptInjection scanner that detects direct and indirect injection attacks. Additional input scanners handle jailbreak detection, invisible text detection, and content filtering.
What LLM vulnerabilities does LLM Guard address?
LLM Guard covers prompt injection, PII anonymization, toxicity filtering, secrets detection, malicious URL blocking, bias detection, factual consistency checking, and data leakage prevention through its modular scanner architecture.