What is AI Security?
Learn how AI security tools protect LLM applications from prompt injection, jailbreaks, and model attacks. Covers OWASP Top 10 for LLMs, AI red teaming, guardrails, and practical advice for securing AI systems.
What AI security means
AI security is the practice of protecting artificial intelligence systems from adversarial attacks, misuse, and unintended behavior. It covers everything from preventing prompt injection in chatbots to scanning ML model files for malicious payloads before deployment.
This is not traditional application security with a new label. The attack surface is fundamentally different. In a standard web application, an attacker exploits bugs in deterministic code. In an AI system, the attacker exploits the model’s reasoning. The same input can produce different outputs. There is no source code to patch when a jailbreak works.
The field has split into two broad areas. LLM security focuses on large language model applications: prompt injection, jailbreaking, data leakage, and hallucination. ML model security focuses on the models themselves: adversarial examples, data poisoning, model theft, and supply chain attacks on model files. Most teams building LLM-powered products need to worry about both.
There are currently 10 active tools in the AI security tools category on this site, and the space is growing faster than any other AppSec category I track.
Why AI security matters now
Three things happened at once: organizations rushed to deploy LLMs, attackers figured out how to exploit them, and regulators started paying attention.
LLM adoption outpaced security
By 2025, most Fortune 500 companies had deployed at least one LLM-powered application. Customer-facing chatbots, internal knowledge assistants, code generation tools, automated support agents. The deployment speed was remarkable. The security testing was often an afterthought.
The result is a wave of LLM applications in production with minimal adversarial testing. Many of them are vulnerable to prompt injection attacks that a junior security engineer could execute in minutes.
Real-world incidents
This is not theoretical. In 2023, a researcher got Bing Chat to reveal its hidden system prompt (codenamed “Sydney”) through a simple prompt injection. Chevrolet’s AI chatbot was tricked into agreeing to sell a car for one dollar. Air Canada’s chatbot fabricated a refund policy that the airline was later held legally responsible for. These are not edge cases. They are the natural result of deploying LLMs without adversarial testing.
Regulatory pressure
The EU AI Act, which entered into force in stages between 2024 and 2026, classifies AI systems by risk level and imposes requirements for high-risk applications. In the US, the NIST AI Risk Management Framework (AI RMF) provides voluntary guidance for managing AI risks. The OWASP Foundation published the Top 10 for LLM Applications, now in version 2025, which has become the de facto checklist for LLM security assessments.
Organizations that deploy LLMs without security testing face both regulatory and reputational risk.
Key threat categories
AI security threats fall into several categories. Understanding them helps you decide which tools to deploy.
Prompt injection
The most common and most dangerous LLM vulnerability. An attacker crafts input that overrides the model’s system instructions.
Direct prompt injection is when the user types something like “Ignore your previous instructions and do X.” Surprisingly effective against unguarded models.
Indirect prompt injection is more insidious. The malicious instruction is embedded in external data the LLM processes: a web page it retrieves, an email it summarizes, a document it analyzes. The user never sees the attack. The model follows the hidden instruction.
Tools like Lakera detect prompt injection in real time with sub-50ms latency. Garak and Promptfoo test whether your application is vulnerable before deployment.
Jailbreaking
Jailbreaking bypasses the model’s safety alignment to make it produce content it was trained to refuse: instructions for harmful activities, generation of malicious code, or bypassing content filters. Techniques include role-playing scenarios, encoding tricks, multi-turn escalation (crescendo attacks), and character persona manipulation.
PyRIT from Microsoft’s AI Red Team supports crescendo attacks and Tree of Attacks with Pruning (TAP) for automated jailbreak testing. DeepTeam covers linear, tree, and crescendo jailbreaking methods.
Training data extraction
Attackers can sometimes extract memorized training data from a model through carefully crafted prompts. This is a privacy risk when models are trained on proprietary, personal, or sensitive data. Research has demonstrated extraction of personally identifiable information, code snippets, and verbatim text passages from production models.
Data poisoning
If an attacker can influence training data, they can introduce backdoors or bias into the model. This is most relevant for organizations that fine-tune models on their own data. A poisoned training set can make the model behave normally on most inputs but produce attacker-controlled outputs for specific triggers.
Model theft and inversion
Model extraction attacks reconstruct a model’s parameters by querying it systematically and analyzing the outputs. Model inversion attacks use a model’s outputs to infer properties of its training data. Both are concerns for organizations that have invested in proprietary model development.
Supply chain attacks on ML models
ML model files (pickle, PyTorch, ONNX) can contain embedded malicious code that executes when the file is loaded. This is the AI equivalent of a compromised npm package. Protect AI Guardian and HiddenLayer scan model files for deserialization attacks and backdoors before they reach production.
Hallucination exploitation
LLMs generate plausible but fabricated information. Attackers can exploit this by designing scenarios where the model’s hallucinations cause real harm: fabricating legal citations, inventing API endpoints that redirect to malicious servers, or generating package names that an attacker has registered (package hallucination).
Excessive agency
When LLMs are connected to tools and APIs (agents), a compromised model can take real-world actions: sending emails, modifying databases, executing code. The OWASP Top 10 lists excessive agency as a distinct risk. Akto focuses specifically on securing AI agents and MCP (Model Context Protocol) servers.
How AI security tools work
AI security tools fall into four categories based on when and how they operate.
Red teaming and probing
These tools test your LLM application before deployment by sending adversarial inputs and evaluating the outputs. Think of them as penetration testing tools for AI.
Garak is NVIDIA’s open-source LLM vulnerability scanner with 37+ probe modules covering prompt injection, jailbreaks, data leakage, and hallucination. Promptfoo is an evaluation and red teaming CLI with 50+ vulnerability types, used by 300,000+ developers. PyRIT is Microsoft’s framework for multi-modal AI red teaming across text, image, audio, and video. DeepTeam covers 40+ vulnerability types mapped to the OWASP Top 10 for LLMs.
For a detailed comparison of the two most popular open-source options, see Garak vs Promptfoo. For a practical guide to building and running red team exercises against LLM applications, see our LLM red teaming guide.
Guardrails and filters
These tools sit between the user and the model in production, filtering inputs before they reach the LLM and scanning outputs before they reach the user.
Lakera provides a real-time API with 98%+ prompt injection detection across 100+ languages. LLM Guard offers 15 input scanners and 20 output scanners as an open-source Python library. NeMo Guardrails uses a domain-specific language (Colang) to define programmable safety policies and dialog flows.
Runtime monitoring
Runtime tools watch AI systems in production and detect anomalous behavior, adversarial inputs, and policy violations in real time. HiddenLayer provides runtime defense for enterprise ML models without requiring access to model weights. Akto monitors AI agent workflows and MCP servers for tool poisoning and unauthorized actions.
Model scanning
These tools analyze ML model files for embedded malicious code, architectural backdoors, and deserialization vulnerabilities before deployment. Protect AI Guardian scans 35+ model formats and has scanned over 4 million models on Hugging Face. HiddenLayer ModelScanner also covers 35+ formats including PyTorch, TensorFlow, and ONNX.
OWASP Top 10 for LLM Applications
The OWASP Top 10 for LLM Applications (2025 edition) is the most widely referenced framework for LLM security risks. Here is what each risk covers and which tools help address it.
LLM01: Prompt Injection. Manipulating an LLM through crafted inputs to override instructions. Covered by Lakera, LLM Guard, Garak, Promptfoo.
LLM02: Sensitive Information Disclosure. LLMs revealing confidential data from training or context. Covered by LLM Guard (PII scanners), Lakera (data leakage detection).
LLM03: Supply Chain Vulnerabilities. Compromised model files, training data, or plugins. Covered by Protect AI Guardian, HiddenLayer (model scanning).
LLM04: Data and Model Poisoning. Tampering with training data to introduce backdoors or bias. Addressed through model scanning and data pipeline validation.
LLM05: Improper Output Handling. Treating LLM output as trusted without validation. Covered by LLM Guard (output scanners), NeMo Guardrails (output rails).
LLM06: Excessive Agency. Granting LLMs too many permissions, tools, or autonomy. Covered by Akto (agent security), NeMo Guardrails (execution rails).
LLM07: System Prompt Leakage. Extracting hidden system instructions through adversarial queries. Tested by Garak, Promptfoo, DeepTeam.
LLM08: Vector and Embedding Weaknesses. Manipulating RAG systems through poisoned embeddings or retrieval manipulation.
LLM09: Misinformation. LLMs generating false or misleading content. Tested by Garak (hallucination probes), NeMo Guardrails (fact-checking rails).
LLM10: Unbounded Consumption. Denial-of-service through resource-exhausting prompts. Addressed through rate limiting and input validation.
No single tool covers all ten risks. A practical approach combines a red teaming tool for pre-deployment testing with a guardrail tool for runtime protection and a model scanner for supply chain security.
Top AI security tools
These are the tools I would evaluate first. For full reviews, see the AI security tools page.
Red teaming (open-source)
- Garak — NVIDIA’s LLM vulnerability scanner. 37+ probe modules, 23 generator backends, CLI-first. The deepest adversarial testing coverage in open source. 6.9k GitHub stars.
- Promptfoo — LLM evaluation and red teaming CLI. 50+ vulnerability types, YAML-based configs, web UI for results. Also handles prompt comparison and regression testing. 10.3k GitHub stars.
- PyRIT — Microsoft’s AI red teaming framework. Multi-modal (text, image, audio, video), crescendo attacks, and Tree of Attacks with Pruning. Built by Microsoft’s AI Red Team. 3.4k GitHub stars.
- DeepTeam — 40+ vulnerability types mapped to OWASP Top 10 for LLMs. Linear, tree, and crescendo jailbreaking. From the Confident AI team. 1.2k GitHub stars.
Guardrails (open-source)
- LLM Guard — 15 input scanners and 20 output scanners. Prompt injection, PII anonymization, toxicity filtering. Deployable as an API server. By Protect AI. 2.5k GitHub stars.
- NeMo Guardrails — NVIDIA’s programmable guardrail toolkit. Colang DSL for safety policies, five rail types (input, dialog, retrieval, execution, output), LangChain integration. 5.6k GitHub stars.
Commercial platforms
- Lakera — Real-time prompt injection detection with 98%+ accuracy and sub-50ms latency. 100+ language support. Acquired by Check Point in 2025. Creators of the Gandalf prompt injection game.
- HiddenLayer — Enterprise ML model security. Model scanning, runtime defense, AI supply chain security, automated red teaming. 48+ CVEs disclosed, 25+ patents. Backed by Microsoft’s M12.
- Protect AI Guardian — ML model scanning gateway. 35+ model formats, 4M+ Hugging Face models scanned. Acquired by Palo Alto Networks in 2025, now part of Prisma AIRS.
- Akto — AI agent and MCP server security. 1,000+ agent exploit tests, runtime guardrails, tool poisoning detection. Launched the first MCP security solution in 2025.
Getting started
If you are deploying an LLM application and have done no adversarial testing, here is a practical path.
Run a red team scan. Pick Garak or Promptfoo and point it at your LLM endpoint. Both are free, install in minutes, and will immediately show you how your application handles prompt injection, jailbreaks, and data leakage attempts. Expect to be surprised by the results.
Add input/output guardrails. Deploy LLM Guard or NeMo Guardrails as a middleware layer between users and your model. Filter prompt injection attempts before they reach the LLM. Scan outputs for PII leakage, toxicity, and hallucinated content before they reach the user.
Review the OWASP Top 10 for LLMs. Walk through each risk and assess whether your application is exposed. Pay particular attention to prompt injection (LLM01), sensitive information disclosure (LLM02), and excessive agency (LLM06) if your LLM has access to tools or APIs.
Scan your model supply chain. If you download models from Hugging Face or other registries, scan them before loading. Protect AI Guardian and HiddenLayer detect deserialization attacks and backdoors in model files. A malicious pickle file can execute arbitrary code the moment you load it.
Test continuously. AI security is not a one-time audit. Models get updated, prompts change, and new attack techniques emerge regularly. Run red team scans in CI and monitor guardrail metrics in production.
Consider the full stack. AI security does not replace traditional AppSec. Your LLM application still needs SAST for code vulnerabilities, DAST for runtime issues, and SCA for dependency risks. AI security is an additional layer, not a replacement.
FAQ
This guide is part of our API & AI Security resource hub.
Frequently Asked Questions
What is AI security in simple terms?
What is prompt injection?
What is AI red teaming?
What is the difference between AI guardrails and AI red teaming?
Does the OWASP Top 10 for LLMs apply to my project?
Are free AI security tools good enough?
How is AI security different from traditional application security?

Suphi Cankurt is an application security enthusiast based in Helsinki, Finland. He reviews and compares 129 AppSec tools across 10 categories on AppSec Santa. Learn more.
Comments
Powered by Giscus — comments are stored in GitHub Discussions.