Skip to content
Home API & AI Security What is AI Security?
Guide

What is AI Security?

Learn how AI security tools protect LLM applications from prompt injection, jailbreaks, and model attacks. Covers OWASP Top 10 for LLMs, AI red teaming, guardrails, and practical advice for securing AI systems.

Suphi Cankurt
Suphi Cankurt
AppSec Enthusiast
Updated February 10, 2026
10 min read
0 Comments

What AI security means

AI security is the practice of protecting artificial intelligence systems from adversarial attacks, misuse, and unintended behavior. It covers everything from preventing prompt injection in chatbots to scanning ML model files for malicious payloads before deployment.

This is not traditional application security with a new label. The attack surface is fundamentally different. In a standard web application, an attacker exploits bugs in deterministic code. In an AI system, the attacker exploits the model’s reasoning. The same input can produce different outputs. There is no source code to patch when a jailbreak works.

The field has split into two broad areas. LLM security focuses on large language model applications: prompt injection, jailbreaking, data leakage, and hallucination. ML model security focuses on the models themselves: adversarial examples, data poisoning, model theft, and supply chain attacks on model files. Most teams building LLM-powered products need to worry about both.

There are currently 10 active tools in the AI security tools category on this site, and the space is growing faster than any other AppSec category I track.


Why AI security matters now

Three things happened at once: organizations rushed to deploy LLMs, attackers figured out how to exploit them, and regulators started paying attention.

LLM adoption outpaced security

By 2025, most Fortune 500 companies had deployed at least one LLM-powered application. Customer-facing chatbots, internal knowledge assistants, code generation tools, automated support agents. The deployment speed was remarkable. The security testing was often an afterthought.

The result is a wave of LLM applications in production with minimal adversarial testing. Many of them are vulnerable to prompt injection attacks that a junior security engineer could execute in minutes.

Real-world incidents

This is not theoretical. In 2023, a researcher got Bing Chat to reveal its hidden system prompt (codenamed “Sydney”) through a simple prompt injection. Chevrolet’s AI chatbot was tricked into agreeing to sell a car for one dollar. Air Canada’s chatbot fabricated a refund policy that the airline was later held legally responsible for. These are not edge cases. They are the natural result of deploying LLMs without adversarial testing.

Regulatory pressure

The EU AI Act, which entered into force in stages between 2024 and 2026, classifies AI systems by risk level and imposes requirements for high-risk applications. In the US, the NIST AI Risk Management Framework (AI RMF) provides voluntary guidance for managing AI risks. The OWASP Foundation published the Top 10 for LLM Applications, now in version 2025, which has become the de facto checklist for LLM security assessments.

Organizations that deploy LLMs without security testing face both regulatory and reputational risk.


Key threat categories

AI security threats fall into several categories. Understanding them helps you decide which tools to deploy.

Prompt injection

The most common and most dangerous LLM vulnerability. An attacker crafts input that overrides the model’s system instructions.

Direct prompt injection is when the user types something like “Ignore your previous instructions and do X.” Surprisingly effective against unguarded models.

Indirect prompt injection is more insidious. The malicious instruction is embedded in external data the LLM processes: a web page it retrieves, an email it summarizes, a document it analyzes. The user never sees the attack. The model follows the hidden instruction.

Tools like Lakera detect prompt injection in real time with sub-50ms latency. Garak and Promptfoo test whether your application is vulnerable before deployment.

Jailbreaking

Jailbreaking bypasses the model’s safety alignment to make it produce content it was trained to refuse: instructions for harmful activities, generation of malicious code, or bypassing content filters. Techniques include role-playing scenarios, encoding tricks, multi-turn escalation (crescendo attacks), and character persona manipulation.

PyRIT from Microsoft’s AI Red Team supports crescendo attacks and Tree of Attacks with Pruning (TAP) for automated jailbreak testing. DeepTeam covers linear, tree, and crescendo jailbreaking methods.

Training data extraction

Attackers can sometimes extract memorized training data from a model through carefully crafted prompts. This is a privacy risk when models are trained on proprietary, personal, or sensitive data. Research has demonstrated extraction of personally identifiable information, code snippets, and verbatim text passages from production models.

Data poisoning

If an attacker can influence training data, they can introduce backdoors or bias into the model. This is most relevant for organizations that fine-tune models on their own data. A poisoned training set can make the model behave normally on most inputs but produce attacker-controlled outputs for specific triggers.

Model theft and inversion

Model extraction attacks reconstruct a model’s parameters by querying it systematically and analyzing the outputs. Model inversion attacks use a model’s outputs to infer properties of its training data. Both are concerns for organizations that have invested in proprietary model development.

Supply chain attacks on ML models

ML model files (pickle, PyTorch, ONNX) can contain embedded malicious code that executes when the file is loaded. This is the AI equivalent of a compromised npm package. Protect AI Guardian and HiddenLayer scan model files for deserialization attacks and backdoors before they reach production.

Hallucination exploitation

LLMs generate plausible but fabricated information. Attackers can exploit this by designing scenarios where the model’s hallucinations cause real harm: fabricating legal citations, inventing API endpoints that redirect to malicious servers, or generating package names that an attacker has registered (package hallucination).

Excessive agency

When LLMs are connected to tools and APIs (agents), a compromised model can take real-world actions: sending emails, modifying databases, executing code. The OWASP Top 10 lists excessive agency as a distinct risk. Akto focuses specifically on securing AI agents and MCP (Model Context Protocol) servers.


How AI security tools work

AI security tools fall into four categories based on when and how they operate.

Red teaming and probing

These tools test your LLM application before deployment by sending adversarial inputs and evaluating the outputs. Think of them as penetration testing tools for AI.

Garak is NVIDIA’s open-source LLM vulnerability scanner with 37+ probe modules covering prompt injection, jailbreaks, data leakage, and hallucination. Promptfoo is an evaluation and red teaming CLI with 50+ vulnerability types, used by 300,000+ developers. PyRIT is Microsoft’s framework for multi-modal AI red teaming across text, image, audio, and video. DeepTeam covers 40+ vulnerability types mapped to the OWASP Top 10 for LLMs.

For a detailed comparison of the two most popular open-source options, see Garak vs Promptfoo. For a practical guide to building and running red team exercises against LLM applications, see our LLM red teaming guide.

Guardrails and filters

These tools sit between the user and the model in production, filtering inputs before they reach the LLM and scanning outputs before they reach the user.

Lakera provides a real-time API with 98%+ prompt injection detection across 100+ languages. LLM Guard offers 15 input scanners and 20 output scanners as an open-source Python library. NeMo Guardrails uses a domain-specific language (Colang) to define programmable safety policies and dialog flows.

Runtime monitoring

Runtime tools watch AI systems in production and detect anomalous behavior, adversarial inputs, and policy violations in real time. HiddenLayer provides runtime defense for enterprise ML models without requiring access to model weights. Akto monitors AI agent workflows and MCP servers for tool poisoning and unauthorized actions.

Model scanning

These tools analyze ML model files for embedded malicious code, architectural backdoors, and deserialization vulnerabilities before deployment. Protect AI Guardian scans 35+ model formats and has scanned over 4 million models on Hugging Face. HiddenLayer ModelScanner also covers 35+ formats including PyTorch, TensorFlow, and ONNX.


OWASP Top 10 for LLM Applications

The OWASP Top 10 for LLM Applications (2025 edition) is the most widely referenced framework for LLM security risks. Here is what each risk covers and which tools help address it.

LLM01: Prompt Injection. Manipulating an LLM through crafted inputs to override instructions. Covered by Lakera, LLM Guard, Garak, Promptfoo.

LLM02: Sensitive Information Disclosure. LLMs revealing confidential data from training or context. Covered by LLM Guard (PII scanners), Lakera (data leakage detection).

LLM03: Supply Chain Vulnerabilities. Compromised model files, training data, or plugins. Covered by Protect AI Guardian, HiddenLayer (model scanning).

LLM04: Data and Model Poisoning. Tampering with training data to introduce backdoors or bias. Addressed through model scanning and data pipeline validation.

LLM05: Improper Output Handling. Treating LLM output as trusted without validation. Covered by LLM Guard (output scanners), NeMo Guardrails (output rails).

LLM06: Excessive Agency. Granting LLMs too many permissions, tools, or autonomy. Covered by Akto (agent security), NeMo Guardrails (execution rails).

LLM07: System Prompt Leakage. Extracting hidden system instructions through adversarial queries. Tested by Garak, Promptfoo, DeepTeam.

LLM08: Vector and Embedding Weaknesses. Manipulating RAG systems through poisoned embeddings or retrieval manipulation.

LLM09: Misinformation. LLMs generating false or misleading content. Tested by Garak (hallucination probes), NeMo Guardrails (fact-checking rails).

LLM10: Unbounded Consumption. Denial-of-service through resource-exhausting prompts. Addressed through rate limiting and input validation.

No single tool covers all ten risks. A practical approach combines a red teaming tool for pre-deployment testing with a guardrail tool for runtime protection and a model scanner for supply chain security.


Top AI security tools

These are the tools I would evaluate first. For full reviews, see the AI security tools page.

Red teaming (open-source)

  • Garak — NVIDIA’s LLM vulnerability scanner. 37+ probe modules, 23 generator backends, CLI-first. The deepest adversarial testing coverage in open source. 6.9k GitHub stars.
  • Promptfoo — LLM evaluation and red teaming CLI. 50+ vulnerability types, YAML-based configs, web UI for results. Also handles prompt comparison and regression testing. 10.3k GitHub stars.
  • PyRIT — Microsoft’s AI red teaming framework. Multi-modal (text, image, audio, video), crescendo attacks, and Tree of Attacks with Pruning. Built by Microsoft’s AI Red Team. 3.4k GitHub stars.
  • DeepTeam — 40+ vulnerability types mapped to OWASP Top 10 for LLMs. Linear, tree, and crescendo jailbreaking. From the Confident AI team. 1.2k GitHub stars.

Guardrails (open-source)

  • LLM Guard — 15 input scanners and 20 output scanners. Prompt injection, PII anonymization, toxicity filtering. Deployable as an API server. By Protect AI. 2.5k GitHub stars.
  • NeMo Guardrails — NVIDIA’s programmable guardrail toolkit. Colang DSL for safety policies, five rail types (input, dialog, retrieval, execution, output), LangChain integration. 5.6k GitHub stars.

Commercial platforms

  • Lakera — Real-time prompt injection detection with 98%+ accuracy and sub-50ms latency. 100+ language support. Acquired by Check Point in 2025. Creators of the Gandalf prompt injection game.
  • HiddenLayer — Enterprise ML model security. Model scanning, runtime defense, AI supply chain security, automated red teaming. 48+ CVEs disclosed, 25+ patents. Backed by Microsoft’s M12.
  • Protect AI Guardian — ML model scanning gateway. 35+ model formats, 4M+ Hugging Face models scanned. Acquired by Palo Alto Networks in 2025, now part of Prisma AIRS.
  • Akto — AI agent and MCP server security. 1,000+ agent exploit tests, runtime guardrails, tool poisoning detection. Launched the first MCP security solution in 2025.

Getting started

If you are deploying an LLM application and have done no adversarial testing, here is a practical path.

Run a red team scan. Pick Garak or Promptfoo and point it at your LLM endpoint. Both are free, install in minutes, and will immediately show you how your application handles prompt injection, jailbreaks, and data leakage attempts. Expect to be surprised by the results.

Add input/output guardrails. Deploy LLM Guard or NeMo Guardrails as a middleware layer between users and your model. Filter prompt injection attempts before they reach the LLM. Scan outputs for PII leakage, toxicity, and hallucinated content before they reach the user.

Review the OWASP Top 10 for LLMs. Walk through each risk and assess whether your application is exposed. Pay particular attention to prompt injection (LLM01), sensitive information disclosure (LLM02), and excessive agency (LLM06) if your LLM has access to tools or APIs.

Scan your model supply chain. If you download models from Hugging Face or other registries, scan them before loading. Protect AI Guardian and HiddenLayer detect deserialization attacks and backdoors in model files. A malicious pickle file can execute arbitrary code the moment you load it.

Test continuously. AI security is not a one-time audit. Models get updated, prompts change, and new attack techniques emerge regularly. Run red team scans in CI and monitor guardrail metrics in production.

Consider the full stack. AI security does not replace traditional AppSec. Your LLM application still needs SAST for code vulnerabilities, DAST for runtime issues, and SCA for dependency risks. AI security is an additional layer, not a replacement.


FAQ

This guide is part of our API & AI Security resource hub.

Frequently Asked Questions

What is AI security in simple terms?
AI security is the practice of protecting AI and machine learning systems from attacks. This includes preventing prompt injection in LLM applications, scanning ML models for malicious payloads, red teaming AI systems to find weaknesses, and adding guardrails that filter harmful inputs and outputs. It is a distinct discipline from traditional application security because the attack surface is fundamentally different.
What is prompt injection?
Prompt injection is an attack where a user crafts input that overrides an LLM’s system instructions. In direct prompt injection, the attacker types a malicious prompt. In indirect prompt injection, the malicious instruction is hidden in external data the LLM processes, such as a web page or email. It is the most common vulnerability in LLM applications and is listed as the number one risk in the OWASP Top 10 for LLM Applications.
What is AI red teaming?
AI red teaming is the practice of systematically probing AI systems to find security vulnerabilities, safety failures, and harmful behaviors. Tools like Garak, Promptfoo, PyRIT, and DeepTeam automate this by sending adversarial inputs to LLMs and evaluating the responses. It is analogous to penetration testing for traditional applications.
What is the difference between AI guardrails and AI red teaming?
Red teaming finds vulnerabilities before deployment by probing the model with adversarial inputs. Guardrails prevent exploitation in production by filtering inputs and outputs in real time. You need both. Red teaming tools like Garak and PyRIT test your defenses. Guardrail tools like Lakera Guard, LLM Guard, and NeMo Guardrails enforce them.
Does the OWASP Top 10 for LLMs apply to my project?
If your application uses a large language model in any capacity, yes. The OWASP Top 10 for LLM Applications covers risks like prompt injection, insecure output handling, training data poisoning, and excessive agency. Even if you use a hosted model through an API, your application is exposed to most of these risks.
Are free AI security tools good enough?
For many teams, yes. Garak, Promptfoo, LLM Guard, NeMo Guardrails, PyRIT, and DeepTeam are all free and open-source. They cover red teaming, guardrails, and input/output scanning. Commercial tools like Lakera Guard, HiddenLayer, and Protect AI Guardian add enterprise features, faster detection, and dedicated support. Start with open-source tools and upgrade when you need lower latency or compliance features.
How is AI security different from traditional application security?
Traditional AppSec deals with deterministic software where the same input produces the same output. AI systems are probabilistic. The same prompt can produce different responses. Attacks exploit the model’s reasoning rather than code-level bugs. You cannot patch a prompt injection the way you patch a SQL injection. AI security requires a different toolset: red teaming frameworks, runtime guardrails, and model-level scanning.
Suphi Cankurt
Written by
Suphi Cankurt

Suphi Cankurt is an application security enthusiast based in Helsinki, Finland. He reviews and compares 129 AppSec tools across 10 categories on AppSec Santa. Learn more.

Comments

Powered by Giscus — comments are stored in GitHub Discussions.