Adversarial Robustness Toolbox (ART)

Category: AI Security
License: Free (Open-Source, MIT)
Suphi Cankurt
AppSec Enthusiast
Updated March 18, 2026
6 min read
Key Takeaways
  • ART is IBM Research's open-source Python library for machine learning security, now a graduated project under the Linux Foundation AI & Data Foundation.
  • Covers four threat categories — evasion, poisoning, extraction, and inference — with 55+ attack implementations and 30+ defense methods.
  • Supports all major ML frameworks including TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, LightGBM, and CatBoost across images, tables, audio, and video.
  • 5.9k GitHub stars, MIT license, and active development with v1.20.1 (July 2025) adding YOLO v8+ support and GREAT Score for generative robustness evaluation.

Adversarial Robustness Toolbox (ART) is an open-source Python library from IBM Research for testing and improving the security of machine learning models against adversarial attacks. ART implements 55+ attack methods and 30+ defense mechanisms across four threat categories: evasion, poisoning, extraction, and inference. It has 5,900+ GitHub stars and is listed in the AI security category.

IBM started the project in 2018 and donated it to the Linux Foundation AI & Data Foundation (LF AI & Data), where it became a graduated project — the highest maturity level in the foundation. The current version is 1.20.1, released in July 2025.

ART is different from most AI security tools because it focuses on traditional ML model robustness rather than LLM-specific threats. If you work with image classifiers, object detectors, speech recognition models, or tabular ML pipelines, ART is the go-to library for testing adversarial resilience.

ART adversarial threat model showing four attack categories: evasion, poisoning, extraction, and inference against ML models

Overview

ART addresses four categories of adversarial threats against machine learning systems:

  • Evasion — Modifying inputs at inference time to cause misclassification (adversarial examples)
  • Poisoning — Manipulating training data to compromise model behavior
  • Extraction — Stealing a model’s functionality through queries
  • Inference — Attacking the privacy of training data (membership inference, model inversion)

The library implements 55+ attack methods and 30+ defense mechanisms across these categories. I tested ART against several PyTorch image classifiers, and the API is consistent regardless of which framework or attack type you use — you wrap your model, pick an attack or defense, and run it.

ART supports all popular ML frameworks: TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, LightGBM, CatBoost, and GPy. It handles images, tabular data, audio, and video inputs, covering tasks from classification and object detection to speech recognition and generative models.

ART four threat categories — evasion, poisoning, inference, and extraction — mapped as puzzle pieces

55+ Attack Methods
Implements FGSM, PGD, C&W, DeepFool, AutoAttack for evasion; backdoor and clean-label for poisoning; CopycatCNN and KnockoffNets for extraction; membership and attribute inference for privacy attacks.
30+ Defense Mechanisms
Preprocessors (spatial smoothing, JPEG compression, feature squeezing), postprocessors, adversarial training, certified defenses (randomized smoothing), and detection methods.
Multi-Framework Support
Works with TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, LightGBM, CatBoost, and GPy. BlackBoxClassifier wrapper for prediction-API-only access.
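
To illustrate the black-box wrapper, here is a minimal sketch of attacking a model that is only reachable through a prediction API. The query_remote_model helper and the x_test array are hypothetical placeholders; only the ART classes are part of the library.

import numpy as np
from art.estimators.classification import BlackBoxClassifier
from art.attacks.evasion import HopSkipJump

# Hypothetical helper that would call a remote prediction endpoint and
# return class probabilities of shape (n_samples, nb_classes)
def query_remote_model(x: np.ndarray) -> np.ndarray:
    # Placeholder: in practice this would POST x to the serving API
    return np.ones((x.shape[0], 10)) / 10

# Wrap the prediction API; no weights or gradients are required
classifier = BlackBoxClassifier(
    predict_fn=query_remote_model,
    input_shape=(3, 32, 32),
    nb_classes=10,
    clip_values=(0, 1),
)

# HopSkipJump is a decision-based attack that only needs predictions,
# so it works against the black-box wrapper (x_test assumed available)
attack = HopSkipJump(classifier=classifier, max_iter=10)
x_adv = attack.generate(x=x_test)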

Key Features

  • Current version: 1.20.1 (July 2025)
  • GitHub stats: 5.9k stars, MIT license
  • Attack categories: Evasion, poisoning, extraction, inference
  • Defense modules: Preprocessor, postprocessor, trainer, transformer, detector
  • ML frameworks: TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, LightGBM, CatBoost, GPy
  • Data types: Images, tables, audio, video
  • ML tasks: Classification, object detection, speech recognition, generation, certification
  • Installation: pip, Docker
  • Python support: 3.10+
  • Governance: LF AI & Data Foundation (graduated project)

Evasion attacks

Evasion attacks modify inputs at inference time to fool a trained model. ART implements the most widely studied attack methods in adversarial ML research:

  • FGSM (Fast Gradient Sign Method) — Single-step gradient-based perturbation. Fast but less effective against robust models.
  • PGD (Projected Gradient Descent) — Iterative version of FGSM. The standard benchmark for adversarial robustness evaluation.
  • C&W (Carlini & Wagner) — Optimization-based attack that finds minimal perturbations. Slower but more effective.
  • DeepFool — Computes the minimal perturbation to cross the decision boundary.
  • AutoAttack — Ensemble of attacks for reliable robustness evaluation.

These attacks work across image classifiers, object detectors, and speech recognition models. ART handles the framework-specific gradient computation behind the scenes.
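
As a rough sketch of how these attacks are invoked, the snippet below runs PGD against a model already wrapped in an ART estimator. The classifier, x_test, and y_test variables are assumed to exist (see the basic usage example later in this article), and the attack settings are illustrative.

import numpy as np
from art.attacks.evasion import ProjectedGradientDescent

# Iterative PGD: eps bounds the total perturbation, eps_step the step size
attack = ProjectedGradientDescent(
    estimator=classifier,
    eps=8 / 255,
    eps_step=2 / 255,
    max_iter=40,
)
x_adv = attack.generate(x=x_test)

# Compare clean vs. adversarial accuracy (integer labels assumed)
clean_acc = np.mean(np.argmax(classifier.predict(x_test), axis=1) == y_test)
adv_acc = np.mean(np.argmax(classifier.predict(x_adv), axis=1) == y_test)
print(f"clean: {clean_acc:.3f}  adversarial: {adv_acc:.3f}")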

Poisoning attacks

Poisoning attacks target the training phase. ART supports backdoor attacks (inserting triggers into training data) and clean-label attacks (subtly modifying legitimate training examples to shift model behavior).

These are harder to detect because the training data may look normal to human reviewers. ART’s poisoning defense modules help detect and mitigate these attacks before they compromise a production model.
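
A minimal sketch of a backdoor attack with ART, assuming channels-last images scaled to [0, 1] and a hypothetical x_subset slice of the training data:

import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

# Backdoor that stamps a small pixel pattern into each poisoned image
backdoor = PoisoningAttackBackdoor(perturbation=add_pattern_bd)

# Relabel the poisoned subset to the attacker's target class (class 7 here)
target = np.zeros((len(x_subset), 10))
target[:, 7] = 1
x_poisoned, y_poisoned = backdoor.poison(x_subset, y=target)

# Mixing (x_poisoned, y_poisoned) back into the training set implants the trigger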

Extraction and inference attacks

Model extraction attacks like CopycatCNN and KnockoffNets reconstruct a target model’s behavior using only query access. This threatens proprietary models served via APIs — an attacker can clone the model’s functionality without access to weights or training data.

Inference attacks target data privacy. Membership inference determines whether a specific data point was in the training set. Model inversion reconstructs training examples from model outputs. Attribute inference infers sensitive attributes about training data subjects.
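
For the privacy side, a membership inference run looks roughly like this, assuming a classifier already wrapped in an ART estimator plus the train and test splits it was built from:

from art.attacks.inference.membership_inference import MembershipInferenceBlackBox

# Train an attack model that separates members (training data) from
# non-members based on the target classifier's outputs
attack = MembershipInferenceBlackBox(classifier, attack_model_type="rf")
attack.fit(x_train, y_train, x_test, y_test)

# 1 = predicted member of the training set, 0 = predicted non-member
inferred_members = attack.infer(x_train, y_train)
inferred_nonmembers = attack.infer(x_test, y_test)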

Defense mechanisms

ART red and blue team capabilities showing attack evaluation methods and corresponding defense techniques

ART’s defenses fall into five categories:

  • Preprocessors — Transform inputs before they reach the model (spatial smoothing, JPEG compression, feature squeezing, thermometer encoding)
  • Postprocessors — Modify model outputs to reduce information leakage
  • Trainers — Adversarial training methods that incorporate adversarial examples during model training
  • Transformers — Model modification techniques for robustness
  • Detectors — Identify adversarial inputs at inference time
Research-grade implementation
ART implements attacks and defenses from peer-reviewed ML security papers. The library includes over 100 Jupyter notebooks demonstrating individual techniques, making it useful for both research and practical security testing.
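
As one example from the trainer category, adversarial training with ART is roughly this short. The classifier, x_train, and y_train variables are assumed to exist, and the PGD settings are illustrative rather than recommended values.

from art.attacks.evasion import ProjectedGradientDescent
from art.defences.trainer import AdversarialTrainer

# Attack used to craft adversarial examples on the fly during training
pgd = ProjectedGradientDescent(estimator=classifier, eps=8 / 255, max_iter=10)

# ratio=0.5 replaces half of each training batch with adversarial examples
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=0.5)
trainer.fit(x_train, y_train, nb_epochs=20, batch_size=128)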

GREAT Score and YOLO support

Version 1.20.0 introduced the GREAT Score (Global Robustness Evaluation of Adversarial Perturbation using Generative Models), which uses generative AI to measure model robustness without requiring manual perturbation design. The same release added support for YOLO v8+ object detection models.

Supported Frameworks

ML Frameworks: PyTorch, TensorFlow, Keras, scikit-learn, XGBoost, LightGBM

Getting Started

  1. Install ART — Run pip install adversarial-robustness-toolbox. For specific frameworks, install their dependencies separately (e.g., pip install torch torchvision for PyTorch).
  2. Wrap your model — Use art.estimators.classification.PyTorchClassifier (or TensorFlowV2Classifier, SklearnClassifier, etc.) to wrap your trained model. ART needs the wrapper to compute gradients and manage inputs.
  3. Run an attack — Import an attack (e.g., art.attacks.evasion.FastGradientMethod), configure parameters like epsilon, and call attack.generate(x_test) to create adversarial examples.
  4. Apply a defense — Import a defense (e.g., art.defences.preprocessor.SpatialSmoothing) and wrap it around your estimator. Test the defended model against the same attacks to measure robustness improvement.

Basic usage example

import numpy as np

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod
from art.defences.preprocessor import SpatialSmoothing

# Wrap a trained PyTorch model (model, criterion, optimizer, x_test and
# y_test are assumed to be defined already)
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(3, 32, 32),
    nb_classes=10
)

# Generate adversarial examples with FGSM
attack = FastGradientMethod(estimator=classifier, eps=0.3)
x_adversarial = attack.generate(x=x_test)

# Evaluate the accuracy drop on adversarial inputs (integer labels assumed)
predictions = classifier.predict(x_adversarial)
accuracy = np.mean(np.argmax(predictions, axis=1) == y_test)

# Re-wrap the model with the spatial smoothing defence and re-run the
# attack to measure the robustness improvement
defended = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=optimizer,
    input_shape=(3, 32, 32),
    nb_classes=10,
    preprocessing_defences=SpatialSmoothing(window_size=3, channels_first=True)
)

Docker support

ART provides Docker images for reproducible environments:

docker pull adversarialrobustnesstoolbox/art:latest
docker run -it adversarialrobustnesstoolbox/art:latest

Use Cases

  • Model robustness testing — Run standardized attack suites (PGD, AutoAttack) against image classifiers before deployment to quantify adversarial vulnerability.
  • Adversarial training — Use ART’s trainer module to augment training data with adversarial examples, producing models that resist perturbation attacks.
  • Privacy assessment — Run membership inference and model inversion attacks to evaluate whether a model leaks sensitive information about its training data.
  • Supply chain model security — Test third-party or pre-trained models for backdoor poisoning before integrating them into production pipelines.
  • Research and benchmarking — Reproduce published adversarial ML results and compare new defense methods against standard attack benchmarks.

Strengths & Limitations

Strengths:

  • Widest coverage of adversarial ML techniques in a single library (55+ attacks, 30+ defenses)
  • Framework-agnostic design supports all major ML libraries through a unified API
  • Graduated LF AI & Data project with IBM Research backing and active maintenance
  • Extensive Jupyter notebook collection (100+) for learning and experimentation
  • Covers the full adversarial ML workflow: attacks, defenses, detection, and certification

Limitations:

  • Focused on traditional ML models — limited coverage for LLM-specific threats like prompt injection or jailbreaks
  • Some advanced attacks have high computational requirements, especially on large models
  • Documentation, while extensive, can be fragmented across the wiki, ReadTheDocs, and notebooks
  • Requires ML framework expertise to configure model wrappers correctly
Best for
ML engineers and security researchers who need to evaluate and improve the adversarial robustness of computer vision, speech, or tabular machine learning models using standardized attack and defense methods.

For LLM-specific security testing (prompt injection, jailbreaks, hallucinations), look at Garak for probe-based scanning, DeepTeam for OWASP-mapped red teaming, or Giskard for combined LLM and ML model testing. For runtime guardrails on language models, consider NeMo Guardrails or LLM Guard.

For a broader overview of the AI security landscape, see our AI security tools guide.

Frequently Asked Questions

What is the Adversarial Robustness Toolbox (ART)?
ART is an open-source Python library created by IBM Research for machine learning security. It provides 55+ adversarial attack methods and 30+ defense implementations across four threat categories: evasion, poisoning, extraction, and inference. The project is a graduated member of the Linux Foundation AI & Data Foundation with 5.9k GitHub stars.
Is ART free to use?
Yes. ART is completely free and open-source under the MIT license. You can use it in commercial and research projects without restrictions. It installs via pip and has no paid tiers or cloud dependencies.
Which ML frameworks does ART support?
ART supports TensorFlow, Keras, PyTorch, scikit-learn, XGBoost, LightGBM, CatBoost, and GPy. It also provides a BlackBoxClassifier for models accessible only via prediction APIs. Framework-specific wrappers handle the differences automatically.
How does ART compare to Garak or DeepTeam?
ART focuses on traditional ML model robustness — adversarial perturbations on images, audio, and tabular data. Garak and DeepTeam are LLM-specific tools that test for prompt injection, jailbreaks, and hallucinations. ART is the right choice when securing computer vision, speech, or tabular ML models rather than language models.
What types of adversarial attacks does ART implement?
ART implements evasion attacks (FGSM, PGD, C&W, DeepFool, AutoAttack), poisoning attacks (backdoor, clean-label), model extraction attacks (CopycatCNN, KnockoffNets), and inference attacks (membership inference, model inversion, attribute inference). Each category has multiple variants.