Skip to content
Home AI Security Tools NVIDIA NeMo Guardrails
NVIDIA NeMo Guardrails

NVIDIA NeMo Guardrails

Category: AI Security
License: Free (Open-Source)
Suphi Cankurt
Suphi Cankurt
AppSec Enthusiast
Updated February 17, 2026
4 min read
Key Takeaways
  • NVIDIA's open-source toolkit (Apache 2.0, 5.6k GitHub stars) for adding programmable guardrails to LLM applications with 5 rail types: input, dialog, retrieval, execution, and output.
  • Colang domain-specific language enables declarative multi-turn dialog flow control—unique among guardrail tools that typically only filter individual inputs and outputs.
  • Supports OpenAI, Azure, Anthropic, HuggingFace, and NVIDIA NIM models; integrates with LangChain, LangGraph, and custom chains.
  • Includes jailbreak detection, prompt injection protection, fact-checking against knowledge bases, and hallucination detection with OpenTelemetry tracing.

NVIDIA NeMo Guardrails is an open-source AI security toolkit for adding programmable guardrails to LLM-based conversational applications. It has 5.6k GitHub stars and 597 forks.

NeMo Guardrails architecture showing how programmable guardrails intercept LLM inputs and outputs

The project is Apache 2.0 licensed and maintained by NVIDIA. The latest release is v0.20.0 (January 2026). It requires Python 3.10-3.13. The framework was published as a research paper at EMNLP 2023 (arXiv:2310.10501).

What is NeMo Guardrails?

NeMo Guardrails intercepts the traffic between your application and its LLM, applying configurable safety checks that block or modify content based on defined policies. The main difference from other guardrail tools is dialog management. Most solutions only filter individual inputs and outputs. NeMo Guardrails models entire conversation flows.

The toolkit uses Colang, a domain-specific language purpose-built for defining conversational guardrails. Colang lets you declaratively specify what conversations should look like, which topics are allowed, and how the system should respond to specific user intents. Both Colang 1.0 and the newer Colang 2.0 are supported.

Five rail types cover different stages of the LLM interaction: input rails process user messages, dialog rails control conversation flow, retrieval rails filter knowledge base results, execution rails gate tool/action calls, and output rails validate responses.

Colang Language
A Python-like domain-specific language for defining dialog flows and guardrail policies. Supports both v1.0 and v2.0 syntax with flows, events, actions, and LLM integration.
Five Rail Types
Input, dialog, retrieval, execution, and output rails cover every stage of the LLM interaction. Dialog rails are unique to NeMo Guardrails, controlling how conversations flow across turns.
Provider Support
Works with OpenAI, Azure, Anthropic, HuggingFace, NVIDIA NIM, and other LLM providers. Integrates with LangChain, LangGraph, and third-party security services.

Key Features

FeatureDetails
Rail TypesInput, dialog, retrieval, execution, output
Colang Versions1.0 (default) and 2.0
Input RailsJailbreak detection, prompt injection, content moderation, intent classification
Output RailsFact-checking, hallucination detection, sensitive data blocking, response validation
Dialog RailsTopic boundaries, conversation flows, canonical forms, branching logic
LLM ProvidersOpenAI, Azure, Anthropic, HuggingFace, NVIDIA NIM, LLaMA, Falcon, Vicuna, Mosaic
FrameworksLangChain, LangGraph, custom chains
DeploymentPython API, FastAPI server, Docker, NeMo Microservice
ObservabilityOpenTelemetry tracing, structured logging, performance evaluation
LicenseApache 2.0

Input Rails

Input rails process user messages before they reach the LLM. They handle jailbreak detection, prompt injection filtering, content moderation, and intent classification. You can use NVIDIA’s built-in safety models or plug in third-party providers.

Dialog Rails

Dialog rails are what make NeMo Guardrails different from input/output-only tools. They model the intended flow of conversations using Colang, keeping the LLM on track across multiple turns. If a user tries to steer the conversation off-topic or into restricted areas, dialog rails redirect it back.

NeMo Guardrails flow diagram showing input, dialog, retrieval, execution, and output rails

Retrieval and Execution Rails

Retrieval rails filter and validate knowledge base results before they reach the LLM, reducing hallucination risk. Execution rails gate tool calls and actions, preventing the LLM from triggering operations it shouldn’t.

Output Rails

Output rails validate LLM responses before they reach the user. Built-in checks include fact-checking against knowledge bases, hallucination detection, sensitive data blocking, and response quality validation.

Third-party integrations
NeMo Guardrails supports external security providers alongside its built-in capabilities. You can also write custom rails for your own requirements.

Colang Language

Colang is purpose-built for defining conversational guardrails. It uses a Python-like syntax where you define user intents, bot responses, and flows that connect them. Colang 2.0 adds events, actions, and more direct LLM integration.

Example Colang 1.0 flow:

define user express greeting
  "Hello!"
  "Good afternoon!"

define flow
  user express greeting
  bot express greeting
  bot offer to help

Getting Started

1
Install the package — Run pip install nemoguardrails. Requires Python 3.10-3.13 and a C++ compiler (needed for the Annoy library dependency).
2
Create a config directory — Set up a config/ folder with a config.yml file specifying your LLM provider and which rails to enable (input, output, dialog).
3
Define Colang flows — Write .co files in your config directory to define user intents, bot responses, and conversation flows that enforce your safety policies.
4
Run with Python or server — Use the LLMRails Python API for direct integration, or launch the built-in FastAPI server with nemoguardrails server for HTTP access.

When to use NeMo Guardrails

NeMo Guardrails makes sense when you need control over conversational AI behavior beyond input/output filtering. The dialog rails let you model how conversations should flow, which matters for customer service bots, enterprise assistants, or any application where multi-turn conversation control is needed.

The Colang language gives you declarative control that would require complex state management code in other frameworks. If you need to keep conversations on topic, implement branching dialog policies, or integrate fact-checking against knowledge bases, NeMo Guardrails handles these natively.

Organizations using NVIDIA’s AI ecosystem benefit from native compatibility with NVIDIA NIM models and NVIDIA AI Enterprise.

Best for
Teams building conversational AI applications that need dialog flow control, topic boundaries, and multi-turn safety policies in addition to input/output filtering.

For a broader overview of LLM security risks, see our AI security guide. For input/output-only scanning without dialog management, LLM Guard is lighter weight. For adversarial testing and red teaming, look at Garak, PyRIT, or Promptfoo. Lakera Guard provides guardrails as a managed cloud API.

Frequently Asked Questions

What is NeMo Guardrails?
NeMo Guardrails is an open-source toolkit by NVIDIA for adding programmable guardrails to LLM-based conversational applications. It has 5.6k GitHub stars and uses the Colang domain-specific language for defining safety policies and dialog flows.
Is NeMo Guardrails free to use?
Yes, NeMo Guardrails is free and open-source under the Apache 2.0 license. It requires Python 3.10-3.13 and integrates with OpenAI, Azure, Anthropic, HuggingFace, and NVIDIA NIM models.
Does NeMo Guardrails protect against prompt injection?
Yes, NeMo Guardrails includes input rails for jailbreak detection and prompt injection protection. It also provides dialog rails that control conversation flow, preventing the LLM from being steered into unwanted territory.
What makes NeMo Guardrails different from other LLM safety tools?
NeMo Guardrails is one of the few open-source toolkits that combines safety controls with dialog management through its Colang language. It offers five rail types (input, dialog, retrieval, execution, output) and supports both Colang 1.0 and 2.0 syntax.