Arize AI

Category: AI Security
License: Free (Open-Source) and Commercial
Suphi Cankurt
AppSec Enthusiast
Updated April 3, 2026
6 min read
Key Takeaways
  • AI observability platform with two products: Phoenix (open-source, 9.1k+ GitHub stars, Elastic License 2.0) for development and AX for enterprise production monitoring.
  • OpenTelemetry-based architecture provides vendor-agnostic LLM tracing — full flow logging of agent interactions, tool calls, and configurations with open standards.
  • Processes 1 trillion spans and runs 50+ million evaluations monthly across customers including DoorDash, Instacart, Reddit, Uber, and Booking.
  • Phoenix is fully self-hostable with zero feature gates — runs locally, in Jupyter notebooks, Docker containers, or cloud environments.
  • Open-source evaluation models (not proprietary black-box evaluators) with support for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ LLM providers.

Arize AI is an AI observability and LLM evaluation platform built on OpenTelemetry standards, offering vendor-agnostic tracing from development through production. The company provides Phoenix (open-source, 9.1k+ GitHub stars) for LLM tracing and evaluation, and AX for enterprise-scale AI monitoring. Arize processes 1 trillion spans and runs 50+ million evaluations monthly across customers including DoorDash, Instacart, Reddit, and Uber. It is listed in the AI security category.

The platform is used by organizations including DoorDash, Instacart, Reddit, Uber, Booking.com, Roblox, PagerDuty, Air Canada, Cohere, Conde Nast, Flipkart, TripAdvisor, Siemens, Microsoft, and Priceline to monitor and evaluate their AI applications. Arize’s approach centers on open standards — OpenTelemetry for instrumentation, open-source evaluation models rather than proprietary black-box evaluators, and standard data formats that prevent vendor lock-in.

What is Arize AI?

Arize operates across two layers of the AI stack: development-time experimentation (Phoenix) and production-scale monitoring (AX). This split lets teams start with the free, open-source Phoenix for prototyping and evaluation, then move to AX when they need enterprise-grade scale and collaboration.

OpenTelemetry is central to Arize’s architecture. LLM application traces use the same standard as traditional application performance monitoring (APM), so AI observability plugs into existing DevOps infrastructure instead of requiring a separate monitoring stack.
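To make the tracing idea concrete, here is a minimal standard-library Python sketch of what an OpenTelemetry-style span for one LLM call might contain. The attribute keys (`llm.model_name`, `llm.prompt`, and so on) are illustrative stand-ins modeled on OpenInference-style conventions, not the exact schema, and the model name is invented:

```python
import json
import time
import uuid

def llm_span(model, prompt, response, parent_id=None):
    """Build a dict shaped like an OpenTelemetry span for one LLM call.

    Illustrative only: real instrumentation (e.g. OpenInference) emits
    spans through an OpenTelemetry SDK, not hand-built dicts.
    """
    start = time.time_ns()
    return {
        "trace_id": uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent_id,  # links this call into a larger trace
        "name": "llm.completion",
        "start_time_unix_nano": start,
        "end_time_unix_nano": start,  # filled in when the call finishes
        "attributes": {
            # Keys are stand-ins for the semantic conventions a real
            # instrumentor would use.
            "llm.model_name": model,
            "llm.prompt": prompt,
            "llm.response": response,
        },
    }

span = llm_span("example-model", "What is observability?", "Observability is...")
print(json.dumps(span["attributes"], indent=2))
```

Because the span is plain structured data in a shared format, any OpenTelemetry-compatible backend can ingest it, which is the lock-in-avoidance argument in practice.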

Phoenix (Open-Source)
Fully self-hostable AI observability with zero feature gates. LLM tracing, evaluation, experiment tracking, prompt management, and dataset versioning. 9.1k+ GitHub stars, runs locally, in Jupyter, Docker, or cloud. Elastic License 2.0.
Arize AX (Enterprise)
Production-scale AI monitoring with two editions: AX-Generative for LLM and generative AI, AX-ML & CV for traditional machine learning and computer vision. Built on adb, a purpose-built datastore for real-time ingestion and sub-second queries.
OpenTelemetry Native
Vendor-agnostic instrumentation built on the OpenTelemetry standard. Traces export to any compatible backend, preventing lock-in. Same format as traditional APM, so AI observability integrates with existing DevOps tooling.

Key Features

  • LLM Tracing — Full-flow logging of agent interactions, tool calls, retrieval steps, and configurations
  • Evaluation — 50M+ evaluations monthly; open-source evaluation models, not proprietary black boxes
  • Agent Monitoring — Track agent behavior, tool usage, decision chains, and performance in production
  • Experiment Tracking — Compare prompt variations, model changes, and parameter adjustments side-by-side
  • Prompt Management — Version control for prompts with systematic testing and rollback
  • Datasets — Versioned datasets for evaluation, experimentation, and fine-tuning
  • RAG Evaluation — Measure retrieval quality, relevance, and response grounding
  • Scale — 1 trillion spans processed, 5 million downloads monthly
  • Datastore — adb, purpose-built for generative AI with real-time ingestion and sub-second queries
  • Framework Support — OpenAI Agents SDK, Claude Agent SDK, LangGraph, Vercel AI SDK, CrewAI, LlamaIndex, DSPy, Haystack, Guardrails, Instructor, Pydantic AI, AutoGen AgentChat, Portkey, Google ADK, and 15+ more
  • License — Elastic License 2.0 (Phoenix); commercial (AX)

Phoenix: Open-Source AI Observability

Phoenix is the open-source core of Arize’s platform. It has the same tracing, evaluation, and experimentation capabilities as the enterprise platform, with no feature gates or restrictions on the self-hosted version.

Key capabilities include:

  • LLM Tracing — Captures the full execution flow of LLM applications: each prompt, response, tool call, retrieval step, and agent decision. Traces follow the OpenTelemetry standard, making them compatible with existing observability infrastructure.
  • Evaluation — Run evaluations using open-source models to measure response quality, relevance, hallucination rates, toxicity, and other metrics. Evaluations can run in batch (for dataset-level assessment) or continuously (for production monitoring).
  • Experiment Tracking — Compare different prompt templates, model versions, temperature settings, and other parameters side-by-side to make data-driven decisions about which configuration to deploy.
  • Prompt Management — Version control for prompts with the ability to test changes systematically before deployment.
  • Dataset Management — Create and maintain versioned datasets for evaluation benchmarks, fine-tuning, and reproducible experiments.
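The "full execution flow" that tracing captures is a tree of parent and child steps. The standard-library sketch below mirrors that shape for one agent turn; real Phoenix traces use OpenTelemetry spans, and the step and attribute names here are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """Toy span: a named step with attributes and child steps.

    Only mirrors the parent/child shape of a trace; a real trace
    is built from OpenTelemetry spans.
    """
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

# One agent turn: the agent calls a retrieval tool, then makes an
# LLM call with the retrieved context.
root = Span("agent.run", {"input": "summarize our Q3 incidents"})
tool = root.add(Span("tool.retrieve", {"query": "Q3 incidents", "documents": 4}))
llm = root.add(Span("llm.completion", {"model": "example-model"}))

def flatten(span, depth=0):
    """Walk the trace tree depth-first, yielding (depth, step name)."""
    yield depth, span.name
    for child in span.children:
        yield from flatten(child, depth + 1)

for depth, name in flatten(root):
    print("  " * depth + name)
```

Debugging an agent failure then becomes a matter of walking this tree to find the step whose inputs or outputs went wrong.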

Phoenix runs on local machines, in Jupyter notebooks, as Docker containers, or in cloud environments. Installation is straightforward: pip install arize-phoenix or pull the Docker image from Docker Hub.

Arize AX: Enterprise Platform

AX extends Phoenix’s capabilities to production scale with two editions:

  • AX-Generative — For LLM and generative AI applications. Monitors production traffic, detects quality degradation, tracks agent behavior, and provides team collaboration features for debugging and investigation.
  • AX-ML & CV — For traditional machine learning and computer vision workloads. Extends observability beyond LLMs to cover the full spectrum of AI models.

Both editions are built on adb, Arize’s purpose-built datastore optimized for generative AI workloads. It handles real-time ingestion of trace data and provides sub-second query performance for debugging and analysis at scale.

Alyx: AI Debugging Assistant

Alyx is Arize’s AI assistant for LLM application development. It helps debug traces, spot failure patterns, and integrate domain knowledge into the development workflow. Alyx works alongside Phoenix and AX to speed up investigation and root cause analysis.

Framework Integrations

Arize provides instrumentation for major AI frameworks and SDKs:

  • Agent frameworks — OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, AutoGen AgentChat, Pydantic AI, Google ADK
  • LLM frameworks — LlamaIndex, DSPy, Vercel AI SDK, Haystack, Guardrails, Instructor, Portkey
  • LLM providers — OpenAI, Anthropic, Google GenAI, AWS Bedrock, Mistral AI, Groq, OpenRouter, LiteLLM, VertexAI, and more
  • Deployment — Kubernetes, Docker, Jupyter notebooks, local machines, cloud-native environments

The OpenInference project (also open-source from Arize) provides the OpenTelemetry instrumentation packages that connect these frameworks to Phoenix or AX.
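Conceptually, an instrumentation package works by wrapping a framework's call sites so that spans are emitted automatically, with no changes to application logic. The standard-library sketch below shows the wrapping idea only; the `chat_completion` stub is hypothetical, and a real OpenInference instrumentor patches the actual SDK's methods and exports through OpenTelemetry:

```python
import functools
import time

SPANS = []  # stand-in for an OpenTelemetry exporter

def instrument(fn):
    """Wrap a function so each call records a span-like dict.

    Loosely mimics what an auto-instrumentation package does when it
    patches a client library's methods.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        SPANS.append({
            "name": fn.__name__,
            "duration_s": time.perf_counter() - start,
            "kwargs": kwargs,
        })
        return result
    return wrapper

@instrument
def chat_completion(*, model, prompt):
    # Hypothetical client call; a real instrumentor would patch the
    # actual SDK method rather than a stub like this.
    return f"[{model}] echo: {prompt}"

chat_completion(model="example-model", prompt="hello")
print(SPANS[0]["name"])  # chat_completion
```

This is why adding observability is usually a one-line instrumentation call per framework rather than edits scattered through application code.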

Getting Started

1. Install Phoenix — Get the open-source platform: pip install arize-phoenix. Or pull the Docker image for containerized deployment. No license key or account required.
2. Instrument your application — Add OpenTelemetry instrumentation to your LLM application using the OpenInference packages: pip install openinference-instrumentation-openai (or the package for your framework). This captures traces automatically.
3. Launch the Phoenix UI — Start Phoenix locally: python -m phoenix.server.main serve. The web UI opens at localhost:6006 with trace visualization, evaluation tools, and experiment tracking.
4. Run evaluations — Create evaluation datasets and run benchmarks against your LLM application. Use open-source evaluation models to measure quality, relevance, hallucination, and safety metrics.
5. Scale to AX (optional) — When you need production-scale monitoring, team collaboration, or enterprise features, sign up for Arize AX at app.arize.com. AX uses the same trace format, so the transition is seamless.
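The dataset-level evaluation in step 4 boils down to a loop: run each example past an evaluator and aggregate the labels. A minimal plain-Python sketch, with a stubbed judge (a real run would call one of the open-source evaluation models instead):

```python
# Minimal sketch of a batch evaluation loop; dataset and judge are
# illustrative stand-ins, not Phoenix APIs.
dataset = [
    {"question": "What port does Phoenix use?", "answer": "6006"},
    {"question": "What license is Phoenix under?", "answer": "Elastic License 2.0"},
]

def judge_relevance(question, answer):
    """Stub evaluator: a real setup would call an LLM-based eval model
    and return a label such as 'relevant' / 'irrelevant'."""
    return "relevant" if answer else "irrelevant"

results = [
    {"question": row["question"],
     "label": judge_relevance(row["question"], row["answer"])}
    for row in dataset
]

relevant = sum(r["label"] == "relevant" for r in results)
print(f"{relevant}/{len(results)} responses judged relevant")
```

The same loop runs in batch against a versioned dataset during development, or continuously against sampled production traffic for monitoring.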

When to Use Arize AI

Arize AI is built for teams that need observability across the full AI application lifecycle — from prototyping and evaluation through production monitoring. The open-source Phoenix makes it accessible to individual developers and small teams, while AX scales to enterprise deployments.

It is particularly useful when you are building agent-based applications that need full-flow tracing of decision chains, running evaluations at scale to compare models and prompts, or working in environments where vendor lock-in is a concern and OpenTelemetry compatibility matters.

Best for
AI engineering teams that need observability from development to production. Phoenix gives you a free, full-featured starting point for LLM tracing and evaluation, while AX scales to enterprise production monitoring. The OpenTelemetry foundation and open-source evaluation models avoid vendor lock-in.

How Arize AI Compares

Arize AI occupies the observability and evaluation layer of the AI security landscape. For model monitoring with bias detection and explainability across LLMs and traditional ML, see Arthur AI. For privacy-preserving data logging and drift detection (open-source tools still available after Apple acquisition), see WhyLabs.

For LLM security rather than observability — prompt injection detection, guardrails, and runtime protection — consider Lakera Guard, Prompt Security, LLM Guard, or NeMo Guardrails. For pre-deployment vulnerability scanning, see Garak, Augustus, or Promptfoo.

For a broader overview of AI security tools, see the AI security tools category page.

Frequently Asked Questions

What is Arize AI?
Arize AI is an AI observability and evaluation platform with two main products: Phoenix, an open-source tool (9.1k+ GitHub stars) for LLM tracing, evaluation, and experimentation; and AX, an enterprise platform for production-scale AI monitoring. Built on OpenTelemetry for vendor-agnostic instrumentation, Arize processes 1 trillion spans and 50+ million evaluations monthly.
Is Arize AI free?
Phoenix, Arize’s open-source platform, is fully free and self-hostable with zero feature gates or restrictions. It runs locally, in Jupyter notebooks, Docker containers, and cloud environments under the Elastic License 2.0. Arize AX, the enterprise platform, is a commercial product — sign up at app.arize.com for access.
What is the difference between Phoenix and Arize AX?
Phoenix is the open-source tool for development-time AI observability: tracing, evaluation, experimentation, and prompt management. It runs locally or self-hosted. Arize AX is the enterprise platform for production-scale monitoring, with two editions: AX-Generative for LLM and generative AI applications, and AX-ML & CV for traditional machine learning and computer vision. AX adds scale, team collaboration, and enterprise features.
How does Arize AI use OpenTelemetry?
Arize built its instrumentation on the OpenTelemetry standard, the same open standard used for application performance monitoring. This means traces from LLM applications use the same format and tooling as traditional APM traces, enabling vendor-agnostic observability. You can export Arize traces to any OpenTelemetry-compatible backend, avoiding vendor lock-in.
How does Arize AI compare to WhyLabs?
Both are AI observability platforms, but WhyLabs was acquired by Apple in January 2025 and its commercial platform has been discontinued (open-source tools remain). Arize AI is actively maintained with both an open-source tool (Phoenix) and commercial platform (AX). WhyLabs differentiated on privacy-preserving statistical profiling; Arize emphasizes OpenTelemetry-based tracing, LLM evaluation, and full-flow agent monitoring.