Monitoring overview

How to set up online evaluations and monitoring.


AI quality monitoring automatically evaluates your AI application’s inputs and outputs. This helps you spot and fix issues while keeping an up-to-date view of your system behavior.

New to AI quality evaluations? Start with individual evaluations first. Read more on evaluations.

Evidently offers several ways to set up monitoring. Consider the following:

  • Type of AI application. Do you work with simple tabular data or need to capture complex LLM interactions?

  • Batch or real-time. Does your AI system process data in batches, or does it generate predictions live?

  • Evaluation frequency. How often do you need to check the system's performance? Is real-time monitoring necessary, or can you evaluate at intervals like hourly or daily?

  • Where to store prediction logs. Do you want to store raw data (inferences or traces) together with monitoring metrics, or would you prefer to manage them separately?

Considering these factors will help you choose the best monitoring setup. Here are three recommended architectures you can implement with Evidently.

Tracing with scheduled evals

Best for: LLM-powered applications.

Supported in: Evidently Cloud and Evidently Enterprise. Scheduled evaluations are in beta on Evidently Cloud. Contact our team to try it.

How it works:

  • Instrument your app. Use the Tracely library (based on OpenTelemetry) to capture all relevant data from your application, including inputs, outputs, tool calls, and intermediate steps.

  • Store raw data. Evidently Platform stores all raw data, providing a complete record of activity.

  • Schedule evaluations. Set up evaluations to run automatically at scheduled times. This will generate Reports or run Tests directly on the Evidently Platform.

You can also manually run evaluations anytime to assess individual predictions.
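
For illustration, here is a minimal instrumentation sketch. It assumes Evidently Cloud; the address, API key, project ID, and export dataset name are placeholders, and the `init_tracing` argument names may differ slightly between Tracely versions:

```python
from tracely import init_tracing, trace_event

# Point Tracely at the Evidently platform (placeholder credentials).
init_tracing(
    address="https://app.evidently.cloud",  # or your self-hosted instance
    api_key="YOUR_EVIDENTLY_API_KEY",
    project_id="YOUR_PROJECT_ID",
    export_name="llm_tracing_dataset",  # dataset that will collect the traces
)

# Every call to a decorated function is captured as a trace:
# inputs, outputs, and nested steps are sent to the platform.
@trace_event()
def answer_question(question: str) -> str:
    # Call your LLM or RAG pipeline here.
    return "stub answer"
```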

Benefits of this approach:

  • Solves data capture. You collect complex traces and all production data in one place, making them easier to manage and analyze.

  • Easy to re-run evals. With raw traces stored on the platform, you can easily re-run evaluations or add new metrics whenever needed.

  • No-code. Once your trace instrumentation is set up, you can manage everything from the UI.

What’s next? To get started, instrument your app with Tracely and set up scheduled evaluations: Set up tracing.

Batch monitoring jobs

Best for: batch ML pipelines, regression testing, and near real-time ML systems that don’t need instant evaluations.

Supported in: Evidently OSS, Evidently Cloud and Evidently Enterprise.

How it works:

  • Build your evaluation pipeline. Create a pipeline in your infrastructure to run monitoring jobs. This can be a Python script, cron job, or orchestrated with a tool like Airflow. Run it at regular intervals (e.g., hourly, daily) or trigger it when new data or labels arrive.

  • Run metric calculations. Implement the evaluation step in the pipeline using the Evidently Python library. Select the evals and compute JSON snapshots that summarize data, metrics, and test results (see the sketch after this list).

  • Store and visualize snapshots. Store and monitor results in Evidently Cloud, or in a designated self-hosted workspace.
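
As an illustration, a minimal version of such a monitoring job might look like this. It assumes Evidently Cloud and uses placeholder file names, project ID, and API token; a self-hosted workspace works the same way:

```python
import pandas as pd

from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.report import Report
from evidently.ui.workspace.cloud import CloudWorkspace

# Connect to Evidently Cloud (or a self-hosted workspace).
ws = CloudWorkspace(token="YOUR_EVIDENTLY_API_KEY", url="https://app.evidently.cloud")
project = ws.get_project("YOUR_PROJECT_ID")

# Load the reference dataset and the latest batch of inferences (placeholder files).
reference = pd.read_parquet("reference.parquet")
current = pd.read_parquet("current_batch.parquet")

# Run the selected evals and compute a snapshot.
report = Report(metrics=[DataDriftPreset(), DataQualityPreset()])
report.run(reference_data=reference, current_data=current)

# Upload the snapshot; it becomes available for dashboards and alerting.
ws.add_report(project.id, report)
```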

Benefits of this approach:

  • Decouples log storage and monitoring metrics. Evidently generates snapshots with data summaries and test results. It does not store raw data or model predictions unless you choose to. This protects data privacy and avoids duplicating logs if they’re already stored elsewhere, like for retraining.

  • Full control over the evaluation pipeline. You decide when evaluations happen. This setup is great for batch ML models, where you can easily add monitoring as another step in your existing pipeline. For online inference, you can log your predictions to a database and set up separate monitoring jobs to query data at intervals.

  • Fits most ML evaluation scenarios. Many evaluations, like data drift detection, naturally work in batches since you need to collect a set of new data points before running them. Model quality checks often only happen when new labeled data arrives, which can be delayed. Analyzing prediction or user behavior shifts is also usually more meaningful when done at intervals rather than recalculating after every single event.

What’s next? Understand the batch workflow in more detail: Batch monitoring.

Near real-time with collector

Best for: near real-time integration with an ML prediction service.

Supported in: Evidently OSS, Evidently Cloud and Evidently Enterprise.

Near real-time monitoring is for scenarios where you need frequent insights into your AI system's performance. In this setup, data from your ML service is sent directly to a monitoring system, where metrics are calculated on the fly and visualized in an online dashboard.

In Evidently, this works through the Evidently collector service that you deploy on your infrastructure.

How it works:

  • Deploy Evidently collector service. Install the Evidently collector and configure it to run evaluations at set intervals with chosen Evidently Metrics and Tests.

  • Send inferences. Post data (inputs and predictions) from your ML prediction service to the collector. The Evidently Collector manages data batching, computes Reports or Test Suites based on the configuration that you set, and sends them to the Evidently Cloud or designated self-hosted workspace.

The benefit of this approach is that you do not need to write your own evaluation pipelines, and it is well suited to frequent calculations.
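
For illustration, a sketch of the client-side setup, assuming a collector service is already deployed at a local address. The configuration classes and method names below follow the collector service guide, but treat the exact signatures as assumptions and check that guide for the current API:

```python
import pandas as pd

from evidently.collector.client import CollectorClient
from evidently.collector.config import CollectorConfig, IntervalTrigger, ReportConfig
from evidently.test_preset import DataDriftTestPreset
from evidently.test_suite import TestSuite

reference = pd.read_parquet("reference.parquet")  # placeholder reference data

# Define the evals once and run them on sample data to build the collector config.
suite = TestSuite(tests=[DataDriftTestPreset()])
suite.run(reference_data=None, current_data=reference)

conf = CollectorConfig(
    trigger=IntervalTrigger(interval=60),  # compute a Test Suite every 60 seconds
    report_config=ReportConfig.from_test_suite(suite),
    project_id="YOUR_PROJECT_ID",
)

client = CollectorClient("http://localhost:8001")  # address of the deployed collector
client.create_collector("prediction_service", conf)
client.set_reference("prediction_service", reference)

# From the ML prediction service: post inputs and predictions as they arrive.
client.send_data("prediction_service", pd.DataFrame([{"feature": 1.0, "prediction": 0}]))
```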

What’s next? Read how to set up the collector: Collector service.
