Data Drift

You are looking at the old Evidently documentation: this API is available with versions 0.6.7 or lower. Check the newer version here.

TL;DR: You can detect and analyze changes in the input feature distributions.

  • Report: for visual analysis or metrics export, use the DataDriftPreset.

  • Test Suite: for pipeline checks, use the DataDriftTestPreset.

Use Case

You can evaluate data drift in different scenarios.

  1. To monitor the model performance without ground truth. When you do not have true labels or actuals, you can monitor the feature drift to check if the model operates in a familiar environment. You can combine it with the Prediction Drift. If you detect drift, you can trigger labeling and retraining, or decide to pause and switch to a different decision method.

  2. When you are debugging the model quality decay. If you observe a drop in the model quality, you can evaluate Data Drift to explore the change in the feature patterns, e.g., to understand the change in the environment or discover the appearance of a new segment.

  3. To understand model drift in an offline environment. You can explore the historical data drift to understand past changes in the input data and define the optimal drift detection approach and retraining strategy.

  4. To decide on the model retraining. Before feeding fresh data into the model, you might want to verify whether it even makes sense. If there is no data drift, the environment is stable, and retraining might not be necessary.

To run drift checks as part of the pipeline, use the Test Suite. To explore and debug, use the Report.

Data Drift Report

If you want to get a visual report, you can create a new Report object and use the DataDriftPreset.

Code example

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# ref and cur are pandas DataFrames with the reference and the current data
data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

data_drift_report.run(reference_data=ref, current_data=cur)
data_drift_report

How it works

The Data Drift report helps detect and explore changes in the input data.

  • Applies a suitable drift detection method for numerical, categorical, or text features.

  • Plots feature values and distributions for the two datasets.

Data Requirements

  • You will need two datasets. The reference dataset serves as a benchmark. Evidently analyzes the change by comparing the current production data to the reference data to detect distribution drift.

  • Input features. The dataset should include the features you want to evaluate for drift. The schema of both datasets should be identical. If your dataset contains target or prediction columns, they will also be analyzed for drift.

Column mapping. Evidently can evaluate drift for numerical, categorical, and text features. You can explicitly specify the type of each column using the column mapping object. If it is not specified, Evidently will try to identify the numerical and categorical features automatically. It is recommended to use column mapping to avoid errors. If you have text data, you must always specify it.
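
For illustration, a minimal sketch of passing a column mapping when running the Report; the column names here are placeholders for your own schema:

from evidently import ColumnMapping

# Placeholder column names: replace them with the columns of your own dataset
column_mapping = ColumnMapping(
    numerical_features=["age", "tenure"],
    categorical_features=["region"],
    text_features=["review_text"],  # text columns must always be listed explicitly
)

data_drift_report.run(
    reference_data=ref,
    current_data=cur,
    column_mapping=column_mapping,
)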

How it looks

The default report includes 4 components. All plots are interactive.

Aggregated visuals in plots. Starting from v0.3.2, all visuals in the Evidently Reports are aggregated by default. This helps decrease the load time and report size for larger datasets. If you work with smaller datasets or samples, you can pass an option to generate plots with raw data. You can choose whether you want it on or not based on the size of your dataset.

1. Data Drift Summary

The report returns the share of drifting features and an aggregate Dataset Drift result.

Dataset Drift sets a rule on top of the results of the statistical tests for individual features. By default, Dataset Drift is detected if at least 50% of features drift.

2. Data Drift Table

The table shows the drifting features first. You can also choose to sort the rows by the feature name or type.

Evidently uses the default data drift detection algorithm to select the drift detection method based on feature type and the number of observations in the reference dataset.

You can modify the drift detection logic by selecting a different method (including PSI, K–L divergence, Jensen-Shannon distance, or Wasserstein distance) and setting a different threshold and condition for the dataset drift. See more details about setting data drift parameters. You can also implement a custom drift detection method.

To build up a better intuition for which tests work better in different kinds of use cases, visit our blog to read an in-depth guide to the tradeoffs when choosing the statistical test for data drift.
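
For example, a sketch of overriding the defaults through the preset parameters; the method names and threshold values are illustrative, and the full list of parameters is covered in the data drift parameters guide:

data_drift_report = Report(metrics=[
    DataDriftPreset(
        num_stattest="wasserstein",   # drift method for numerical columns
        cat_stattest="psi",           # drift method for categorical columns
        stattest_threshold=0.2,       # per-column drift threshold
        drift_share=0.7,              # dataset drift if >= 70% of columns drift
    ),
])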

3. Data Distribution by Feature

By clicking on each feature, you can explore the distributions or top characteristic words (for text features).

4. Data Drift by Feature

For numerical features, you can also explore the values mapped in a plot.

  • The dark green line is the mean, as seen in the reference dataset.

  • The green area covers one standard deviation from the mean.

Note: by default, the visualization is aggregated. In this case, the index is binned into 150 bins, and the y-axis shows the mean value. You can enable the raw data option to see the individual data points, as shown below.
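
A sketch of turning on raw-data plots for the whole Report, assuming the render option described in the "Show raw data in Reports" guide:

data_drift_report = Report(
    metrics=[DataDriftPreset()],
    options={"render": {"raw_data": True}},  # plot individual data points instead of aggregates
)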

Metrics output

You can get the report output as a JSON or a Python dictionary:
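
For example, using the export methods of the Report object:

data_drift_report.as_dict()                   # Python dictionary
data_drift_report.json()                      # JSON string
data_drift_report.save_html("report.html")    # standalone HTML file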

See JSON example
{
  "data_drift": {
    "name": "data_drift",
    "datetime": "datetime",
    "data": {
      "utility_columns": {
        "date": null,
        "id": null,
        "target": null,
        "prediction": null,
        "drift_conf_level": value,
        "drift_features_share": value,
        "nbinsx": {
          "feature_name": value,
          "feature_name": value
        },
        "xbins": null
      },
      "cat_feature_names": [],
      "num_feature_names": [],
      "metrics": {
        "feature_name": {
          "prod_small_hist": [
            [],
            []
          ],
          "ref_small_hist": [
            [],
            []
          ],
          "feature_type": "num",
          "p_value": p_value
        }
      },
      "n_features": value,
      "n_drifted_features": value,
      "share_drifted_features": value,
      "dataset_drift": false
    }
  },
  "timestamp": "timestamp"
}

Report customization

  • You can create a different report from scratch, taking this one as an inspiration.

  • You can specify the drift detection methods and thresholds.

  • You can add a custom drift detection method.

  • You can use a different color schema for the report.

  • You can apply the report only to selected columns, for example, the most important features (see the sketch below).
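
For instance, a sketch of limiting the preset to a few columns; the column names are placeholders:

data_drift_report = Report(metrics=[
    DataDriftPreset(columns=["feature_1", "feature_2"]),  # placeholder column names
])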

Data Drift Test Suite

If you want to run data drift checks as part of the pipeline, you can create a Test Suite and use the DataDriftTestPreset.

Code example

from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset

data_drift_test_suite = TestSuite(tests=[
    DataDriftTestPreset(),
])

data_drift_test_suite.run(reference_data=ref, current_data=curr)
data_drift_test_suite

How it works

You can use the DataDriftTestPreset to test features for drift when you receive a new batch of input data or generate a new set of predictions.

The test preset works similarly to the metric preset. It will perform two types of tests:

  • test the share of drifted columns to detect dataset drift;

  • test distribution drift in the individual columns (all or from a defined list).

Head to the All tests table to see the description of individual tests and parameters. You can also compose the suite from individual tests directly, as shown below.
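
A sketch of the same checks composed from individual tests instead of the preset; the condition value and column name are illustrative:

from evidently.tests import TestShareOfDriftedColumns, TestColumnDrift

data_drift_test_suite = TestSuite(tests=[
    TestShareOfDriftedColumns(lt=0.3),         # fail if 30% or more of the columns drift
    TestColumnDrift(column_name="feature_1"),  # drift test for a single column
])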

Test Suite customization

  • You can apply the preset only to selected columns.

  • You can create a different test suite from scratch, taking this one as an inspiration.

  • You can specify the drift detection methods and thresholds (see the sketch below).

  • You can add a custom drift detection method.

  • If you want to compare descriptive statistics between the two datasets, you can also use the Data Stability test preset.
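
For example, a sketch of a customized test preset, assuming it accepts the same drift parameters as the metric preset:

data_drift_test_suite = TestSuite(tests=[
    DataDriftTestPreset(
        columns=["feature_1", "feature_2"],  # placeholder column names
        stattest="psi",                      # drift method for the listed columns
        stattest_threshold=0.2,
    ),
])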

Examples

Browse the examples for sample Jupyter notebooks and Colabs.

You can also explore blog posts about drift detection, including How to handle drift or how to analyze historical drift patterns.