Run Evidently on Spark

How to run calculations on Spark.

You are looking at the old Evidently documentation: this API is available in versions 0.6.7 or lower. Check the newer version of the docs.

If you work with large datasets, you can run the calculations as distributed computation on Spark.

Supported metrics

Currently, the following Tests, Metrics and Presets are supported:

  • ColumnDriftMetric()

  • DataDriftTable()

  • DatasetDriftMetric()

  • DataDriftPreset()

  • TestColumnDrift()

  • TestShareOfDriftedColumns()

  • TestNumberOfDriftedColumns()

  • DataDriftTestPreset()

For drift calculation, the following methods are supported:

  • chisquare

  • jensen shannon

  • psi

  • wasserstein
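As a minimal sketch of how one of these methods might be selected, the snippet below passes a method name through the stattest parameter of DataDriftPreset. The helper spark_drift_preset, the SPARK_STATTESTS set, and the spelling "jensenshannon" for the Jensen-Shannon method are assumptions made for illustration and do not come from this page.

```python
# Method names from the list above, written as the string identifiers
# Evidently is assumed to expect ("jensenshannon" is an assumed spelling).
SPARK_STATTESTS = {"chisquare", "jensenshannon", "psi", "wasserstein"}


def spark_drift_preset(stattest: str):
    """Hypothetical helper: build a DataDriftPreset restricted to the listed methods."""
    if stattest not in SPARK_STATTESTS:
        raise ValueError(f"{stattest!r} is not in the Spark-supported list")
    # Imported lazily so the name check works even without Evidently installed.
    from evidently.metric_preset import DataDriftPreset

    return DataDriftPreset(stattest=stattest)
```

Rejecting an unlisted name before the Report is built keeps the failure close to the configuration mistake instead of surfacing it later during the Spark calculation.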

The following data types are supported:

  • numerical_features

  • categorical_features
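The two names above correspond to fields of the ColumnMapping object. A sketch of how they might be declared follows; the helper build_column_mapping is hypothetical, and the column names you pass in are your own.

```python
def build_column_mapping(numerical, categorical):
    """Hypothetical helper: declare which columns are numerical and which are categorical."""
    # Imported lazily so this module loads even without Evidently installed.
    from evidently import ColumnMapping

    return ColumnMapping(
        numerical_features=list(numerical),
        categorical_features=list(categorical),
    )
```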

Code example

The Evidently how-to examples include a notebook that shows how to use Evidently on Spark.

Run Evidently with Spark

To run Evidently on a Spark DataFrame, specify the Spark engine in the run() method of the Report.

Import SparkEngine from the evidently.spark.engine module, then pass it as the engine argument when you call run() on the Report.
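A minimal sketch of these steps, assuming Evidently 0.6.7 or lower installed with Spark support. The function name run_spark_drift_report is hypothetical, and reference and current are assumed to be Spark DataFrames with the same columns.

```python
def run_spark_drift_report(reference, current):
    """Run a data drift Report on two Spark DataFrames and return it.

    `reference` and `current` are assumed to be pyspark.sql.DataFrame
    objects that share the same schema.
    """
    # Lazy imports: these need Evidently (0.6.7 or lower) and PySpark installed.
    from evidently import ColumnMapping
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report
    from evidently.spark.engine import SparkEngine

    report = Report(metrics=[DataDriftPreset()])
    # The engine argument is what moves the calculation onto Spark.
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=ColumnMapping(),
        engine=SparkEngine,
    )
    return report
```

With an active Spark session, calling run_spark_drift_report(reference_df, current_df).as_dict() would return the computed results as a Python dictionary; reference_df and current_df are placeholder names for your own DataFrames.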
