Run Evidently on Spark
How to run calculations on Spark.
If you work with large datasets, you can run Evidently calculations in a distributed way using Spark.
Currently, the following Tests, Metrics and Presets are supported:
ColumnDriftMetric()
DataDriftTable()
DatasetDriftMetric()
DataDriftPreset()
TestColumnDrift()
TestShareOfDriftedColumns()
TestNumberOfDriftedColumns()
DataDriftTestPreset()
For drift calculation, the following methods are supported:
chisquare
jensen shannon
psi
wasserstein
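As an illustration, a drift method can be chosen by name through the stattest parameter of a preset. This is a minimal sketch, assuming the names accepted by stattest are chisquare, jensenshannon (one word), psi and wasserstein. The helper spark_drift_preset and the constant SPARK_STATTESTS are hypothetical and not part of Evidently.

```python
# Drift methods listed above, written as the names the `stattest` parameter
# is assumed to accept ("jensenshannon" spelled as one word).
SPARK_STATTESTS = ("chisquare", "jensenshannon", "psi", "wasserstein")


def spark_drift_preset(stattest="psi"):
    """Hypothetical helper: build a DataDriftPreset with a Spark-supported method."""
    if stattest not in SPARK_STATTESTS:
        raise ValueError(
            f"{stattest!r} is not in the Spark-supported list: {SPARK_STATTESTS}"
        )
    # Imported lazily so the check above works even without Evidently installed.
    from evidently.metric_preset import DataDriftPreset

    return DataDriftPreset(stattest=stattest)
```

Validating the name up front gives a clear error before any Spark job is started.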
The following data types are supported:
numerical_features
categorical_features
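The two names above match fields of Evidently's ColumnMapping object. A minimal sketch, assuming they are set there as lists of column names; the helper build_column_mapping is hypothetical:

```python
def build_column_mapping(numerical, categorical):
    """Hypothetical helper: declare which columns are numerical or categorical."""
    # Imported lazily so this sketch can be loaded without Evidently installed.
    from evidently import ColumnMapping

    return ColumnMapping(
        numerical_features=list(numerical),
        categorical_features=list(categorical),
    )
```

For example, build_column_mapping(["age"], ["country"]) would describe a dataset with one column of each type; both column names are made up for illustration.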
You can refer to the example how-to notebook that shows how to use Evidently on Spark.
To run Evidently on a Spark DataFrame, specify the corresponding engine in the run() method when you calculate the Report.
To import SparkEngine from Evidently, use the following command:
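The command itself did not survive on this page. A minimal sketch, assuming SparkEngine lives in the evidently.spark.engine module and that Evidently was installed with Spark support. The fallback to None is an addition for illustration, so the snippet still loads where Spark support is missing:

```python
try:
    # Assumed import path for the Spark calculation engine.
    from evidently.spark.engine import SparkEngine
except ImportError:
    # Evidently or its Spark dependencies are not installed.
    SparkEngine = None
```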
Pass SparkEngine to the run() method when you calculate the Report:
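The step above could look like the sketch below. It assumes reference and current are Spark DataFrames with the same schema, that run() accepts an engine keyword, and that the engine is passed as the class itself. The function name run_spark_drift_report is hypothetical.

```python
def run_spark_drift_report(reference, current):
    """Hypothetical helper: run DataDriftPreset on two Spark DataFrames."""
    # Imported lazily so this sketch can be loaded without Evidently or Spark.
    from evidently import ColumnMapping
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report
    from evidently.spark.engine import SparkEngine

    report = Report(metrics=[DataDriftPreset()])
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=ColumnMapping(),
        engine=SparkEngine,  # selects Spark for the calculation
    )
    return report
```

The returned Report can then be inspected the same way as a Report calculated without Spark.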