Run Evidently on Spark
How to run calculations on Spark.
You can run distributed computation using Spark if you work with large datasets.
Supported metrics
Currently, the following Tests, Metrics and Presets are supported:
ColumnDriftMetric()
DataDriftTable()
DatasetDriftMetric()
DataDriftPreset()
TestColumnDrift()
TestShareOfDriftedColumns()
TestNumberOfDriftedColumns()
DataDriftTestPreset()
For drift calculation, the following methods are supported:
chisquare
jensenshannon
psi
wasserstein
The following data types are supported:
numerical_features
categorical_features
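Because only some drift methods are listed for Spark, it can help to check the method name before building a metric. The sketch below is an illustration, not part of Evidently: SPARK_STATTESTS, check_spark_stattest and make_drift_table are hypothetical helper names, and it assumes that DataDriftTable accepts a stattest argument, as it does in the regular (non-Spark) API.

```python
# Drift methods the section above lists as supported on Spark.
SPARK_STATTESTS = ("chisquare", "jensenshannon", "psi", "wasserstein")


def check_spark_stattest(name: str) -> str:
    """Return `name` unchanged if it is in the supported list, else raise."""
    if name not in SPARK_STATTESTS:
        supported = ", ".join(SPARK_STATTESTS)
        raise ValueError(f"{name!r} is not listed for Spark; use one of: {supported}")
    return name


def make_drift_table(stattest: str):
    """Build a DataDriftTable that uses one drift method for all columns."""
    # Imported lazily so the name check works without Evidently installed.
    from evidently.metrics import DataDriftTable

    # Assumption: `stattest` is accepted here as in the non-Spark API.
    return DataDriftTable(stattest=check_spark_stattest(stattest))
```

For example, make_drift_table("psi") would return a metric configured for PSI, while a name that is not in the list above, such as "ks", raises a ValueError before any Spark job starts.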
Code example
You can refer to an example How-to-notebook showing how to use Evidently on Spark:
Run Evidently with Spark
To run Evidently on a Spark DataFrame, specify the Spark engine in the run() method when you calculate the Report.
To import SparkEngine from Evidently, use the following command:
from evidently.spark.engine import SparkEngine
Pass SparkEngine to the run() method when you calculate the Report:
from evidently.report import Report
from evidently.metrics import DataDriftTable

spark_report_table = Report(metrics=[
    DataDriftTable()
])
# reference and current are Spark DataFrames
spark_report_table.run(reference_data=reference, current_data=current, engine=SparkEngine)
spark_report_table.show()
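To try the steps above end to end, a minimal local sketch might look like the following. It assumes pyspark and an Evidently version that ships evidently.spark are installed. The toy data, the column names value and group, the helper names and the output file name are all hypothetical, and the sketch assumes that column types can be inferred without an explicit column mapping.

```python
def make_demo_rows(n: int, shift: float = 0.0):
    """Hypothetical toy data: one numerical and one categorical column."""
    return [(float(i % 10) + shift, "a" if i % 2 == 0 else "b") for i in range(n)]


def run_spark_drift_report(reference, current):
    """Calculate a DataDriftTable Report on two Spark DataFrames."""
    # Imported lazily so this module can be loaded without Evidently installed.
    from evidently.metrics import DataDriftTable
    from evidently.report import Report
    from evidently.spark.engine import SparkEngine

    report = Report(metrics=[DataDriftTable()])
    # The engine argument switches the calculation to Spark.
    report.run(reference_data=reference, current_data=current, engine=SparkEngine)
    return report


def main():
    from pyspark.sql import SparkSession

    # A local single-threaded session is enough for a smoke test.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("evidently-spark-demo")
        .getOrCreate()
    )
    columns = ["value", "group"]
    reference = spark.createDataFrame(make_demo_rows(200), columns)
    # Shift the numerical column so the current data differs from the reference.
    current = spark.createDataFrame(make_demo_rows(200, shift=3.0), columns)

    report = run_spark_drift_report(reference, current)
    report.save_html("spark_drift_report.html")  # hypothetical output path
    spark.stop()
```

Calling main() writes the report to an HTML file; in a notebook, calling show() on the returned Report displays it inline instead.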