Options for Statistical Tests
You can modify the statistical tests used to calculate Data and Target Drift.
Available Options
feature_stattest_func (default: None, deprecated): defines the Statistical Test for features in the DataDrift Dashboard or Profile:
None - use the default Statistical Tests for all features (selected by internal logic).
You can define a Statistical Test to be used for all the features in the dataset:
str - the name of the StatTest to use across all features (see the available names below)
Callable[[pd.Series, pd.Series, str, float], Tuple[float, bool]] - a custom StatTest function (see the requirements for the custom StatTest function below)
StatTest - an instance of StatTest
You can define a Statistical Test to be used for individual features by passing a dict object where the key is a feature name and the value is one of the previous options (str, Callable or StatTest).
Deprecated: use the all_features_stattest or per_feature_stattest options instead.
all_features_stattest (default: None): defines a custom statistical test for all features in the DataDrift Dashboard or Profile.
cat_features_stattest (default: None): defines a custom statistical test for categorical features in the DataDrift Dashboard or Profile.
num_features_stattest (default: None): defines a custom statistical test for numerical features in the DataDrift Dashboard or Profile.
per_feature_stattest (default: None): defines a custom statistical test per feature in the DataDrift Dashboard or Profile, as a dict object where the key is a feature name and the value is a statistical test.
cat_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Categorical Target Drift report. It follows the same logic as feature_stattest_func, but without the dict option.
num_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Numerical Target Drift report. It follows the same logic as feature_stattest_func, but without the dict option.
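These options can overlap: one feature may be covered by all_features_stattest, by the cat/num option for its type, and by per_feature_stattest at the same time. The sketch below shows one plausible way to pick a single test per feature. The precedence it uses (per-feature over type-specific over all-features, falling back to None for the built-in default) is an assumption for illustration, and resolve_stattest is a hypothetical helper, not part of the Evidently API.

```python
from typing import Any, Dict, Optional


def resolve_stattest(
    feature_name: str,
    feature_type: str,  # "cat" or "num"
    all_features_stattest: Any = None,
    cat_features_stattest: Any = None,
    num_features_stattest: Any = None,
    per_feature_stattest: Optional[Dict[str, Any]] = None,
) -> Any:
    """Hypothetical helper: pick the StatTest (name, function or object) for one feature.

    Assumed precedence, most specific wins:
    per_feature_stattest > cat/num_features_stattest > all_features_stattest > None (default logic).
    """
    chosen = all_features_stattest
    if feature_type == "cat" and cat_features_stattest is not None:
        chosen = cat_features_stattest
    if feature_type == "num" and num_features_stattest is not None:
        chosen = num_features_stattest
    if per_feature_stattest and feature_name in per_feature_stattest:
        chosen = per_feature_stattest[feature_name]
    return chosen


# "psi" for every feature, "chisquare" for categorical ones, "ks" for "age" only.
settings = dict(
    all_features_stattest="psi",
    cat_features_stattest="chisquare",
    per_feature_stattest={"age": "ks"},
)
print(resolve_stattest("age", "num", **settings))     # ks
print(resolve_stattest("income", "num", **settings))  # psi
print(resolve_stattest("city", "cat", **settings))    # chisquare
```

Returning None when nothing is set mirrors the documented default of None, which leaves the choice to the internal logic.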
Example:
Change the StatTest for all the features in the Data Drift report:
from evidently.options.data_drift import DataDriftOptions
options = DataDriftOptions(
    feature_stattest_func="ks",
)
Change the StatTest for a single feature to a custom (user-defined) function:
from evidently.options.data_drift import DataDriftOptions
def my_stat_test(reference_data, current_data, feature_type, threshold):
    # placeholder: always returns score 0.0 and "no drift"
    return 0.0, False

options = DataDriftOptions(
    feature_stattest_func={"feature_1": my_stat_test},
)
Change the StatTest for a single feature to a custom function (using a StatTest object):
from evidently.calculations.stattests import StatTest
from evidently.options.data_drift import DataDriftOptions
def _my_stat_test(reference_data, current_data, feature_type, threshold):
    # placeholder: always returns score 0.0 and "no drift"
    return 0.0, False

my_stat_test = StatTest(
    name="my_stat_test",
    display_name="My Stat Test",
    func=_my_stat_test,
    allowed_feature_types=["cat"],
)

options = DataDriftOptions(
    feature_stattest_func={"feature_1": my_stat_test},
)
Custom StatTest function requirements:
The StatTest function should match the (reference_data: pd.Series, current_data: pd.Series, feature_type: str, threshold: float) -> Tuple[float, bool] signature:
reference_data: pd.Series - reference data series
current_data: pd.Series - current data series to compare
feature_type: str - feature type
threshold: float - StatTest threshold for drift detection
Returns:
score: float - StatTest score (actual value)
drift_detected: bool - indicates whether drift is detected with the given threshold
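To illustrate this contract, here is a minimal custom test for categorical features that scores drift with the total variation distance between category shares (a technique that is not one of the built-in tests listed below). It is a sketch: the function name is hypothetical, and since it only needs iterable inputs it accepts pd.Series as well as plain lists.

```python
from collections import Counter
from typing import Iterable, Tuple


def total_variation_stat_test(
    reference_data: Iterable,
    current_data: Iterable,
    feature_type: str,
    threshold: float,
) -> Tuple[float, bool]:
    """Hypothetical custom StatTest: total variation distance between category shares.

    Follows the (reference_data, current_data, feature_type, threshold) -> (score, drift_detected)
    contract; feature_type is accepted but unused because the test is meant for "cat" features.
    """
    ref_counts = Counter(reference_data)
    cur_counts = Counter(current_data)
    ref_total = sum(ref_counts.values())
    cur_total = sum(cur_counts.values())
    categories = set(ref_counts) | set(cur_counts)
    # Counter returns 0 for a category missing on one side.
    score = 0.5 * sum(
        abs(ref_counts[c] / ref_total - cur_counts[c] / cur_total) for c in categories
    )
    # The score is a distance, so larger means more drift: drift when score >= threshold.
    return score, score >= threshold


score, drift = total_variation_stat_test(["a", "a", "b", "b"], ["a", "a", "a", "a"], "cat", 0.1)
print(score, drift)  # 0.5 True
```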
Example:
from typing import Tuple
import numpy as np
import pandas as pd
from scipy.stats import anderson_ksamp
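def anderson_stat_test(reference_data: pd.Series, current_data: pd.Series, _feature_type: str, threshold: float) -> Tuple[float, bool]:
    # pass the samples as a list of arrays: np.array() of two series with different lengths fails
    # third element: approximate p-value (SciPy caps it to the [0.001, 0.25] range)
    p_value = anderson_ksamp([np.asarray(reference_data), np.asarray(current_data)])[2]
    return p_value, p_value < threshold
StatTest meta information (StatTest class):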
To use the StatTest function, we recommend creating a dedicated instance of the StatTest class for that function.
To create the instance of the StatTest class, you need:
name: str - a short name used to reference the StatTest from the options (the StatTest should be registered globally)
display_name: str - a long name displayed in the Dashboard and Profile
func: Callable - a StatTest function
allowed_feature_types: List[str] - the list of feature types to which this function can be applied (available values: cat, num)
Example:
from evidently.calculations.stattests import StatTest
def _example_stat_test(reference_data, current_data, feature_type, threshold):
    return 0.1, False

example_stat_test = StatTest(
    name="example_test",
    display_name="Example Stat Test (score)",
    func=_example_stat_test,
    allowed_feature_types=["cat"],
)
Available StatTest Functions:
ks - Kolmogorov–Smirnov (K-S) test: default for numerical features; only for numerical features. Returns p_value; drift is detected when p_value < threshold.
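A minimal pure-Python sketch of a K-S test with this contract. It assumes the asymptotic Kolmogorov distribution, with the small-sample correction from Numerical Recipes, for the p-value; the built-in ks test is not necessarily computed this way, so p-values can differ. ks_statistic and ks_stat_test_sketch are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest vertical gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    # The supremum is reached at one of the observed values.
    return max(abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m) for x in ref + cur)


def _kolmogorov_sf(lam: float) -> float:
    """Asymptotic tail Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) <= 1e-12 * abs(total):
            return min(max(total, 0.0), 1.0)
    # The series fails to converge only for very small lam, where the p-value is close to 1.
    return 1.0


def ks_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    d = ks_statistic(list(reference_data), list(current_data))
    n, m = len(reference_data), len(current_data)
    en = math.sqrt(n * m / (n + m))
    # Numerical Recipes small-sample correction (an approximation, not an exact p-value).
    p_value = _kolmogorov_sf((en + 0.12 + 0.11 / en) * d)
    return p_value, p_value < threshold


reference = list(range(100))
shifted = [x + 50 for x in range(100)]
print(ks_stat_test_sketch(reference, reference, "num", 0.05))  # (1.0, False)
print(ks_stat_test_sketch(reference, shifted, "num", 0.05)[1])  # True
```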
chisquare - Chi-Square test: default for categorical features if the number of labels for the feature > 2; only for categorical features. Returns p_value; drift is detected when p_value < threshold.
z - Z-test: default for categorical features if the number of labels for the feature <= 2; only for categorical features. Returns p_value; drift is detected when p_value < threshold.
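A sketch of a two-proportion z-test with pooled variance. It assumes the test compares the share of one label between the two samples (here, the first label in sorted order, which covers both labels of a binary feature). The function name and the label choice are assumptions for illustration.

```python
import math
from typing import Tuple


def z_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    ref, cur = list(reference_data), list(current_data)
    # Assumption: compare the share of the first label in sorted order (str key avoids mixed-type errors).
    key = sorted(set(ref) | set(cur), key=str)[0]
    n1, n2 = len(ref), len(cur)
    p1 = sum(v == key for v in ref) / n1
    p2 = sum(v == key for v in cur) / n2
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        # Both samples hold the same single label: nothing to compare.
        p_value = 1.0
    else:
        z = (p1 - p2) / math.sqrt(variance)
        # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
        p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value, p_value < threshold


ref = ["a"] * 50 + ["b"] * 50
print(z_stat_test_sketch(ref, ref, "cat", 0.05))  # (1.0, False)
print(z_stat_test_sketch(ref, ["a"] * 80 + ["b"] * 20, "cat", 0.05)[1])  # True
```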
wasserstein - Wasserstein distance (normed): only for numerical features. Returns distance; drift is detected when distance >= threshold.
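A sketch of a normed Wasserstein distance for one numerical feature. It assumes "normed" means dividing the distance by the (population) standard deviation of the reference data, floored at 0.001 so a constant reference does not cause division by zero. The distance itself is the area between the two empirical CDFs. The function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """Area between the two empirical CDFs (1-D Wasserstein / earth mover's distance)."""
    u, v = sorted(u), sorted(v)
    points = sorted(u + v)
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        gap = abs(bisect_right(u, left) / len(u) - bisect_right(v, left) / len(v))
        total += gap * (right - left)
    return total


def wasserstein_norm_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    ref, cur = list(reference_data), list(current_data)
    mean = sum(ref) / len(ref)
    std = math.sqrt(sum((x - mean) ** 2 for x in ref) / len(ref))
    # Assumption: "normed" = divided by the reference standard deviation.
    distance = wasserstein_1d(ref, cur) / max(std, 0.001)
    return distance, distance >= threshold


print(wasserstein_norm_stat_test_sketch([0, 1, 2], [1, 2, 3], "num", 0.1)[1])  # True
```

Norming makes the score scale-free, so one threshold can be reused across features measured in different units.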
kl_div - Kullback-Leibler divergence: for numerical and categorical features. Returns divergence; drift is detected when divergence >= threshold.
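A sketch for a categorical feature. It assumes the score is KL(reference || current) with the natural logarithm, and that zero shares are replaced by a small epsilon (1e-4 here) and renormalized, because the divergence is infinite when a category's current share is zero. Numerical features would need to be binned first; the binning used by the built-in test is not described on this page, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def _shares(reference: Iterable, current: Iterable, eps: float = 1e-4) -> Tuple[List[float], List[float]]:
    """Category shares on a shared category list; zero shares replaced by eps, then renormalized."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = sorted(set(ref_counts) | set(cur_counts), key=str)

    def normalize(counts: Counter) -> List[float]:
        total = sum(counts.values())
        raw = [max(counts[c] / total, eps) for c in categories]
        s = sum(raw)
        return [r / s for r in raw]

    return normalize(ref_counts), normalize(cur_counts)


def kl_div_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    ref, cur = _shares(reference_data, current_data)
    # KL(reference || current), natural logarithm.
    divergence = sum(p * math.log(p / q) for p, q in zip(ref, cur))
    return divergence, divergence >= threshold


ref = ["a"] * 50 + ["b"] * 50
print(kl_div_stat_test_sketch(ref, ["a"] * 90 + ["b"] * 10, "cat", 0.1)[1])  # True
```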
psi - Population Stability Index (PSI): for numerical and categorical features. Returns psi_value; drift is detected when psi_value >= threshold.
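A sketch for a categorical feature using the common PSI formula, the sum over categories of (current_share - reference_share) * ln(current_share / reference_share), with an assumed epsilon smoothing (1e-4, then renormalized) so empty categories do not produce infinite logarithms. Unlike KL divergence, PSI is symmetric in the two samples. Numerical features would need binning first; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def _shares(reference: Iterable, current: Iterable, eps: float = 1e-4) -> Tuple[List[float], List[float]]:
    """Category shares on a shared category list; zero shares replaced by eps, then renormalized."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = sorted(set(ref_counts) | set(cur_counts), key=str)

    def normalize(counts: Counter) -> List[float]:
        total = sum(counts.values())
        raw = [max(counts[c] / total, eps) for c in categories]
        s = sum(raw)
        return [r / s for r in raw]

    return normalize(ref_counts), normalize(cur_counts)


def psi_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    ref, cur = _shares(reference_data, current_data)
    # Each term is >= 0: (c - r) and ln(c / r) always share a sign.
    psi_value = sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
    return psi_value, psi_value >= threshold


ref = ["a"] * 50 + ["b"] * 50
print(psi_stat_test_sketch(ref, ["a"] * 90 + ["b"] * 10, "cat", 0.1)[1])  # True
```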
jensenshannon - Jensen-Shannon distance: for numerical and categorical features. Returns distance; drift is detected when distance >= threshold.
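A sketch for a categorical feature. The Jensen-Shannon distance is the square root of the average KL divergence of each distribution from their mixture m = (p + q) / 2. With the natural logarithm (the default base of scipy.spatial.distance.jensenshannon) it lies between 0 and sqrt(ln 2). No smoothing is needed because m is never zero where p or q is positive. The function name is hypothetical, and numerical features would need binning first.

```python
import math
from collections import Counter
from typing import Tuple


def jensenshannon_stat_test_sketch(reference_data, current_data, feature_type: str, threshold: float) -> Tuple[float, bool]:
    ref_counts, cur_counts = Counter(reference_data), Counter(current_data)
    ref_total, cur_total = sum(ref_counts.values()), sum(cur_counts.values())
    divergence = 0.0
    for c in set(ref_counts) | set(cur_counts):
        p, q = ref_counts[c] / ref_total, cur_counts[c] / cur_total
        m = (p + q) / 2
        # 0 * log(0 / m) is taken as 0, so zero shares are simply skipped.
        if p > 0:
            divergence += 0.5 * p * math.log(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log(q / m)
    # Distance = square root of the divergence; max() guards against tiny negative rounding.
    distance = math.sqrt(max(divergence, 0.0))
    return distance, distance >= threshold


print(jensenshannon_stat_test_sketch(["a"] * 10, ["a"] * 10, "cat", 0.1))  # (0.0, False)
```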