Options for Statistical Tests

You can modify the statistical tests used to calculate Data and Target Drift.

Available Options

  • feature_stattest_func (deprecated; default: None): defines the Statistical Test for features in the DataDrift Dashboard or Profile:

    • None - use default Statistical Tests for all features (based on internal logic)

    • You can define a Statistical Test to be used for all the features in the dataset:

      • str - the name of StatTest to use across all features (see the available names below)

      • Callable[[pd.Series, pd.Series, str, float], Tuple[float, bool]] - a custom StatTest function (see the requirements for the custom StatTest function below)

      • StatTest - an instance of StatTest

    • You can define a Statistical Test to be used for individual features by passing a dict object where the key is a feature name and the value is one of the previous options (str, Callable or StatTest)

    • Deprecated: use the all_features_stattest or per_feature_stattest options instead.

  • all_features_stattest (default: None): defines a custom statistical test for all features in DataDrift Dashboard or Profile.

  • cat_features_stattest (default: None): defines a custom statistical test for categorical features in DataDrift Dashboard or Profile.

  • num_features_stattest (default: None): defines a custom statistical test for numerical features in DataDrift Dashboard or Profile.

  • per_feature_stattest (default: None): defines a custom statistical test per feature in DataDrift Dashboard or Profile as a dict object where the key is a feature name and the value is a statistical test.

  • cat_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Categorical Target Drift report. It follows the same logic as the feature_stattest_func, but without the dict option.

  • num_target_stattest_func (default: None): defines a custom statistical test to detect target drift in the Numerical Target Drift report. It follows the same logic as the feature_stattest_func, but without the dict option.

Example:

Change the StatTest for all the features in the Data Drift report:
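A minimal sketch of the option value for this case. It stops at a plain dict of keyword arguments, because this section does not show the options class that receives them. The variable name `drift_options` is hypothetical; `all_features_stattest` and `psi` are taken from the lists on this page.

```python
# Hypothetical container: keyword arguments intended for Evidently's
# data drift options (the options class itself is not shown in this section).
drift_options = {
    # "psi" is one of the names listed under "Available StatTest Functions".
    "all_features_stattest": "psi",
}
```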

Change the StatTest for a single feature to a Custom (user-defined) function:
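A minimal sketch, assuming the per-feature mapping is supplied through the `per_feature_stattest` option. The function `mean_shift_stattest`, the feature name `feature_1`, and the `drift_options` dict are illustrative placeholders; only the four-argument signature and the returned (score, drift_detected) pair come from this page.

```python
from typing import Tuple

import pandas as pd


def mean_shift_stattest(
    reference_data: pd.Series,
    current_data: pd.Series,
    feature_type: str,
    threshold: float,
) -> Tuple[float, bool]:
    """Illustrative test: absolute difference between the two means."""
    score = abs(float(current_data.mean()) - float(reference_data.mean()))
    # Drift is flagged when the mean moved by at least `threshold`.
    return score, score >= threshold


# Only "feature_1" (a placeholder name) gets the custom function.
drift_options = {"per_feature_stattest": {"feature_1": mean_shift_stattest}}
```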

Change the StatTest for a single feature to Custom function (using a StatTest object):
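A hedged sketch of the same wiring with a StatTest-style object. `StatTestSketch` is a hypothetical stand-in that only mirrors the four documented fields (name, display_name, func, allowed_feature_types); it is not Evidently's class, and the import path of the real StatTest is not given in this section. The feature name and the `drift_options` dict are placeholders too.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import pandas as pd


@dataclass
class StatTestSketch:
    """Stand-in with the documented StatTest fields; not Evidently's class."""
    name: str
    display_name: str
    func: Callable[[pd.Series, pd.Series, str, float], Tuple[float, bool]]
    allowed_feature_types: List[str]


def _median_shift(reference_data, current_data, feature_type, threshold):
    # Illustrative score: absolute difference between the medians.
    score = abs(float(current_data.median()) - float(reference_data.median()))
    return score, score >= threshold


median_shift = StatTestSketch(
    name="median_shift",
    display_name="Median shift (illustrative)",
    func=_median_shift,
    allowed_feature_types=["num"],
)

# The object, rather than the bare function, is the value for the feature.
drift_options = {"per_feature_stattest": {"feature_1": median_shift}}
```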

Custom StatTest function requirements:

The StatTest function should match the (reference_data: pd.Series, current_data: pd.Series, feature_type: str, threshold: float) -> Tuple[float, bool] signature:

  • reference_data: pd.Series - reference data series

  • current_data: pd.Series - current data series to compare

  • feature_type: str - feature type

  • threshold: float - Stat Test threshold for drift detection

Returns:

  • score: float - Stat Test score (actual value)

  • drift_detected: bool - indicates whether drift is detected at the given threshold

Example:
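One possible function that meets these requirements is sketched below. It is not one of Evidently's built-in tests: the name `tvd_stattest` and the choice of total variation distance between category frequencies are assumptions made for illustration.

```python
from typing import Tuple

import pandas as pd


def tvd_stattest(
    reference_data: pd.Series,
    current_data: pd.Series,
    feature_type: str,
    threshold: float,
) -> Tuple[float, bool]:
    """Total variation distance between category frequencies (illustrative)."""
    if feature_type != "cat":
        raise ValueError("tvd_stattest is meant for categorical features only")
    ref_freq = reference_data.value_counts(normalize=True)
    cur_freq = current_data.value_counts(normalize=True)
    # Categories missing from one side count as frequency 0.
    diff = ref_freq.sub(cur_freq, fill_value=0.0)
    score = float(0.5 * diff.abs().sum())
    # Distance-style score: drift when the score reaches the threshold.
    return score, score >= threshold
```

The score ranges from 0 (identical category shares) to 1 (no shared categories), so the threshold is chosen on that scale.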

StatTest meta information (StatTest class):

To use the StatTest function, we recommend writing a specific instance of the StatTest class for that function:

To create the instance of the StatTest class, you need:

  • name: str - a short name used to reference the Stat Test from the options (the StatTest should be registered globally)

  • display_name: str - a long name displayed in the Dashboard and Profile

  • func: Callable - a StatTest function

  • allowed_feature_types: List[str] - the list of allowed feature types to which this function can be applied (available values: cat, num)

Example:
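The sketch below shows how the four fields fit together. `StatTestSketch` is a hypothetical stand-in rather than Evidently's StatTest class, and its `run` helper is an assumption added to show how allowed_feature_types could gate a call. Global registration of the name is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import pandas as pd

StatTestFunc = Callable[[pd.Series, pd.Series, str, float], Tuple[float, bool]]


@dataclass
class StatTestSketch:
    """Hypothetical stand-in carrying the four documented fields."""
    name: str                         # short name used in the options
    display_name: str                 # long name shown in Dashboard and Profile
    func: StatTestFunc                # the StatTest function itself
    allowed_feature_types: List[str]  # subset of ["cat", "num"]

    def run(self, reference_data, current_data, feature_type, threshold):
        # Assumed behaviour: refuse feature types the test does not support.
        if feature_type not in self.allowed_feature_types:
            raise ValueError(f"{self.name} does not support '{feature_type}'")
        return self.func(reference_data, current_data, feature_type, threshold)


def _range_shift(reference_data, current_data, feature_type, threshold):
    # Illustrative score: change in the (max - min) range of the feature.
    ref_range = float(reference_data.max() - reference_data.min())
    cur_range = float(current_data.max() - current_data.min())
    score = abs(cur_range - ref_range)
    return score, score >= threshold


range_shift = StatTestSketch(
    name="range_shift",
    display_name="Range shift (illustrative)",
    func=_range_shift,
    allowed_feature_types=["num"],
)
```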

Available StatTest Functions:

  • ks - Kolmogorov–Smirnov (K-S) test

    • default for numerical features

    • only for numerical features

    • returns p_value

    • drift detected when p_value < threshold

  • chisquare - Chi-Square test

    • default for categorical features if the number of labels for feature > 2

    • only for categorical features

    • returns p_value

    • drift detected when p_value < threshold

  • z - Z-test

    • default for categorical features if the number of labels for feature <= 2

    • only for categorical features

    • returns p_value

    • drift detected when p_value < threshold

  • wasserstein - Wasserstein distance (normed)

    • only for numerical features

    • returns distance

    • drift detected when distance >= threshold

  • kl_div - Kullback-Leibler divergence

    • for numerical and categorical features

    • returns divergence

    • drift detected when divergence >= threshold

  • psi - Population Stability Index (PSI)

    • for numerical and categorical features

    • returns psi_value

    • drift detected when psi_value >= threshold

  • jensenshannon - Jensen-Shannon distance

    • for numerical and categorical features

    • returns distance

    • drift detected when distance >= threshold
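The defaults and decision rules in the list above can be summarised in a small lookup. This is a reading aid transcribed from the list, not Evidently's internal logic; the names `STATTESTS`, `default_stattest`, and `is_drift` are hypothetical.

```python
# name -> (allowed feature types, kind of score), transcribed from the list.
STATTESTS = {
    "ks": (("num",), "p_value"),
    "chisquare": (("cat",), "p_value"),
    "z": (("cat",), "p_value"),
    "wasserstein": (("num",), "distance"),
    "kl_div": (("num", "cat"), "distance"),
    "psi": (("num", "cat"), "distance"),
    "jensenshannon": (("num", "cat"), "distance"),
}


def default_stattest(feature_type: str, n_labels: int = 0) -> str:
    """Default test name for a feature, following the list above."""
    if feature_type == "num":
        return "ks"
    if feature_type == "cat":
        # More than two labels -> chi-square, otherwise the Z-test.
        return "chisquare" if n_labels > 2 else "z"
    raise ValueError(f"unknown feature type: {feature_type}")


def is_drift(name: str, score: float, threshold: float) -> bool:
    """Apply the documented comparison direction for the named test."""
    _, kind = STATTESTS[name]
    if kind == "p_value":
        return score < threshold   # p-value tests: small values mean drift
    return score >= threshold      # distance-like scores: large values mean drift
```

The point to notice is the comparison direction: a threshold for `ks`, `chisquare`, or `z` is a significance level, while a threshold for the other four tests is a lower bound on the distance or divergence.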
