evidently.calculations.stattests
Available statistical tests. For details on each test, see the documentation of its module below.
Submodules
anderson_darling_stattest module
Anderson-Darling test of two samples.
Name: “anderson”
Import:
>>> from evidently.calculations.stattests import anderson_darling_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import anderson_darling_test
>>> options = DataDriftOptions(all_features_stattest=anderson_darling_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="anderson")
chisquare_stattest module
Chi-square test of two samples.
Name: “chisquare”
Import:
>>> from evidently.calculations.stattests import chi_stat_test
Properties:
only for categorical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import chi_stat_test
>>> options = DataDriftOptions(all_features_stattest=chi_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="chisquare")
cramer_von_mises_stattest module
Cramér-von Mises test of two samples.
Name: “cramer_von_mises”
Import:
>>> from evidently.calculations.stattests import cramer_von_mises
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import cramer_von_mises
>>> options = DataDriftOptions(all_features_stattest=cramer_von_mises)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="cramer_von_mises")
class CramerVonMisesResult(statistic, pvalue)
Bases: object
energy_distance module
Energy-distance test of two samples.
Name: “ed”
Import:
>>> from evidently.calculations.stattests import energy_dist_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import energy_dist_test
>>> options = DataDriftOptions(all_features_stattest=energy_dist_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="ed")
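To make the quantity behind this test concrete, here is a minimal pure-Python sketch of the energy distance between two one-dimensional samples. The function name `energy_distance` and the brute-force pairwise computation are illustrative assumptions, not Evidently's implementation, and the sketch returns only the distance, not a p-value.

```python
import math


def energy_distance(x, y):
    """Energy distance between two 1-D samples (illustrative sketch)."""

    def mean_abs_diff(a, b):
        # Average absolute difference over all pairs (u, v) with u in a, v in b.
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))

    squared = 2 * mean_abs_diff(x, y) - mean_abs_diff(x, x) - mean_abs_diff(y, y)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


same = energy_distance([0.0, 1.0], [0.0, 1.0])
shifted = energy_distance([0.0], [1.0])
```

Identical samples give a distance of zero, and the value grows as the two samples move apart.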
epps_singleton_stattest module
Epps-Singleton test of two samples.
Name: “es”
Import:
>>> from evidently.calculations.stattests import epps_singleton_test
Properties:
only for numerical features
returns p-value
default threshold 0.05
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import epps_singleton_test
>>> options = DataDriftOptions(all_features_stattest=epps_singleton_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="es")
fisher_exact_stattest module
Fisher’s exact test of two samples.
Name: “fisher_exact”
Import:
>>> from evidently.calculations.stattests import fisher_exact_test
Properties:
only for categorical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import fisher_exact_test
>>> options = DataDriftOptions(all_features_stattest=fisher_exact_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="fisher_exact")
g_stattest module
G-test of two samples.
Name: “g_test”
Import:
>>> from evidently.calculations.stattests import g_test
Properties:
only for categorical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import g_test
>>> options = DataDriftOptions(all_features_stattest=g_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="g_test")
hellinger_distance module
Hellinger distance of two samples.
Name: “hellinger”
Import:
>>> from evidently.calculations.stattests import hellinger_stat_test
Properties:
for categorical and numerical features
returns distance
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import hellinger_stat_test
>>> options = DataDriftOptions(all_features_stattest=hellinger_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="hellinger")
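As a rough illustration of the distance this test reports, the sketch below computes the Hellinger distance between two discrete distributions. It assumes both samples have already been reduced to aligned probability lists (for example, category shares or bin shares); the function name `hellinger_distance` is hypothetical and the code is not taken from Evidently.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two aligned probability lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


identical = hellinger_distance([0.5, 0.5], [0.5, 0.5])
disjoint = hellinger_distance([1.0, 0.0], [0.0, 1.0])
```

The result lies between 0 (identical distributions) and 1 (distributions with no overlap).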
jensenshannon module
Jensen-Shannon distance of two samples.
Name: “jensenshannon”
Import:
>>> from evidently.calculations.stattests import jensenshannon_stat_test
Properties:
for categorical and numerical features
returns distance
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import jensenshannon_stat_test
>>> options = DataDriftOptions(all_features_stattest=jensenshannon_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="jensenshannon")
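The following sketch shows one way to compute a Jensen-Shannon distance from two aligned probability lists. The logarithm base of 2 is an assumption chosen so that the result falls in [0, 1]; other implementations may use the natural logarithm. The names `_kl` and `jensen_shannon_distance` are hypothetical.

```python
import math


def _kl(p, m, base):
    # Kullback-Leibler term; bins where p is zero contribute nothing.
    return sum(a * math.log(a / b, base) for a, b in zip(p, m) if a > 0)


def jensen_shannon_distance(p, q, base=2.0):
    """Square root of the Jensen-Shannon divergence of p and q."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]  # mixture distribution
    divergence = 0.5 * _kl(p, m, base) + 0.5 * _kl(q, m, base)
    return math.sqrt(max(divergence, 0.0))


identical = jensen_shannon_distance([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon_distance([1.0, 0.0], [0.0, 1.0])
```

Unlike the Kullback-Leibler divergence, this distance is symmetric in its two arguments.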
kl_div module
Kullback-Leibler divergence of two samples.
Name: “kl_div”
Import:
>>> from evidently.calculations.stattests import kl_div_stat_test
Properties:
for categorical and numerical features
returns divergence
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import kl_div_stat_test
>>> options = DataDriftOptions(all_features_stattest=kl_div_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="kl_div")
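A minimal sketch of the Kullback-Leibler divergence for two aligned probability lists follows. The `eps` floor for empty bins in `q` and the use of natural logarithms are assumptions made for this illustration; `kl_divergence` is a hypothetical name.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D(p || q) in nats; eps guards against zero bins in q."""
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


forward = kl_divergence([0.5, 0.5], [0.25, 0.75])
reverse = kl_divergence([0.25, 0.75], [0.5, 0.5])
```

The divergence is not symmetric, so `forward` and `reverse` differ. Which sample plays the role of `p` is a choice the caller has to make.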
ks_stattest module
Kolmogorov-Smirnov test of two samples.
Name: “ks”
Import:
>>> from evidently.calculations.stattests import ks_stat_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import ks_stat_test
>>> options = DataDriftOptions(all_features_stattest=ks_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="ks")
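For intuition, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the two samples. The pure-Python sketch below computes only that statistic; the p-value step is omitted, and `ks_statistic` is a hypothetical name rather than an Evidently function.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Maximum absolute difference between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)

    def ecdf(sample, x):
        # Share of sample values that are <= x.
        return bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(ref, x) - ecdf(cur, x)) for x in ref + cur)


overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```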
mann_whitney_urank_stattest module
Mann-Whitney U-rank test of two samples.
Name: “mannw”
Import:
>>> from evidently.calculations.stattests import mann_whitney_u_stat_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import mann_whitney_u_stat_test
>>> options = DataDriftOptions(all_features_stattest=mann_whitney_u_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="mannw")
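The U statistic behind this test counts, over all pairs of one reference value and one current value, how often the reference value is larger, with ties counted as one half. The sketch below computes only that statistic under this convention; the p-value is omitted and `mann_whitney_u` is a hypothetical name.

```python
def mann_whitney_u(reference, current):
    """U statistic for the reference sample (ties count as 0.5)."""
    u = 0.0
    for r in reference:
        for c in current:
            if r > c:
                u += 1.0
            elif r == c:
                u += 0.5
    return u


tied = mann_whitney_u([1, 2], [1, 2])
```

A value near half of `len(reference) * len(current)` suggests neither sample tends to be larger than the other.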
psi module
Population Stability Index (PSI) of two samples.
Name: “psi”
Import:
>>> from evidently.calculations.stattests import psi_stat_test
Properties:
for categorical and numerical features
returns PSI value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import psi_stat_test
>>> options = DataDriftOptions(all_features_stattest=psi_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="psi")
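As an illustration of the PSI value, the sketch below applies the usual formula to per-bucket shares that have already been computed for the reference and current data. The `eps` floor for empty buckets and the name `psi` are assumptions for this example, not Evidently's code.

```python
import math


def psi(reference_percents, current_percents, eps=1e-4):
    """Population Stability Index from aligned per-bucket shares."""
    total = 0.0
    for ref, cur in zip(reference_percents, current_percents):
        # Floor empty buckets so the logarithm stays defined.
        ref, cur = max(ref, eps), max(cur, eps)
        total += (cur - ref) * math.log(cur / ref)
    return total


stable = psi([0.5, 0.5], [0.5, 0.5])
shifted = psi([0.5, 0.5], [0.6, 0.4])
```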
registry module
class StatTest(name: str, display_name: str, func: Callable[[pandas.core.series.Series, pandas.core.series.Series, str, float], Tuple[float, bool]], allowed_feature_types: List[str], default_threshold: float = 0.05)
Bases: object
Attributes:
allowed_feature_types : List[str]
default_threshold : float = 0.05
display_name : str
func : Callable[[Series, Series, str, float], Tuple[float, bool]]
name : str
exception StatTestInvalidFeatureTypeError(stattest_name: str, feature_type: str)
Bases: ValueError
exception StatTestNotFoundError(stattest_name: str)
Bases: ValueError
class StatTestResult(drift_score: float, drifted: bool, actual_threshold: float)
Bases: object
Attributes:
actual_threshold : float
drift_score : float
drifted : bool
get_stattest(reference_data: Series, current_data: Series, feature_type: str, stattest_func: Optional[Union[str, Callable[[Series, Series, str, float], Tuple[float, bool]], StatTest]])
register_stattest(stat_test: StatTest)
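The `func` signature above suggests that a custom test is a callable taking the reference data, the current data, the feature type, and a threshold, and returning a score together with a drift flag. The sketch below shows a hypothetical function of that shape, written against plain Python sequences so it runs without Evidently. The commented lines indicate how it might be wrapped and registered; the `name`, `display_name`, and `"num"` feature type values are assumptions.

```python
from statistics import fmean


def mean_shift_stattest(reference_data, current_data, feature_type, threshold):
    """Hypothetical drift function: absolute difference of the sample means."""
    score = abs(fmean(current_data) - fmean(reference_data))
    # Drift is flagged when the shift exceeds the threshold.
    return score, score > threshold


score, drifted = mean_shift_stattest([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], "num", 0.5)

# Possible registration with Evidently (not executed here; values are assumptions):
# from evidently.calculations.stattests.registry import StatTest, register_stattest
# register_stattest(StatTest(name="mean_shift", display_name="Mean shift",
#                            func=mean_shift_stattest, allowed_feature_types=["num"]))
```

For distance-like scores such as this one, drift means the score is above the threshold; for p-value tests the comparison runs the other way.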
t_test module
T-test of two samples.
Name: “t_test”
Import:
>>> from evidently.calculations.stattests import t_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import t_test
>>> options = DataDriftOptions(all_features_stattest=t_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="t_test")
tvd_stattest module
Total variation distance of two samples.
Name: “TVD”
Import:
>>> from evidently.calculations.stattests import tvd_test
Properties:
only for numerical features
returns distance
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import tvd_test
>>> options = DataDriftOptions(all_features_stattest=tvd_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="TVD")
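The total variation distance itself is half the sum of absolute differences between the shares of each distinct value in the two samples. The sketch below computes that quantity for samples of discrete values; `total_variation_distance` is a hypothetical name, and any further step Evidently applies to the distance is not reproduced.

```python
from collections import Counter


def total_variation_distance(reference, current):
    """Half the L1 distance between the two empirical distributions."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    values = set(ref_counts) | set(cur_counts)
    return 0.5 * sum(
        abs(ref_counts[v] / len(reference) - cur_counts[v] / len(current))
        for v in values
    )


partial = total_variation_distance([0, 0, 1, 1], [0, 1, 1, 1])
```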
utils module
generate_fisher2x2_contingency_table(reference_data: Series, current_data: Series)
Generate a 2x2 contingency matrix for Fisher’s exact test.
Parameters
reference_data: reference data
current_data: current data
Raises
ValueError: if reference_data and current_data are not of equal length
Returns
contingency_matrix: contingency matrix for binary data
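To show what such a table can look like, the sketch below builds a 2x2 table from two equal-length binary samples using plain lists. The layout (rows for reference and current, columns for the two sorted distinct values) and the name `fisher_2x2_table` are assumptions for illustration; the layout used by `generate_fisher2x2_contingency_table` may differ.

```python
def fisher_2x2_table(reference_data, current_data):
    """Count the two distinct values per sample as [[ref0, ref1], [cur0, cur1]]."""
    if len(reference_data) != len(current_data):
        raise ValueError("reference_data and current_data must have equal length")
    values = sorted(set(reference_data) | set(current_data))
    if len(values) != 2:
        raise ValueError("expected exactly two distinct values")
    first, second = values
    return [
        [sum(v == first for v in reference_data), sum(v == second for v in reference_data)],
        [sum(v == first for v in current_data), sum(v == second for v in current_data)],
    ]


table = fisher_2x2_table([0, 0, 1, 1], [0, 1, 1, 1])
```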
get_binned_data(reference_data: Series, current_data: Series, feature_type: str, n: int, feel_zeroes: bool = True)
Split a variable into n buckets based on reference quantiles.
Parameters
reference_data: reference data
current_data: current data
feature_type: feature type
n: number of quantiles
Returns
reference_percents: % of records in each bucket for reference
current_percents: % of records in each bucket for current
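The idea of bucketing by reference quantiles can be sketched for numerical data with the standard library alone. The helper below is hypothetical (`binned_percents`), uses `statistics.quantiles` with the "inclusive" method as an assumed choice of cut points, and does not reproduce the `feature_type` or `feel_zeroes` handling of `get_binned_data`.

```python
from bisect import bisect_right
from statistics import quantiles


def binned_percents(reference, current, n):
    """Share of records per bucket, with bucket edges taken from the reference."""
    edges = quantiles(reference, n=n, method="inclusive")  # n - 1 cut points

    def percents(sample):
        counts = [0] * n
        for x in sample:
            # Index of the bucket whose range contains x.
            counts[bisect_right(edges, x)] += 1
        return [c / len(sample) for c in counts]

    return percents(reference), percents(current)


ref_pct, cur_pct = binned_percents(list(range(1, 101)), list(range(51, 101)), 4)
```

Here the reference spreads evenly over the four buckets, while the current data falls entirely into the upper two.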
get_unique_not_nan_values_list_from_series(current_data: Series, reference_data: Series)
Get unique values from current and reference series, drop NaNs
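A plain-Python equivalent of this behavior might look like the sketch below. It works on ordinary sequences rather than pandas Series, treats only float NaN as missing, and keeps first-seen order; these choices and the name `unique_not_nan_values` are assumptions for illustration.

```python
import math


def unique_not_nan_values(current_data, reference_data):
    """Distinct values across both samples, skipping NaN, in first-seen order."""
    seen = []
    for value in list(current_data) + list(reference_data):
        is_nan = isinstance(value, float) and math.isnan(value)
        if not is_nan and value not in seen:
            seen.append(value)
    return seen


values = unique_not_nan_values(["a", float("nan"), "b"], ["b", "c", float("nan")])
```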
permutation_test(reference_data, current_data, observed, test_statistic_func, iterations=100)
Perform a two-sided permutation test.
Parameters
reference_data: reference data
current_data: current data
observed: observed value
test_statistic_func: the test statistic function
iterations: number of times to permute
Returns
p_value: two-sided p_value
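The procedure can be sketched as follows: pool both samples, shuffle repeatedly, split back into two groups of the original sizes, and count how often the recomputed statistic is at least as extreme as the observed one. The function below mirrors the documented parameters but is an illustrative sketch; the `seed` argument and the helper `mean_difference` are hypothetical additions.

```python
import random
from statistics import fmean


def mean_difference(a, b):
    # Example test statistic: difference of the sample means.
    return fmean(b) - fmean(a)


def permutation_p_value(reference_data, current_data, observed,
                        test_statistic_func, iterations=100, seed=0):
    """Two-sided permutation p-value for an observed statistic."""
    rng = random.Random(seed)
    pooled = list(reference_data) + list(current_data)
    n_ref = len(reference_data)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        stat = test_statistic_func(pooled[:n_ref], pooled[n_ref:])
        # Two-sided: compare magnitudes.
        if abs(stat) >= abs(observed):
            hits += 1
    return hits / iterations


no_drift = permutation_p_value([1, 1, 1], [1, 1, 1], 0.0, mean_difference)
```

With only 100 iterations the p-value has a resolution of 0.01, so more iterations may be needed when small thresholds are used.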
wasserstein_distance_norm module
Wasserstein distance of two samples.
Name: “wasserstein”
Import:
>>> from evidently.calculations.stattests import wasserstein_stat_test
Properties:
only for numerical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import wasserstein_stat_test
>>> options = DataDriftOptions(all_features_stattest=wasserstein_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="wasserstein")
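For two samples of the same size, the one-dimensional Wasserstein distance reduces to the mean absolute difference between the sorted values. The sketch below covers only that equal-size case and applies no normalization; `wasserstein_distance_1d` is a hypothetical name and not the function Evidently uses.

```python
def wasserstein_distance_1d(reference, current):
    """1-D Wasserstein distance for equal-length samples."""
    if len(reference) != len(current):
        raise ValueError("this sketch only handles equal-length samples")
    pairs = zip(sorted(reference), sorted(current))
    return sum(abs(a - b) for a, b in pairs) / len(reference)


shift = wasserstein_distance_1d([3, 1, 2], [2, 4, 3])
```

The raw distance is expressed in the units of the feature, which is one reason an implementation might normalize it before comparing against a threshold.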
z_stattest module
Z-test for the difference in proportions of two samples.
Name: “z”
Import:
>>> from evidently.calculations.stattests import z_stat_test
Properties:
only for categorical features
returns p-value
Example
Using by object:
>>> from evidently.options import DataDriftOptions
>>> from evidently.calculations.stattests import z_stat_test
>>> options = DataDriftOptions(all_features_stattest=z_stat_test)
Using by name:
>>> from evidently.options import DataDriftOptions
>>> options = DataDriftOptions(all_features_stattest="z")
proportions_diff_z_stat_ind(ref: DataFrame, curr: DataFrame)
proportions_diff_z_test(z_stat, alternative='two-sided')
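The two helpers above appear to split the test into a z statistic for the difference of two independent proportions and a conversion of that statistic into a p-value. The sketch below follows the same split but takes success counts and sample sizes instead of DataFrames. The function names, the pooled standard error, and the "less" and "greater" alternatives are assumptions for illustration.

```python
import math
from statistics import NormalDist


def proportions_z_statistic(successes_ref, n_ref, successes_cur, n_cur):
    """z statistic for the difference of two independent proportions."""
    p_ref, p_cur = successes_ref / n_ref, successes_cur / n_cur
    pooled = (successes_ref + successes_cur) / (n_ref + n_cur)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_ref + 1 / n_cur))
    if std_err == 0:
        raise ValueError("pooled proportion is 0 or 1; z statistic is undefined")
    return (p_ref - p_cur) / std_err


def z_test_p_value(z_stat, alternative="two-sided"):
    """p-value of a z statistic under the standard normal distribution."""
    normal = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - normal.cdf(abs(z_stat)))
    if alternative == "less":
        return normal.cdf(z_stat)
    if alternative == "greater":
        return 1 - normal.cdf(z_stat)
    raise ValueError(f"unknown alternative: {alternative}")


z = proportions_z_statistic(60, 100, 40, 100)
p_value = z_test_p_value(z)
```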