evidently.calculations
Bases: object
labels : Sequence[Union[str, int]]
values : list
Bases: object
accuracy : float
f1 : float
fnr : Optional[float] = None
fpr : Optional[float] = None
log_loss : Optional[float] = None
plot_data : Optional[Dict] = None
precision : float
rate_plots_data : Optional[Dict] = None
recall : float
roc_auc : Optional[float] = None
tnr : Optional[float] = None
tpr : Optional[float] = None
Bases: object
labels : List[Union[str, int]]
prediction_probas : Optional[DataFrame]
predictions : Series
Calculate the following metrics for each class from the confusion matrix:
TP (true positive)
TN (true negative)
FP (false positive)
FN (false negative)
Returns
a dict like:
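As an illustration of the calculation described above, the per-class counts can be read off a confusion matrix as in the following sketch. The function name, the matrix convention (rows are true labels, columns are predicted labels, as in scikit-learn), and the layout of the returned dict are assumptions made for this example; they are not taken from the library.

```python
import numpy as np


def per_class_confusion_counts(matrix, labels):
    """Hypothetical helper: derive TP, FP, FN and TN for each class.

    Assumes matrix[i][j] is the number of objects of true class i
    that were predicted as class j.
    """
    m = np.asarray(matrix)
    total = m.sum()
    counts = {}
    for i, label in enumerate(labels):
        tp = m[i, i]                # truly class i and predicted as class i
        fp = m[:, i].sum() - tp     # predicted as class i, but truly another class
        fn = m[i, :].sum() - tp     # truly class i, but predicted as another class
        tn = total - tp - fp - fn   # everything that does not involve class i
        counts[label] = {"tp": int(tp), "fp": int(fp), "fn": int(fn), "tn": int(tn)}
    return counts
```

For a binary problem the two classes mirror each other: the false positives of one class are the false negatives of the other.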
Get predicted values and optional prediction probabilities from the source data. A threshold value is also taken into account: probabilities below the threshold are ignored.
Return an object with the predicted values and, optionally, the prediction probabilities.
Get prediction values from probabilities with the threshold applied.
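A minimal sketch of turning probabilities into labels with a threshold, for the binary case. The function name and its parameters are hypothetical, and treating a probability equal to the threshold as positive is an assumption of this example rather than documented library behavior.

```python
import pandas as pd


def labels_by_threshold(positive_proba: pd.Series, threshold: float, pos_label, neg_label) -> pd.Series:
    """Hypothetical helper: map positive-class probabilities to labels.

    Assumption: a probability at or above the threshold yields pos_label,
    anything below it yields neg_label.
    """
    is_positive = positive_proba >= threshold
    return is_positive.map({True: pos_label, False: neg_label})
```

Raising the threshold trades recall for precision, which is why the threshold is kept as an explicit parameter here.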
Methods and types for data drift calculations.
Bases: object
One column drift metrics.
column_name : str
column_type : str
current_correlations : Optional[Dict[str, float]] = None
current_scatter : Optional[Dict[str, list]] = None
current_small_distribution : Optional[list] = None
drift_detected : bool
drift_score : float
plot_shape : Optional[Dict[str, float]] = None
reference_correlations : Optional[Dict[str, float]] = None
reference_small_distribution : Optional[list] = None
stattest_name : str
threshold : float
x_name : Optional[str] = None
Bases: object
Dataset drift calculation results
dataset_drift : bool
dataset_drift_score : float
number_of_drifted_columns : int
Bases: object
dataset_drift : bool
drift_by_columns : Dict[str, ColumnDataDriftMetrics]
number_of_columns : int
number_of_drifted_columns : int
share_of_drifted_columns : float
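To show how the fields above relate to each other, here is a small sketch that aggregates per-column drift flags into dataset-level values. The class and function names are hypothetical, and the rule used for dataset_drift (the share of drifted columns reaching a drift_share value of 0.5) is an assumption of this example, not something stated in this reference.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DriftSummary:
    """Hypothetical container mirroring the field names listed above."""
    number_of_columns: int
    number_of_drifted_columns: int
    share_of_drifted_columns: float
    dataset_drift: bool


def summarize_drift(drift_detected_by_column: Dict[str, bool], drift_share: float = 0.5) -> DriftSummary:
    """Aggregate per-column drift_detected flags.

    Assumption: the dataset is considered drifted when the share of
    drifted columns is at least drift_share.
    """
    total = len(drift_detected_by_column)
    drifted = sum(1 for detected in drift_detected_by_column.values() if detected)
    share = drifted / total if total else 0.0
    return DriftSummary(total, drifted, share, share >= drift_share)
```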
Update the dataset according to the prediction type:
if the prediction column is None or a string, the dataset is not changed
(binary classification) if the prediction is a list of length 2, set the predicted_labels column by threshold
(multi label classification) if the prediction is a list longer than 2, set predicted_labels from the probability values in the prediction columns
Returns
the prediction column name.
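The branching described above can be sketched as follows. The function name, the default threshold of 0.5, the use of the first listed column as the positive class, and the choice of the highest-probability column for longer lists are assumptions made for illustration; only the predicted_labels column name comes from the description.

```python
import numpy as np
import pandas as pd


def ensure_prediction_column(df: pd.DataFrame, prediction, threshold: float = 0.5):
    """Hypothetical helper that follows the three cases described above."""
    # Case 1: no prediction or a single column name, nothing to change.
    if prediction is None or isinstance(prediction, str):
        return prediction
    if len(prediction) == 2:
        # Case 2 (binary): assume the first column holds the positive-class
        # probability and compare it with the threshold.
        first, second = prediction
        df["predicted_labels"] = np.where(df[first] >= threshold, first, second)
    elif len(prediction) > 2:
        # Case 3: assume the label is the column with the highest probability.
        df["predicted_labels"] = df[list(prediction)].idxmax(axis=1)
    else:
        raise ValueError("a prediction list needs at least two columns")
    return "predicted_labels"
```

Note that this sketch modifies the data frame in place and returns only the column name, which matches the "Returns" description above.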
Calculate the number of missing values (nulls, as detected by pandas) in a dataset
Calculate the number of almost constant columns in a dataset
Calculate the number of almost duplicated columns in a dataset
Calculate the number of constant columns in a dataset
Calculate the number of duplicated columns in a dataset
Calculate the number of empty columns in a dataset
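The following sketch shows one way such dataset-level counts could be computed with pandas. The function names are hypothetical, and the exact definitions (for example, that a column consisting only of nulls also counts as constant) are assumptions of this example. The "almost constant" and "almost duplicated" variants are left out because their thresholds are not given here.

```python
import pandas as pd


def count_missing_values(df: pd.DataFrame) -> int:
    """Number of cells that pandas treats as null."""
    return int(df.isnull().sum().sum())


def count_constant_columns(df: pd.DataFrame) -> int:
    """Columns with at most one distinct non-null value (assumed definition)."""
    return int((df.nunique() <= 1).sum())


def count_empty_columns(df: pd.DataFrame) -> int:
    """Columns in which every value is null."""
    return int(df.isnull().all().sum())


def count_duplicated_columns(df: pd.DataFrame) -> int:
    """Columns whose values repeat an earlier column exactly."""
    return int(df.T.duplicated().sum())
```

Transposing the frame in count_duplicated_columns is simple but copies the data, so for wide or very large datasets a hash-based comparison may be preferable.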
Methods for overall dataset quality calculations: row counts, counts of specific values, etc.
Bases: object
column_name : str
kind : str
Bases: object
calculate_data_by_target(curr: DataFrame, ref: Optional[DataFrame], feature_name: str, feature_type: str, target_name: str, target_type: str, merge_small_cat: Optional[int] = 5)
calculate_data_in_time(curr: DataFrame, ref: Optional[DataFrame], feature_name: str, feature_type: str, datetime_name: str, merge_small_cat: Optional[int] = 5)
calculate_main_plot(curr: DataFrame, ref: Optional[DataFrame], feature_name: str, feature_type: str, merge_small_cat: Optional[int] = 5)
Bases: object
bins_for_hist : Dict[str, DataFrame]
Bases: object
cat_features_stats : Optional[Dict[str, FeatureQualityStats]] = None
datetime_features_stats : Optional[Dict[str, FeatureQualityStats]] = None
num_features_stats : Optional[Dict[str, FeatureQualityStats]] = None
prediction_stats : Optional[Dict[str, FeatureQualityStats]] = None
rows_count : int
target_stats : Optional[Dict[str, FeatureQualityStats]] = None
get_all_features()
Bases: object
Class for storing data quality metrics for all feature types.
The type of the feature is stored in the feature_type field. The concrete set of stats depends on the feature type. If a metric is not applicable, its value is left as None.
Metrics for all feature types:
Metrics for numeric features only:
Metrics for category features only:
new_in_current_values_count - number of values in the current dataset that are not present in the reference
unused_in_current_values_count - number of values in the reference dataset that are not present in the current
Defined for reference dataset only.
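The two counts new_in_current_values_count and unused_in_current_values_count amount to set differences between the distinct values of the two datasets. A minimal sketch, assuming null values are ignored (the function name is hypothetical):

```python
import pandas as pd


def new_and_unused_counts(reference: pd.Series, current: pd.Series):
    """Hypothetical helper returning (new_in_current, unused_in_current).

    Assumption: nulls are dropped before comparing the value sets.
    """
    reference_values = set(reference.dropna().unique())
    current_values = set(current.dropna().unique())
    new_in_current = len(current_values - reference_values)
    unused_in_current = len(reference_values - current_values)
    return new_in_current, unused_in_current
```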
count : int = 0
feature_type : str
infinite_count : Optional[int] = None
infinite_percentage : Optional[float] = None
max : Optional[Union[int, float, bool, str]] = None
mean : Optional[float] = None
min : Optional[Union[int, float, bool, str]] = None
missing_count : Optional[int] = None
missing_percentage : Optional[float] = None
most_common_not_null_value : Optional[Union[int, float, bool, str]] = None
most_common_not_null_value_percentage : Optional[float] = None
most_common_value : Optional[Union[int, float, bool, str]] = None
most_common_value_percentage : Optional[float] = None
new_in_current_values_count : Optional[int] = None
number_of_rows : int = 0
percentile_25 : Optional[float] = None
percentile_50 : Optional[float] = None
percentile_75 : Optional[float] = None
std : Optional[float] = None
unique_count : Optional[int] = None
unique_percentage : Optional[float] = None
unused_in_current_values_count : Optional[int] = None
as_dict()
is_category()
Checks that the object stores stats for a category feature
is_datetime()
Checks that the object stores stats for a datetime feature
is_numeric()
Checks that the object stores stats for a numeric feature
Calculate the Cramér's V correlation (cramer_v) for category columns
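For reference, Cramér's V for two categorical columns is sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the chi-squared statistic of the r by c contingency table and n is the number of observations. The sketch below computes it directly with NumPy and pandas. The function name is hypothetical, no continuity or bias correction is applied, and returning 0.0 for degenerate tables is an assumption of this example, so the library's own implementation may differ.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: uncorrected Cramér's V of two categorical series."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    min_dim = min(observed.shape) - 1
    # Degenerate table (no data, or one side has a single category):
    # the statistic is undefined, this sketch returns 0.0 by assumption.
    if n == 0 or min_dim <= 0:
        return 0.0
    # Expected counts under independence of the two columns.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * min_dim)))
```

The value ranges from 0 (no association) to 1 (each category of one column determines the category of the other).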
Compute pairwise correlation of columns.
Parameters
df : initial data frame.
func : function for computing pairwise correlation.
Returns
Correlation matrix.
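A sketch of how a pairwise correlation matrix can be built from an arbitrary two-column function. The function name is hypothetical; the sketch assumes that func is symmetric, takes two pandas Series, and that the diagonal is fixed at 1.0.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_correlation(df: pd.DataFrame, func) -> pd.DataFrame:
    """Hypothetical helper: apply func to every pair of columns.

    Assumptions: func(a, b) == func(b, a), and each column correlates
    perfectly with itself, so the diagonal is set to 1.0.
    """
    columns = list(df.columns)
    matrix = pd.DataFrame(np.eye(len(columns)), index=columns, columns=columns)
    for left, right in combinations(columns, 2):
        value = func(df[left], df[right])
        matrix.loc[left, right] = value
        matrix.loc[right, left] = value
    return matrix
```

For example, passing `lambda a, b: float(np.corrcoef(a, b)[0, 1])` as func gives a Pearson correlation matrix. Each unordered pair is evaluated once, which matters when func is expensive.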
Count the number of rows in a dataset
Bases: object
Bases: object
feature_type : str
majority : float
over : float
range : float
under : float
as_dict(prefix)
Bases: object
abs_error_max : float
abs_error_std : float
abs_perc_error_std : float
error_bias : dict
error_normality : dict
error_std : float
mean_abs_error : float
mean_abs_perc_error : float
mean_error : float
underperformance : dict
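To make the scalar fields above concrete, here is a sketch of how such error statistics are commonly computed. The function name is hypothetical, and several details are assumptions of this example rather than documented behavior: the error is taken as prediction minus target, percentage errors are scaled by 100, and standard deviations use ddof=1. The dict-valued fields (error_bias, error_normality, underperformance) are not covered.

```python
import numpy as np


def regression_error_summary(target, prediction) -> dict:
    """Hypothetical helper computing common regression error statistics."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    error = prediction - target                      # assumed sign convention
    abs_error = np.abs(error)
    abs_perc_error = 100.0 * np.abs(error / target)  # undefined where target is 0
    return {
        "mean_error": float(error.mean()),
        "mean_abs_error": float(abs_error.mean()),
        "mean_abs_perc_error": float(abs_perc_error.mean()),
        "error_std": float(error.std(ddof=1)),
        "abs_error_std": float(abs_error.std(ddof=1)),
        "abs_perc_error_std": float(abs_perc_error.std(ddof=1)),
        "abs_error_max": float(abs_error.max()),
    }
```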
current_distribution :
reference_distribution :
dataset_columns :
options :
values :