Feature importance in data drift
How to show feature importance in Data Drift evaluations.
You can add feature importances to the dataset-level data drift Tests and Metrics:
- DataDriftTable
- TestShareOfDriftedColumns
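For the Test-based workflow, a minimal sketch might look like this. It assumes reference and current are pandas DataFrames, and that TestShareOfDriftedColumns exposes the same feature_importance parameter described below for the metric:

from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

suite = TestSuite(tests=[
    # Assumption: the test accepts the same feature_importance flag as the metric
    TestShareOfDriftedColumns(feature_importance=True)
])
suite.run(reference_data=reference, current_data=current)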
Code example
Notebook example on showing feature importance:
Compute feature importances
By default, the feature importance column is not shown. To display it, set the feature_importance parameter to True.
from evidently.report import Report
from evidently.metrics import DataDriftTable

report = Report(metrics=[
    DataDriftTable(feature_importance=True)
])

If you do not specify anything else, Evidently will train a random forest model using the provided dataset and derive the feature importances from it.
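To illustrate this default behavior end to end, here is a minimal, self-contained sketch. The file names are placeholders, and it assumes both datasets contain the target column:

import pandas as pd

from evidently.report import Report
from evidently.metrics import DataDriftTable

# Placeholder file names: both datasets must contain the target column
# so that Evidently can train its internal random forest.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

report = Report(metrics=[DataDriftTable(feature_importance=True)])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")  # or report.show() in a notebook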
Notes:
- This is only possible if your dataset contains the target column.
- If you have both current and reference datasets, two different models will be trained. You will have two columns with feature importance: one for reference and one for current data.
- If your dataset also contains the prediction column, you should clearly label it using Column Mapping to avoid it being treated as a feature (see the sketch after this list).
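Here is a minimal sketch of such a mapping. The column names and feature lists are illustrative:

from evidently import ColumnMapping

column_mapping = ColumnMapping(
    target="target",          # label the target so the internal model can be trained
    prediction="prediction",  # label the prediction so it is not treated as a feature
    numerical_features=["temp", "humidity"],     # illustrative feature names
    categorical_features=["season", "holiday"],
)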
Pass your own importances
You can also pass the feature importances derived during the model training process. This is the recommended option.
In this case, pass them as a dictionary that maps feature names to importance values, using the additional_data parameter when running the Report.
report = Report(metrics=[
    DataDriftTable(feature_importance=True)
])
report.run(
    reference_data=reference,
    current_data=current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
    column_mapping=column_mapping,
    additional_data={
        'current_feature_importance': dict(
            zip(numerical_features + categorical_features, regressor.feature_importances_)
        )
    },
)

If you pass only the current_feature_importance, a single column will appear in this case. You can also optionally pass the reference_feature_importance.
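For instance, here is a minimal sketch passing both. The model names ref_regressor and cur_regressor are hypothetical stand-ins for models fitted separately on the reference and current data:

features = numerical_features + categorical_features

report.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
    additional_data={
        # Hypothetical models fitted on reference and current data respectively
        'reference_feature_importance': dict(zip(features, ref_regressor.feature_importances_)),
        'current_feature_importance': dict(zip(features, cur_regressor.feature_importances_)),
    },
)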