Feature importance in data drift

How to show feature importance in Data Drift evaluations.

You are looking at the old Evidently documentation: this API is available with versions 0.6.7 or lower. Check the newer docs version here.

You can add feature importances to the dataset-level data drift Tests and Metrics:

  • DataDriftTable

  • TestShareOfDriftedColumns

Code example

Notebook example on showing feature importance:

Compute feature importances

By default, the feature importance column is not shown. To display it, set the feature_importance parameter to True.

If you do not specify anything else, Evidently will train a random forest model using the provided dataset and derive the feature importances.

Notes:

  • This is only possible if your dataset contains the target column.

  • If you have both current and reference datasets, two different models will be trained. You will have two columns with feature importance: one for reference and one for current data.

  • If your dataset also contains the prediction column, you should clearly label it using Column Mapping to avoid it being treated as a feature.
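The notes above can be sketched as follows. This is a minimal illustration, assuming the legacy API (versions 0.6.7 or lower) and two pandas DataFrames that both contain the target column. The function name run_drift_with_importances and the column names "target" and "prediction" are placeholders, not part of Evidently.

```python
def run_drift_with_importances(reference, current, target="target", prediction="prediction"):
    """Build a data drift report whose table includes feature importances.

    `reference` and `current` are pandas DataFrames that both contain the
    target column. The column names are placeholders for this sketch.
    """
    # Legacy Evidently imports (versions 0.6.7 or lower).
    from evidently import ColumnMapping
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    # Label the target and the prediction so that neither one is
    # treated as a feature when the importances are derived.
    column_mapping = ColumnMapping(target=target, prediction=prediction)

    # feature_importance=True turns on the importance columns; with no
    # importances supplied, Evidently derives them from the data itself.
    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=column_mapping,
    )
    return report
```

Calling run_drift_with_importances(reference_df, current_df) with both datasets should yield a table with one importance column per dataset, as described in the notes.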

Pass your own importances

You can also pass the list of feature importances derived during the model training process. This is a recommended option.

In this case, pass the importances using the additional_data parameter when running the Report.

If you pass only current_feature_importance, a single importance column will appear. You can optionally also pass reference_feature_importance.
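A possible way to wire this up is sketched below, assuming the legacy API (versions 0.6.7 or lower). The sketch assumes that each set of importances is a mapping from feature name to importance value, placed in additional_data under the keys current_feature_importance and reference_feature_importance; verify the expected shape against the notebook example for your version. The helper names and the feature names and values are hypothetical.

```python
def build_additional_data(current_importance, reference_importance=None):
    """Assemble the additional_data argument for Report.run.

    Each argument maps feature name -> importance value
    (an assumption of this sketch).
    """
    additional_data = {"current_feature_importance": dict(current_importance)}
    if reference_importance is not None:
        additional_data["reference_feature_importance"] = dict(reference_importance)
    return additional_data


def run_drift_with_own_importances(reference, current, additional_data, column_mapping=None):
    """Run a data drift report that shows the importances you supply."""
    # Legacy Evidently imports (versions 0.6.7 or lower).
    from evidently.metrics import DataDriftTable
    from evidently.report import Report

    report = Report(metrics=[DataDriftTable(feature_importance=True)])
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=column_mapping,
        additional_data=additional_data,
    )
    return report


# Hypothetical importances, for example saved from your model at training time.
importances = {"age": 0.42, "income": 0.35, "tenure": 0.23}
additional_data = build_additional_data(importances)
```

Passing only the current importances, as in the last line, corresponds to the single-column case; supplying a second mapping to build_additional_data adds the reference importances.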
