Feature importance in data drift
How to show feature importance in Data Drift evaluations.
You can add feature importances to the dataset-level data drift Tests and Metrics:
- DataDriftTable
- TestShareOfDriftedColumns
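For the Test-based workflow, a minimal sketch might look like this. It assumes reference and current are pandas DataFrames, and that TestShareOfDriftedColumns exposes the same feature_importance parameter described below for the metric:

from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

suite = TestSuite(tests=[
    # Assumption: the test accepts the same feature_importance flag as the metric
    TestShareOfDriftedColumns(feature_importance=True)
])
suite.run(reference_data=reference, current_data=current)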
Code example
Notebook example on showing feature importance:
Compute feature importances
By default, the feature importance column is not shown. To display it, set the feature_importance parameter to True.
from evidently.report import Report
from evidently.metrics import DataDriftTable

report = Report(metrics=[
    DataDriftTable(feature_importance=True)
])

If you do not specify anything else, Evidently will train a random forest model using the provided dataset and derive the feature importances from it.
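To illustrate this default behavior end to end, here is a minimal, self-contained sketch. The file names are placeholders, and it assumes both datasets contain the target column:

import pandas as pd

from evidently.report import Report
from evidently.metrics import DataDriftTable

# Placeholder file names: both datasets must contain the target column
# so that Evidently can train its internal random forest.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

report = Report(metrics=[DataDriftTable(feature_importance=True)])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")  # or report.show() in a notebook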
Notes:
- This is only possible if your dataset contains the target column.
- If you have both current and reference datasets, two different models will be trained. You will have two columns with feature importance: one for reference and one for current data.
- If your dataset also contains the prediction column, you should clearly label it using Column Mapping to avoid it being treated as a feature (see the sketch after this list).
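Here is a minimal sketch of such a mapping. The column names and feature lists are illustrative:

from evidently import ColumnMapping

column_mapping = ColumnMapping(
    target="target",          # label the target so the internal model can be trained
    prediction="prediction",  # label the prediction so it is not treated as a feature
    numerical_features=["temp", "humidity"],     # illustrative feature names
    categorical_features=["season", "holiday"],
)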
Pass your own importances
You can also pass the feature importances derived during the model training process. This is the recommended option.
In this case, pass them as a dictionary that maps feature names to importance values, using the additional_data parameter when running the Report.
report = Report(metrics=[
    DataDriftTable(feature_importance=True)
])
report.run(
    reference_data=reference,
    current_data=current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
    column_mapping=column_mapping,
    additional_data={
        'current_feature_importance': dict(
            zip(numerical_features + categorical_features, regressor.feature_importances_)
        )
    },
)

If you pass only the current_feature_importance, a single column will appear in this case. You can also optionally pass the reference_feature_importance.
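For instance, here is a minimal sketch passing both. The model names ref_regressor and cur_regressor are hypothetical stand-ins for models fitted separately on the reference and current data:

features = numerical_features + categorical_features

report.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
    additional_data={
        # Hypothetical models fitted on reference and current data respectively
        'reference_feature_importance': dict(zip(features, ref_regressor.feature_importances_)),
        'current_feature_importance': dict(zip(features, cur_regressor.feature_importances_)),
    },
)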