Column mapping
How to use column mapping in Evidently.
Column mapping helps map your input data schema or specify column types. For example, to run evaluation on text data, you must specify which columns in your dataset contain texts. This allows Evidently to process the input data correctly.
You can create a ColumnMapping object in Python before generating a Report or Test Suite, or map the columns visually when working in the Evidently platform.
You only need to map columns that will be used in your evaluations.
Default mapping strategy
If the column_mapping is not specified or is set to None, Evidently uses the default mapping strategy and tries to match the columns automatically.
Column types:
All columns with numeric types (np.number) will be treated as Numerical.
All columns with DateTime format (np.datetime64) will be treated as DateTime.
All other columns will be treated as Categorical.
Dataset structure:
The column named "id" will be treated as an ID column.
The column named "datetime" will be treated as a DateTime column.
The column named "target" will be treated as a target column with a true label or value.
The column named "prediction" will be treated as a model prediction.
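The default strategy above can be sketched in plain Python. This sketch illustrates the rules as described and is not Evidently's own implementation. It makes three assumptions: the dataset is a dict of NumPy arrays rather than a pandas DataFrame; columns that receive a special role (id, datetime, target, prediction) are not also typed as features; and the function names infer_column_type and default_mapping are hypothetical.

```python
import numpy as np

# Column names that get a special role under the default strategy (from the list above).
ROLE_NAMES = ("id", "datetime", "target", "prediction")


def infer_column_type(values: np.ndarray) -> str:
    """Numeric dtype -> Numerical, datetime64 -> DateTime, anything else -> Categorical."""
    if np.issubdtype(values.dtype, np.number):
        return "Numerical"
    if np.issubdtype(values.dtype, np.datetime64):
        return "DateTime"
    return "Categorical"


def default_mapping(data: dict) -> dict:
    """Hypothetical sketch of the default strategy for a {column_name: np.ndarray} dataset."""
    # Role columns are detected purely by their names.
    roles = [name for name in ROLE_NAMES if name in data]
    # Assumption: role columns are excluded from the feature type inference.
    types = {name: infer_column_type(values)
             for name, values in data.items() if name not in roles}
    return {"roles": roles, "types": types}


data = {
    "id": np.array([1, 2, 3]),
    "datetime": np.array(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[D]"),
    "join_date": np.array(["2023-05-01", "2023-06-12", "2023-07-30"], dtype="datetime64[D]"),
    "temp": np.array([20.5, 22.1, 19.8]),
    "season": np.array(["winter", "winter", "spring"]),
    "target": np.array([10.0, 12.5, 9.1]),
}
print(default_mapping(data))
```

In this example, "temp" is typed as Numerical, "join_date" as DateTime, and "season" as Categorical. The "id", "datetime", and "target" columns are picked up by name alone.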
Which columns do you need?
To run certain types of evaluations, you must include specific columns or provide a reference dataset. For example, to run text evaluations, you must have at least one column labeled as text. To run data drift checks, you always need a reference dataset.
Here are example requirements:
| Evaluation | Feature columns | Target | Prediction | ID | DateTime | Reference dataset |
|---|---|---|---|---|---|---|
| Text Evals | Required (Text) | Optional | Optional | Optional | Optional | Optional |
| Data Quality | Required (Any) | Optional | Optional | Optional | Optional | Optional |
| Data Drift | Required (Any) | Optional | Optional | Optional | Optional | Required |
| Target Drift | Optional | Target and/or prediction required | Target and/or prediction required | Optional | Optional | Required |
| Classification | Optional | Required | Required | Optional | Optional | Optional |
| Regression | Optional | Required | Required | Optional | Optional | Optional |
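The requirements table can also be expressed as data, which makes it possible to check up front which inputs an evaluation is still missing. This is a hypothetical helper and not part of the Evidently API. The names REQUIREMENTS and missing_inputs are made up for illustration. The sketch also assumes that text columns count toward the "Required (Any)" feature requirement.

```python
# Hypothetical encoding of the requirements table (not an Evidently API).
REQUIREMENTS = {
    "Text Evals": {"text_features"},
    "Data Quality": {"features"},
    "Data Drift": {"features", "reference"},
    "Target Drift": {"target_or_prediction", "reference"},
    "Classification": {"target", "prediction"},
    "Regression": {"target", "prediction"},
}


def missing_inputs(evaluation: str, available: set) -> list:
    """Return the requirements for `evaluation` that the `available` inputs do not cover."""
    have = set(available)
    if "text_features" in have:
        have.add("features")  # assumption: text columns satisfy "Required (Any)"
    if "target" in have or "prediction" in have:
        have.add("target_or_prediction")  # Target Drift needs at least one of the two
    return sorted(REQUIREMENTS[evaluation] - have)


print(missing_inputs("Data Drift", {"features"}))                  # ['reference']
print(missing_inputs("Classification", {"features", "target"}))    # ['prediction']
print(missing_inputs("Target Drift", {"prediction", "reference"})) # []
```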
Code example
Notebook example on specifying column mapping:
Imports. To use column mapping, import ColumnMapping:
from evidently import ColumnMapping
Basic API. Once you create a ColumnMapping object, pass it along with the data when computing the Report or Test Suite. For example:
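from evidently.report import Report
from evidently.metric_preset import RegressionPreset

# ref and cur are pandas DataFrames with the reference and current data;
# numerical_features and categorical_features are lists of column names.
column_mapping = ColumnMapping()
column_mapping.target = 'target'
column_mapping.prediction = 'prediction'
column_mapping.numerical_features = numerical_features
column_mapping.categorical_features = categorical_features

report = Report(metrics=[
    RegressionPreset(),
])
report.run(reference_data=ref,
           current_data=cur,
           column_mapping=column_mapping)
report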
Column mapping
DateTime and ID
To map columns containing DateTime and ID:
column_mapping.datetime = 'date'  # 'date' is the name of the column with datetime
column_mapping.id = None  # there is no ID column in the dataset
Target and Prediction
To map columns containing Target and Prediction:
column_mapping.target = 'y'
column_mapping.prediction = 'pred'
This mapping works for regression and simple classification tasks. For more complex cases, see the detailed instructions on mapping inputs for classification and for ranking and recommendations.
Categorical and numerical columns
To split the columns into numerical and categorical types, pass them as lists:
column_mapping.numerical_features = ['temp', 'atemp', 'humidity']
column_mapping.categorical_features = ['season', 'holiday']
Text data
To specify columns that contain text data:
column_mapping.text_features = ['email_subject', 'email_body']
Embeddings features
To specify which columns in your dataset contain embeddings, pass a dictionary where keys are embedding names and values are lists of columns.
For example, to map the first 10 columns of embeddings_data as a single embedding named 'small_subset':
column_mapping = ColumnMapping()
column_mapping.embeddings = {'small_subset': embeddings_data.columns[:10]}
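When a dataset holds several embeddings, you may find it convenient to build the {embedding_name: [column names]} dictionary from a naming convention rather than writing it by hand. The sketch below assumes embedding columns share a common prefix such as "title_emb_". The helper group_embedding_columns and the column names are hypothetical, not part of Evidently.

```python
# Hypothetical helper: group embedding columns by name prefix into the
# {embedding_name: [column names]} shape described above.
def group_embedding_columns(column_names, prefixes):
    return {
        name: [col for col in column_names if col.startswith(prefix)]
        for name, prefix in prefixes.items()
    }


columns = ["id", "text", "title_emb_0", "title_emb_1", "body_emb_0", "body_emb_1", "body_emb_2"]
embeddings = group_embedding_columns(columns, {"title": "title_emb_", "body": "body_emb_"})
print(embeddings)
# {'title': ['title_emb_0', 'title_emb_1'], 'body': ['body_emb_0', 'body_emb_1', 'body_emb_2']}

# With Evidently installed, the result could then be assigned:
# column_mapping.embeddings = embeddings
```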
DateTime features
You might have temporal features in your dataset. For example, “date of the last contact.”
To map them, pass them as a list:
column_mapping.datetime_features = ['last_call_date', 'join_date']
Task parameter for target function
It’s often important to specify whether your Target column is continuous or discrete. This impacts Data Quality, Data Drift, and Target Drift evaluations for the Target column.
To define it explicitly, specify the task parameter:
column_mapping.target = 'y'
column_mapping.task = 'regression'
It accepts the following values:
regression
classification
recsys (for ranking and recommenders)
Default: If you don't specify the task, Evidently uses a simple heuristic: if the target has a numeric type and more than 5 unique values, the task is 'regression'. In all other cases, the task is 'classification'.
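The default rule can be sketched in a few lines. This is an illustration of the heuristic as stated, not Evidently's source code. It assumes the target is a NumPy array rather than a pandas Series, and the name infer_task is hypothetical.

```python
import numpy as np


def infer_task(target: np.ndarray) -> str:
    """Hypothetical sketch: numeric target with more than 5 unique values -> regression."""
    is_numeric = np.issubdtype(target.dtype, np.number)
    if is_numeric and len(np.unique(target)) > 5:
        return "regression"
    return "classification"


print(infer_task(np.array([1.5, 2.0, 3.7, 4.1, 5.9, 8.2])))  # regression: 6 unique numeric values
print(infer_task(np.array([0, 1, 1, 0, 1, 0, 1])))           # classification: only 2 unique values
print(infer_task(np.array(["spam", "ham", "spam"])))         # classification: non-numeric target
```

A numeric target with exactly 5 unique values (for example, ratings from 1 to 5) falls on the classification side. This is a case where setting column_mapping.task explicitly avoids surprises.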