Data for Classification

How to define the data schema for classification.

You are looking at the old Evidently documentation: this API is available in versions 0.6.7 and lower. Check the newer version of the documentation for the current API.

To evaluate classification model performance, you must correctly map the input data schema.

Code example

A how-to notebook that covers column mapping is available in the Evidently repository: evidently/examples/how_to_questions/how_to_use_column_mapping.ipynb

Column Mapping

To evaluate classification performance, you need both true labels and predictions. Depending on the classification type (binary, multiclass, probabilistic), there are different ways to pass the predictions.
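
Once defined, the mapping is passed together with your data when you run an evaluation. Here is a minimal sketch, assuming the old API described on this page (Evidently 0.6.7 or lower) with a ClassificationPreset report; the toy DataFrame and output file name are illustrative only:

import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

# Toy data: true labels and predicted labels, both encoded as integers
# (this matches Option 1 for multiclass classification below).
current = pd.DataFrame({
    "target":     [1, 0, 2, 2, 1, 0],
    "prediction": [1, 2, 2, 1, 1, 0],
})

column_mapping = ColumnMapping()
column_mapping.target = "target"
column_mapping.prediction = "prediction"

report = Report(metrics=[ClassificationPreset()])
report.run(reference_data=None, current_data=current, column_mapping=column_mapping)
report.save_html("classification_report.html")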

Multiclass classification

Option 1

Target: encoded labels, Preds: encoded labels + Optional[target_names].

| target | prediction |
|--------|------------|
| 1      | 1          |
| 0      | 2          |
| …      | …          |
| 2      | 2          |

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'prediction'
column_mapping.target_names = ['Setosa', 'Versicolour', 'Virginica']

If you pass the target names, they will appear on the visualizations.

You can also pass the target names as a dictionary:

column_mapping.target_names = {'0': 'Setosa', '1': 'Versicolour', '2': 'Virginica'}

or

column_mapping.target_names = {0: 'Setosa', 1: 'Versicolour', 2: 'Virginica'}

Option 2

Target: labels, Preds: labels.

| target        | prediction    |
|---------------|---------------|
| ‘Versicolour’ | ‘Versicolour’ |
| ‘Setosa’      | ‘Virginica’   |
| …             | …             |
| ‘Virginica’   | ‘Virginica’   |

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'prediction'

Multiclass probabilistic classification

Target: labels, Preds: columns named after labels.

| target      | ‘Versicolour’ | ‘Setosa’ | ‘Virginica’ |
|-------------|---------------|----------|-------------|
| ‘Setosa’    | 0.98          | 0.01     | 0.01        |
| ‘Virginica’ | 0.5           | 0.2      | 0.3         |
| …           | …             | …        | …           |
| ‘Virginica’ | 0.2           | 0.7      | 0.1         |

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = ['Setosa', 'Versicolour', 'Virginica']

Naming the columns after the labels is a requirement. You cannot pass a custom list.
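
One convenient way to get such columns is to reuse the class names from a fitted model. A minimal sketch, assuming a scikit-learn model trained on the iris data; the model, dataset, and lowercase label spellings are illustrative, not part of the Evidently API:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

from evidently import ColumnMapping

# Illustrative model: iris flowers with string labels.
data = load_iris(as_frame=True)
X, y = data.data, data.target_names[data.target]

model = LogisticRegression(max_iter=1000).fit(X, y)

# One probability column per class, named exactly after the labels in model.classes_.
current = pd.DataFrame(model.predict_proba(X), columns=model.classes_)
current["target"] = y

column_mapping = ColumnMapping()
column_mapping.target = "target"
column_mapping.prediction = list(model.classes_)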

Binary classification

Option 1

Target: encoded labels, Preds: encoded labels + pos_label + Optional[target_names]

| target | prediction |
|--------|------------|
| 1      | 1          |
| 0      | 1          |
| …      | …          |
| 1      | 0          |

By default, Evidently expects the positive class to be labeled as ‘1’. If you have a different label, specify it explicitly.

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'prediction'
column_mapping.target_names = ['churn', 'not_churn']
column_mapping.pos_label = 0

If you pass the target names, they will appear on the visualizations.

Option 2

Target: labels, Preds: labels + pos_label

| target      | prediction  |
|-------------|-------------|
| ‘churn’     | ‘churn’     |
| ‘not_churn’ | ‘churn’     |
| …           | …           |
| ‘churn’     | ‘not_churn’ |

Passing the name of the positive class is a requirement in this case.

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'prediction'
column_mapping.pos_label = 'churn'

Binary probabilistic classification

Option 1

Target: labels, Preds: columns named after labels + pos_label

| target      | ‘churn’ | ‘not_churn’ |
|-------------|---------|-------------|
| ‘churn’     | 0.9     | 0.1         |
| ‘churn’     | 0.7     | 0.3         |
| …           | …       | …           |
| ‘not_churn’ | 0.5     | 0.5         |

Passing the name of the positive class is a requirement in this case.

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = ['churn', 'not_churn']
column_mapping.pos_label = 'churn'

Option 2

Target: labels, Preds: a column named like one of the labels + pos_label

| target      | ‘not_churn’ |
|-------------|-------------|
| ‘churn’     | 0.5         |
| ‘not_churn’ | 0.1         |
| …           | …           |
| ‘churn’     | 0.9         |

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'not_churn'
column_mapping.pos_label = 'churn'

Both naming the column after one of the labels and passing the name of the positive class are requirements.
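
For illustration, the input frame for this case could be built by hand as in the sketch below, assuming (as the table above suggests) that the column holds the predicted probability of the class it is named after; the values are illustrative only:

import pandas as pd

# Single probability column named after one of the labels ('not_churn');
# the positive class ('churn') is passed separately via pos_label in the mapping above.
current = pd.DataFrame({
    "target":    ["churn", "not_churn", "churn"],
    "not_churn": [0.5, 0.1, 0.9],
})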

Option 3

Target: encoded labels, Preds: one column with any name + pos_label

| target | prediction |
|--------|------------|
| 1      | 0.5        |
| 1      | 0.1        |
| …      | …          |
| 0      | 0.9        |

column_mapping = ColumnMapping()

column_mapping.target = 'target'
column_mapping.prediction = 'prediction'
column_mapping.pos_label = 1
column_mapping.target_names = ['churn', 'not_churn']

If you pass the target names, they will appear on the visualizations.
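
For example, you can fill such a prediction column with the probability of the positive class from a model's predict_proba output. A minimal sketch, assuming a scikit-learn model on synthetic binary data; the model and data are illustrative, not part of the Evidently API:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

from evidently import ColumnMapping

# Toy binary data with classes encoded as 0/1 (illustrative only).
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Keep a single column with the probability of the positive class (encoded as 1).
current = pd.DataFrame({
    "target": y,
    "prediction": model.predict_proba(X)[:, 1],
})

column_mapping = ColumnMapping()
column_mapping.target = "target"
column_mapping.prediction = "prediction"
column_mapping.pos_label = 1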
