DVCLive can be used to track the results of Evidently. The example below
demonstrates how.
How it works
Evidently calculates a rich set of metrics and statistical tests. You can choose any of the pre-built reports or combine individual metrics to define what you want to measure. For example, you can evaluate prediction drift and feature drift together.
You can then generate the calculation output in a Python dictionary format. You should explicitly define which parts of the output to send to DVCLive Tracking.
Step 1. Install DVCLive and Evidently
Install DVCLive, Evidently, and pandas (to handle the data) in your Python environment:
Step 2. Load the data
Load the data from the UCI repository
and save it locally. For demonstration purposes, we treat this data as the input
data for a live model. To use this approach with production models, you should
make your prediction logs available.
This is how it looks:
Step 3. Define column mapping
You should specify the categorical and numerical features so that Evidently
performs the correct statistical test for each of them. While Evidently can
parse the data structure automatically, manually specifying the column type can
minimize errors.
Step 4. Define what to log
Specify which metrics you want to calculate. In this case, you can generate the
Data Drift report and log the drift score for each feature.
You can adapt what you want to calculate by selecting a different Preset or
Metric from those available in Evidently.
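As a rough sketch of "defining what to log", the helper below pulls one drift score per feature out of a report dictionary and returns (name, score) pairs, the shape that the logging loops in Step 6 iterate over. The dictionary path, the `extract_drift_scores` name, the feature names, and the sample values are all assumptions for illustration: the layout is modeled on what a Data Drift report dictionary may look like, so inspect the dictionary produced by your installed Evidently version before relying on it.

```python
from typing import Any


def extract_drift_scores(
    report: dict[str, Any], features: list[str]
) -> list[tuple[str, float]]:
    """Return (feature, drift_score) pairs from a report dictionary.

    Assumes the per-column results live under
    report["metrics"][1]["result"]["drift_by_columns"]; this path is an
    assumption and may differ between Evidently versions.
    """
    by_column = report["metrics"][1]["result"]["drift_by_columns"]
    return [(name, by_column[name]["drift_score"]) for name in features]


# Hand-written stand-in for a report dictionary (not real Evidently output).
# The feature names "temp" and "hum" are hypothetical.
sample_report = {
    "metrics": [
        {"metric": "DatasetDriftMetric", "result": {}},
        {
            "metric": "DataDriftTable",
            "result": {
                "drift_by_columns": {
                    "temp": {"drift_score": 0.0421},
                    "hum": {"drift_score": 0.3187},
                }
            },
        },
    ]
}

scores = extract_drift_scores(sample_report, ["temp", "hum"])
for name, score in scores:
    print(name, round(score, 3))
```

Returning plain tuples keeps the logging code independent of the report layout: if the dictionary structure changes, only this helper needs to be updated.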
Step 5. Define the comparison windows
Specify the period that is considered reference: Evidently will use it as the
base for the comparison. Then, you should choose the periods to treat as
experiments. This emulates the production model runs.
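One possible way to define the windows is shown below, as a minimal sketch using only the standard library. The names `reference_dates` and `experiment_batches` match the ones used by the logging code in Step 6; the concrete dates, the weekly window length, the hourly granularity, and the `weekly_batches` helper are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def weekly_batches(start: str, n_batches: int) -> list[tuple[str, str]]:
    """Build n_batches consecutive, non-overlapping 7-day windows.

    Each window is a (begin, end) pair of timestamp strings. The end is set
    one hour before the next window begins, which assumes hourly records.
    """
    first = datetime.strptime(start, FMT)
    batches = []
    for i in range(n_batches):
        begin = first + timedelta(days=7 * i)
        end = begin + timedelta(days=7) - timedelta(hours=1)
        batches.append((begin.strftime(FMT), end.strftime(FMT)))
    return batches


# Hypothetical reference period: the baseline that every batch is compared to.
reference_dates = ("2011-01-01 00:00:00", "2011-01-28 23:00:00")

# Hypothetical "production" periods, each one treated as an experiment.
experiment_batches = weekly_batches("2011-01-29 00:00:00", 3)

for begin, end in experiment_batches:
    print(begin, "->", end)
```

Keeping each window as a (begin, end) pair lets the later code select rows with `date[0]` and `date[1]`, and log both values as parameters.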
Step 6. Run and log experiments in DVC
There are two ways to track the results of Evidently with DVCLive:
you can save the results of each item in the batch in a single experiment
(each experiment corresponds to a Git commit), as separate steps;
or you can save the result of each item in the batch as a separate experiment.
We will demonstrate both, and show you how to inspect the results regardless of
your IDE. However, if you are using VS Code, we recommend using the DVC extension for VS Code
to inspect the results.
Step 7
Option 1. One single experiment
You can then inspect the results by running dvc plots show
and opening the resulting dvc_plots/index.html, which should look like
this:
In a Jupyter notebook environment, you can show the plots as a cell output
simply by using Live(report="notebook").
Option 2. Multiple experiments
You can then inspect the results using
In a Jupyter notebook environment, you can access the experiment results using
the Python DVC API:
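# Option 1: log every batch as a separate step of one single experiment.
from dvclive import Live

with Live() as live:
    for date in experiment_batches:
        # Record the window covered by this batch.
        live.log_param("begin", date[0])
        live.log_param("end", date[1])

        # Compare the reference window against the current batch.
        metrics = eval_drift(
            df.loc[df.dteday.between(reference_dates[0], reference_dates[1])],
            df.loc[df.dteday.between(date[0], date[1])],
            column_mapping=data_columns,
        )

        # Log one drift score per feature; each item is a (name, score) pair.
        for feature in metrics:
            live.log_metric(feature[0], round(feature[1], 3))

        # Close the current step before moving on to the next batch.
        live.next_step()
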
$ dvc plots show
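
# Option 2: log every batch as its own experiment.
from dvclive import Live

for step, date in enumerate(experiment_batches):
    # A new Live context per batch starts a separate experiment.
    with Live() as live:
        live.log_param("begin", date[0])
        live.log_param("end", date[1])
        live.log_param("step", step)

        # Compare the reference window against the current batch.
        metrics = eval_drift(
            df.loc[df.dteday.between(reference_dates[0], reference_dates[1])],
            df.loc[df.dteday.between(date[0], date[1])],
            column_mapping=data_columns,
        )

        # Log one drift score per feature; each item is a (name, score) pair.
        for feature in metrics:
            live.log_metric(feature[0], round(feature[1], 3))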