Run a Test Suite
How to run Test Suites using Evidently Python library.
Check the example notebooks for how to generate Test Suites.
After installing Evidently, import the TestSuite component and the necessary test_presets or tests you plan to use:
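For example, a typical set of imports might look like this (the specific Presets and Tests listed here are only illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset, DataStabilityTestPreset
from evidently.tests import TestNumberOfMissingValues, TestShareOfOutRangeValues
```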
Here is the general flow.
Input data. Prepare your data as a Pandas DataFrame. This will be the current data you test. You may also pass a reference dataset to generate Test conditions from this reference or to run data distribution Tests. Check the input data requirements.
Schema mapping. Define your data schema using Column Mapping. Optional, but highly recommended.
Define the Test Suite. Create a TestSuite object and pass the selected tests.
Set the parameters. Optionally, specify Test conditions and mark certain Tests as non-critical.
Run the Test Suite. Execute the Test Suite on your current_data. If applicable, pass the reference_data and column_mapping.
Get the results. View the results in Jupyter notebook, export the summary, or send to the Evidently Platform.
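Putting the steps together, here is a minimal end-to-end sketch. The file names, column names, and the chosen Preset are placeholders; the curr, ref, and column_mapping objects defined here are reused in the snippets below:

```python
import pandas as pd

from evidently import ColumnMapping
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

# 1. Input data: current (and, optionally, reference) datasets as DataFrames
curr = pd.read_csv("current.csv")    # placeholder file
ref = pd.read_csv("reference.csv")   # placeholder file

# 2. Schema mapping (optional, but recommended)
column_mapping = ColumnMapping(target="target", prediction="prediction")

# 3-4. Define the Test Suite and, optionally, set parameters
suite = TestSuite(tests=[DataStabilityTestPreset()])

# 5. Run the Test Suite
suite.run(current_data=curr, reference_data=ref, column_mapping=column_mapping)

# 6. Get the results (renders in Jupyter; other output formats are covered below)
suite
```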
You can use Test Presets or create your own Test Suite from individual Tests.
Test Presets are pre-built Test Suites that generate Tests for a specific aspect of the data or model performance.
Evidently also automatically generates Test conditions in two ways:
Based on the reference dataset. If you provide a reference, Evidently derives conditions from it. For example, TestShareOfOutRangeValues will fail if over 10% of current values fall outside the min-max range seen in the reference. The 10% threshold is an encoded heuristic.
Based on heuristics. Without a reference, Evidently uses heuristics. For example, TestAccuracyScore() fails if the model performs worse than a dummy model created by Evidently. Data quality Tests like TestNumberOfEmptyRows() or TestNumberOfMissingValues() assume both should be zero.
Example 1. To apply the DataQualityTestPreset to a single curr dataset, with conditions generated based on heuristics:
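A sketch of this example, assuming curr is a Pandas DataFrame you already prepared:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset

data_quality = TestSuite(tests=[
    DataQualityTestPreset(),
])

# no reference dataset: Test conditions come from heuristics
data_quality.run(current_data=curr, reference_data=None, column_mapping=None)
```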
To get the visual report with Test results, call the object in Jupyter notebook or Colab:
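Continuing the sketch above:

```python
# calling the object in the last line of a cell renders the visual report
data_quality
```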
To get the Test results summary, generate a Python dictionary:
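Continuing the same sketch:

```python
data_quality.as_dict()
```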
Example 2. To apply the DataStabilityTestPreset, with conditions generated from reference, pass the reference_data:
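A sketch, assuming curr and ref are DataFrames with the same schema:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

data_stability = TestSuite(tests=[
    DataStabilityTestPreset(),
])

# Test conditions are derived from the reference dataset
data_stability.run(current_data=curr, reference_data=ref, column_mapping=None)
```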
Example 3. To apply the NoTargetPerformanceTestPreset with additional parameters:
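An illustrative sketch. The column names are placeholders, and the drift-related parameter names shown here are assumptions; check the Preset reference for the exact names in your version:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import NoTargetPerformanceTestPreset

no_target_performance = TestSuite(tests=[
    NoTargetPerformanceTestPreset(
        columns=["feature_1", "feature_2"],  # limit column-level Tests to these columns (placeholder names)
        cat_stattest="psi",                  # drift method for categorical columns (assumed parameter name)
        cat_stattest_threshold=0.2,          # drift threshold (assumed parameter name)
    ),
])

no_target_performance.run(current_data=curr, reference_data=ref)
```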
By selecting specific columns for the Preset, you reduce the number of generated column-level Tests. When you specify the data drift detection method and threshold, it will override the defaults.
You can use Presets as a starting point, but eventually you'll want to design a custom Test Suite to pick specific Tests and set conditions more precisely. Here's how:
Choose individual Tests. Select the Tests you want to include in your Test Suite.
Pass Test parameters. Set custom parameters for applicable Tests. (Optional).
Set custom conditions. Define when Tests should pass or fail. (Optional).
Mark Test criticality. Mark non-critical Tests to give a Warning instead of Fail. (Optional).
First, decide which Tests to include. Tests can be either dataset-level or column-level.
Dataset-level Tests. Some Tests apply to the entire dataset, such as checking the share of drifting features or accuracy. To add them to a Test Suite, create a TestSuite object and list the tests one by one:
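For example (the chosen Tests are illustrative):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns, TestNumberOfEmptyRows

dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(),
    TestNumberOfEmptyRows(),
])

dataset_tests.run(current_data=curr, reference_data=ref)
```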
Column-level Tests. Some Tests focus on individual columns, like checking if a specific column's values stay within a range. To include column-level Tests, pass the name of the column to each Test:
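For example, with a placeholder column name:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnValueMin, TestShareOfOutRangeValues

column_tests = TestSuite(tests=[
    TestColumnValueMin(column_name="feature_1"),
    TestShareOfOutRangeValues(column_name="feature_1"),
])

column_tests.run(current_data=curr, reference_data=ref)
```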
Combining Tests. You can combine column-level and dataset-level Tests in a single Test Suite. You can also include Presets and individual Tests together.
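A sketch of such a mixed Test Suite:

```python
from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset
from evidently.tests import TestShareOfDriftedColumns, TestColumnValueMin

mixed_suite = TestSuite(tests=[
    DataQualityTestPreset(),                      # a Preset
    TestShareOfDriftedColumns(),                  # a dataset-level Test
    TestColumnValueMin(column_name="feature_1"),  # a column-level Test (placeholder column)
])

mixed_suite.run(current_data=curr, reference_data=ref)
```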
Tests can have optional or required parameters.
Example 1. To test a quantile value, you need to specify the quantile (Required parameter):
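For instance (the Test name TestColumnQuantile and the column name are assumptions; check the Test reference for the exact name in your version):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnQuantile

quantile_suite = TestSuite(tests=[
    # quantile is a required parameter
    TestColumnQuantile(column_name="feature_1", quantile=0.25),
])

quantile_suite.run(current_data=curr, reference_data=ref)
```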
Example 2: To override the default drift detection method, pass the chosen statistical method (Optional), or modify the Mean Value Test to use 3 sigmas:
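A sketch; the statistical test name ("psi") and the column name are illustrative:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestColumnDrift, TestMeanInNSigmas

parameterized_suite = TestSuite(tests=[
    TestColumnDrift(column_name="feature_1", stattest="psi"),  # override the drift detection method
    TestMeanInNSigmas(column_name="feature_1", n_sigmas=3),    # expect the mean within 3 sigmas of the reference
])

parameterized_suite.run(current_data=curr, reference_data=ref)
```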
Example 3: To change the decision threshold for probabilistic classification to 0.8:
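For example (the probas_threshold parameter name is an assumption; check the reference for your version):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestPrecisionScore

classification_suite = TestSuite(tests=[
    # predictions above 0.8 are labeled as the positive class
    TestPrecisionScore(probas_threshold=0.8),
])

classification_suite.run(current_data=curr, reference_data=ref, column_mapping=column_mapping)
```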
You can set up your Test conditions in two ways:
Automatic. If you don't specify individual conditions, the defaults (reference- or heuristic-based) will apply, just like in Test Presets.
Manual. You can define when exactly a Test should pass or fail. For example, set a lower boundary for the expected model precision. If the condition is violated, the Test fails.
You can mix both approaches in the same Test Suite, where some Tests run with defaults and others with custom conditions.
Use the following parameters to set Test conditions:
| Condition parameter | Explanation | Example |
|---|---|---|
| eq: val | equal: test_result == val | TestColumnValueMin("col", eq=5) |
| not_eq: val | not equal: test_result != val | TestColumnValueMin("col", not_eq=0) |
| gt: val | greater than: test_result > val | TestColumnValueMin("col", gt=5) |
| gte: val | greater than or equal: test_result >= val | TestColumnValueMin("col", gte=5) |
| lt: val | less than: test_result < val | TestColumnValueMin("col", lt=5) |
| lte: val | less than or equal: test_result <= val | TestColumnValueMin("col", lte=5) |
| is_in: list | test_result == one of the values | TestColumnValueMin("col", is_in=[3,5,7]) |
| not_in: list | test_result != any of the values | TestColumnValueMin("col", not_in=[-1,0,1]) |
Example 1. To Test that no values are out of range, and less than (lt) 20% of values are missing:
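A sketch with a placeholder column name:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfOutRangeValues, TestShareOfMissingValues

condition_suite = TestSuite(tests=[
    TestShareOfOutRangeValues(column_name="feature_1", eq=0),  # no values outside the reference range
    TestShareOfMissingValues(lt=0.2),                          # less than 20% of values are missing
])

condition_suite.run(current_data=curr, reference_data=ref)
```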
Example 2. You can specify both the Test condition and parameters together.
In the example above, Evidently automatically derives the feature range from the reference. You can also manually set the range (e.g., between 2 and 10). The Test fails if any value is out of this range:
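For instance (assuming the left and right parameters set the expected range, as described in the Test reference):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestNumberOfOutRangeValues

range_suite = TestSuite(tests=[
    # expected range set manually: fail if any value falls outside [2, 10]
    TestNumberOfOutRangeValues(column_name="feature_1", left=2, right=10, eq=0),
])

range_suite.run(current_data=curr, reference_data=None)
```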
Example 3. To Test that the precision and recall are over 90%, with a set decision threshold for the classification model:
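A sketch, assuming a probabilistic classification setup described via column_mapping (the probas_threshold parameter name is an assumption):

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestPrecisionScore, TestRecallScore

quality_suite = TestSuite(tests=[
    TestPrecisionScore(gt=0.9, probas_threshold=0.8),
    TestRecallScore(gt=0.9, probas_threshold=0.8),
])

quality_suite.run(current_data=curr, reference_data=ref, column_mapping=column_mapping)
```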
If you want to set an upper and/or lower limit to the value, you can use approx instead of calculating the value itself. You can set the relative or absolute range.
To use approx, first import this component:
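The import path below is what recent Evidently versions use; adjust it if your version differs:

```python
from evidently.tests.utils import approx
```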
Example 1. Here is how you can set the upper boundary as 5+10%:
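For example, as an upper boundary on a column mean (the Test and column name are illustrative):

```python
# fails if the mean exceeds 5 + 10% (i.e., 5.5)
TestColumnValueMean(column_name="feature_1", lte=approx(5, relative=0.1))
```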
Example 2. Here is how you can set the boundary as 5 +/-10%:
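Similarly:

```python
# passes if the mean stays within 5 +/- 10% (i.e., between 4.5 and 5.5)
TestColumnValueMean(column_name="feature_1", eq=approx(5, relative=0.1))
```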
By default, all Tests will return a Fail if the Test condition is not fulfilled. If you want to get a Warning instead, use the is_critical parameter and set it to False. Example:
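A sketch:

```python
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfMissingValues

warning_suite = TestSuite(tests=[
    # a violated condition produces a Warning instead of a Fail
    TestShareOfMissingValues(lt=0.2, is_critical=False),
])

warning_suite.run(current_data=curr, reference_data=ref)
```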
For a complete notebook example on setting Test criticality, see the example notebooks.
Reference: Check the default Test conditions in the table.
Available Test Presets. There are other Presets: for example, DataStabilityTestPreset, DataDriftTestPreset, or RegressionTestPreset. See the full list in the reference. For an interactive preview, check the example notebooks.
There are more output formats! You can also export the results in formats like HTML, JSON, dataframe, and more. Refer to the output formats documentation for details.
Refer to the table to see available parameters and defaults for each Test and Test Preset.
Reference: see the table. For interactive examples, refer to the example notebooks.
Row-level evaluations: To Test row-level scores for text data, read more about Descriptors.
Generating many column-level Tests: To simplify listing many Tests at once, use the Test generator helper function.
Reference: you can browse available Test parameters and defaults in the table.