# Data drift parameters

{% hint style="info" %}
**You are looking at the old Evidently documentation**: this API is available with versions 0.6.7 or lower. Check the newer docs version [here](https://docs.evidentlyai.com/introduction).
{% endhint %}

**Pre-requisites**:

* You know how to generate Reports or Test Suites with default parameters.
* You know how to pass custom parameters for Reports or Test Suites.
* You know how to use Column Mapping to set the input data type.

## Default

All Presets, Tests, and Metrics that evaluate data drift or target (prediction) drift use the default [Data Drift algorithm](https://docs-old.evidentlyai.com/reference/data-drift-algorithm). It automatically selects a drift detection method based on the column type and the number of objects in the data.

You can override the defaults by passing a custom parameter to the chosen Test, Metric, or Preset. You can define the drift detection method, the threshold, or both.
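
The default selection can be sketched in plain Python. The helper below is hypothetical (it is not part of Evidently) and only mirrors the defaults listed in the method tables on this page; the actual algorithm, described in the linked reference, may apply additional rules.

```python
from typing import Optional

SMALL_DATA_LIMIT = 1000  # the "<= 1000 objects" boundary used in the tables on this page


def default_drift_method(column_type: str, n_objects: int, n_labels: Optional[int] = None) -> str:
    """Return the default stattest name for a column, following the tables on this page.

    Hypothetical helper for illustration only; not an Evidently function.
    """
    small = n_objects <= SMALL_DATA_LIMIT
    if column_type == "numerical":
        return "ks" if small else "wasserstein"
    if column_type == "categorical":
        if not small:
            return "jensenshannon"
        # Binary columns default to the Z-test, other categorical columns to Chi-Square.
        return "z" if n_labels == 2 else "chisquare"
    if column_type == "text":
        return "perc_text_content_drift" if small else "abs_text_content_drift"
    raise ValueError(f"Unknown column type: {column_type!r}")


print(default_drift_method("numerical", 500))                  # ks
print(default_drift_method("categorical", 5000, n_labels=4))   # jensenshannon
```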

## Code example

For a worked example, see the how-to notebook on passing custom drift parameters:

{% embed url="https://github.com/evidentlyai/evidently/blob/ad71e132d59ac3a84fce6cf27bd50b12b10d9137/examples/how_to_questions/how_to_specify_stattest_for_a_testsuite.ipynb" %}

## Examples

To set a custom drift method and threshold on the **column level**:

```python
ColumnDriftMetric(column_name='feature1', stattest='wasserstein', stattest_threshold=0.2) 
```

If you have a Preset, Test or Metric that checks for drift in **multiple columns** at the same time, you can set a custom drift method for all columns, all numerical/categorical columns, or for each column individually.

Here is how you set the drift detection method and threshold for all categorical columns:

```python
DataDriftPreset(cat_stattest='chisquare', cat_stattest_threshold=0.05)
```

To set a custom condition for the **dataset drift** (share of drifting columns in the dataset) in the relevant Metrics or Presets:

```python
DatasetDriftMetric(drift_share=0.7)
```

Tests work slightly differently. To set a custom **dataset drift** condition when you run a relevant **Test**, set a condition on the share of drifted columns using the standard `lt` and `gt` parameters:

```python
TestShareOfDriftedColumns(lt=0.5)
```

When you set the drift threshold for `TestColumnDrift()`, use `stattest_threshold` and the other drift parameters the same way as in Metrics (not `lt` and `gt`).
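
To make the difference between the Metric and the Test condition concrete, here is a minimal plain-Python sketch. The helper names and column names are hypothetical, and the inclusive `>=` comparison for `drift_share` is an assumption rather than something stated on this page.

```python
def share_of_drifted_columns(column_drift):
    """Share of columns flagged as drifted, given {column_name: drift_detected}."""
    return sum(column_drift.values()) / len(column_drift)


def metric_detects_dataset_drift(share, drift_share):
    # Metric-style condition: dataset drift is reported once the share of
    # drifted columns reaches drift_share (inclusive boundary is an assumption).
    return share >= drift_share


def lt_condition_holds(share, lt):
    # Test-style condition: the Test passes while the share stays below lt.
    return share < lt


# Hypothetical per-column drift results.
column_drift = {"age": True, "income": True, "education": True, "city": False}
share = share_of_drifted_columns(column_drift)

print(share)                                                 # 0.75
print(metric_detects_dataset_drift(share, drift_share=0.7))  # True
print(lt_condition_holds(share, lt=0.5))                     # False
```

In other words, `drift_share` defines when drift is declared, while `lt` and `gt` define when the Test is considered passed.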

## Tabular drift detection

The following methods and parameters apply to **tabular** data (as parsed automatically or specified as numerical or categorical columns in the column mapping).

### Drift parameters - Tabular

The following drift detection parameters are available in the `DataDriftTable()`, `DatasetDriftMetric()`, `ColumnDriftMetric()`, related Tests, and Presets that contain them.

| Parameter                                                                                  | Description                                                                                                                                                                                                               |
| ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `stattest`                                                                                 | Defines the drift detection method for a given column (if a single column is tested), or all columns in the dataset (if multiple columns are tested).                                                                     |
| `stattest_threshold`                                                                       | <p>Sets the drift threshold in a given column or all columns.<br>The threshold meaning varies based on the drift detection method, e.g., it can be the value of a distance metric or a p-value of a statistical test.</p> |
| `drift_share`                                                                              | Defines the share of drifting columns as a condition for Dataset Drift in `DatasetDriftMetric` or inside a Preset.                                                                                                        |
| <p><code>cat\_stattest</code><br><code>cat\_stattest\_threshold</code></p>                 | Sets the drift method and/or threshold for all categorical columns in the dataset.                                                                                                                                        |
| <p><code>num\_stattest</code><br><code>num\_stattest\_threshold</code></p>                 | Sets the drift method and/or threshold for all numerical columns in the dataset.                                                                                                                                          |
| <p><code>per\_column\_stattest</code><br><code>per\_column\_stattest\_threshold</code></p> | Sets the drift method and/or threshold for the listed columns (accepts a dictionary).                                                                                                                                     |

{% hint style="info" %}
**How to check available parameters.** You can verify which parameters are available for a specific Test, Metric, or Preset in the [All tests](https://docs-old.evidentlyai.com/reference/all-tests) or [All metrics](https://docs-old.evidentlyai.com/reference/all-metrics) tables, or consult the [API reference](https://docs.evidentlyai.com/reference/api-reference).
{% endhint %}
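
As an illustration of how the parameters from the table can be combined, the sketch below collects them in a dictionary and passes them to `DataDriftPreset`. The column names are hypothetical, and the assumption that per-column entries apply on top of the `num_`/`cat_` settings should be verified against the API reference.

```python
# Hypothetical column names; replace them with columns from your own dataset.
per_column_stattest = {
    "age": "wasserstein",   # numerical column
    "education": "psi",     # categorical column
}
per_column_stattest_threshold = {
    "age": 0.2,
    "education": 0.15,
}

drift_params = {
    # Assumed to apply to columns that have no per-column entry.
    "num_stattest": "ks",
    "num_stattest_threshold": 0.05,
    "cat_stattest": "chisquare",
    "cat_stattest_threshold": 0.05,
    "per_column_stattest": per_column_stattest,
    "per_column_stattest_threshold": per_column_stattest_threshold,
    "drift_share": 0.7,
}


def build_drift_report():
    """Build a Report with the custom drift parameters (needs Evidently 0.6.7 or lower)."""
    # Imported here so the configuration above can be inspected without Evidently installed.
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report

    return Report(metrics=[DataDriftPreset(**drift_params)])
```

Keeping the parameters in one dictionary makes it easy to reuse the same settings across a Preset, a Metric, and a Test Suite, as long as each of them accepts the listed parameters.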

### Drift detection methods - Tabular

To use the following drift detection methods, pass them using the `stattest` parameter.

| StatTest                                                         | Applicable to                                                                                                                      | Drift score                                                                                                                               |
| ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| <p><code>ks</code><br>Kolmogorov–Smirnov (K-S) test</p>          | <p>tabular data<br>only numerical<br><br><strong>Default method for numerical data, if <= 1000 objects</strong></p>                | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>chisquare</code><br>Chi-Square test</p>                 | <p>tabular data<br>only categorical<br><br><strong>Default method for categorical with > 2 labels, if <= 1000 objects</strong></p> | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>z</code><br>Z-test</p>                                  | <p>tabular data<br>only categorical<br><br><strong>Default method for binary data, if <= 1000 objects</strong></p>                 | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>wasserstein</code><br>Wasserstein distance (normed)</p> | <p>tabular data<br>only numerical<br><br><strong>Default method for numerical data, if > 1000 objects</strong></p>                 | <p>returns <code>distance</code><br>drift detected when <code>distance</code> >= <code>threshold</code><br>default threshold: 0.1</p>     |
| <p><code>kl\_div</code><br>Kullback-Leibler divergence</p>       | <p>tabular data<br>numerical and categorical</p>                                                                                   | <p>returns <code>divergence</code><br>drift detected when <code>divergence</code> >= <code>threshold</code><br>default threshold: 0.1</p> |
| <p><code>psi</code><br>Population Stability Index (PSI)</p>      | <p>tabular data<br>numerical and categorical</p>                                                                                   | <p>returns <code>psi\_value</code><br>drift detected when <code>psi\_value</code> >= <code>threshold</code><br>default threshold: 0.1</p> |
| <p><code>jensenshannon</code><br>Jensen-Shannon distance</p>     | <p>tabular data<br>numerical and categorical<br><br><strong>Default method for categorical, if > 1000 objects</strong></p>         | <p>returns <code>distance</code><br>drift detected when <code>distance</code> >= <code>threshold</code><br>default threshold: 0.1</p>     |
| <p><code>anderson</code><br>Anderson-Darling test</p>            | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>fisher\_exact</code><br>Fisher's Exact test</p>         | <p>tabular data<br>only categorical</p>                                                                                            | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>cramer\_von\_mises</code><br>Cramér-von Mises test</p>  | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>g\_test</code><br>G-test</p>                            | <p>tabular data<br>only categorical</p>                                                                                            | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>hellinger</code><br>Hellinger Distance (normed)</p>     | <p>tabular data<br>numerical and categorical</p>                                                                                   | <p>returns <code>distance</code><br>drift detected when <code>distance</code> >= <code>threshold</code><br>default threshold: 0.1</p>     |
| <p><code>mannw</code><br>Mann-Whitney U-rank test</p>            | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>ed</code><br>Energy distance</p>                        | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>distance</code><br>drift detected when <code>distance</code> >= <code>threshold</code><br>default threshold: 0.1</p>     |
| <p><code>es</code><br>Epps-Singleton test</p>                    | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>t\_test</code><br>T-Test</p>                            | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>empirical\_mmd</code><br>Empirical-MMD</p>              | <p>tabular data<br>only numerical</p>                                                                                              | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
| <p><code>TVD</code><br>Total-Variation-Distance</p>              | <p>tabular data<br>only categorical</p>                                                                                            | <p>returns <code>p\_value</code><br>drift detected when <code>p\_value</code> < <code>threshold</code><br>default threshold: 0.05</p>     |
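
The table shows two kinds of drift scores that are compared to the threshold in opposite directions: p-values (drift when the score is below the threshold) and distances or divergences (drift when the score reaches the threshold). A minimal sketch of that decision rule, using a hypothetical helper and the default thresholds from the table:

```python
# Methods from the table above that return a distance or divergence.
DISTANCE_METHODS = {"wasserstein", "kl_div", "psi", "jensenshannon", "hellinger", "ed"}


def is_drift_detected(method, score, threshold=None):
    """Apply the decision rule from the table (hypothetical helper, not an Evidently function)."""
    if method in DISTANCE_METHODS:
        # Distance-like scores: larger means more drift, default threshold 0.1.
        threshold = 0.1 if threshold is None else threshold
        return score >= threshold
    # Every other method in the table returns a p-value:
    # smaller means more evidence of drift, default threshold 0.05.
    threshold = 0.05 if threshold is None else threshold
    return score < threshold


print(is_drift_detected("ks", 0.01))                          # True
print(is_drift_detected("wasserstein", 0.15, threshold=0.2))  # False
```

This is why raising `stattest_threshold` makes a p-value test more sensitive but a distance-based method less sensitive.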

## Text drift detection

Text drift detection applies to columns with **raw text data**, as specified in column mapping.

{% hint style="info" %}
**Embedding drift detection.** If you work with embeddings, you can use [Embeddings Drift Detection methods](https://docs-old.evidentlyai.com/user-guide/customization/embeddings-drift-parameters).
{% endhint %}

### Drift parameters - Text

The following text drift detection parameters are available in the `DataDriftTable()`, `DatasetDriftMetric()`, `ColumnDriftMetric()`, related Tests and Presets that contain them.

| Parameter                 | Description                                                                                                                                        |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `stattest`                | Defines the drift detection method for a given column that contains text data, or for all columns in the dataset if all columns contain text data. |
| `stattest_threshold`      | Sets the drift threshold for the given text column or for all columns. Its meaning depends on the drift detection method.                          |
| `text_stattest`           | Defines the drift detection method for all text columns in the dataset.                                                                            |
| `text_stattest_threshold` | Sets the drift threshold for all text columns in the dataset.                                                                                      |

### Drift detection methods - Text

To use the following text drift detection methods, pass them using the `stattest` parameter.

| StatTest                                                                                                                      | Description                                                                                                                                                                                        | Drift score                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| ----------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p><code>perc\_text\_content\_drift</code><br>Text content drift (domain classifier, with statistical hypothesis testing)</p> | <p>Applies only to text data. Trains a classifier model to distinguish between text in “current” and “reference” datasets.<br><br><strong>Default for text data when <= 1000 objects.</strong></p> | <ul><li>returns <code>roc\_auc</code> of the classifier as a <code>drift\_score</code></li><li>drift detected when <code>roc\_auc</code> > possible ROC AUC of the random classifier at a set percentile</li><li><code>threshold</code> sets the percentile of the possible ROC AUC values of the random classifier to compare against</li><li>default threshold: 0.95 (95th percentile)</li><li><code>roc\_auc</code> values can be 0 to 1 (typically 0.5 to 1); a higher value means more confident drift detection</li></ul> |
| <p><code>abs\_text\_content\_drift</code><br>Text content drift (domain classifier)</p>                                       | <p>Applies only to text data. Trains a classifier model to distinguish between text in “current” and “reference” datasets.<br><br><strong>Default for text data when > 1000 objects.</strong></p>  | <ul><li>returns <code>roc\_auc</code> of the classifier as a <code>drift\_score</code></li><li>drift detected when <code>roc\_auc</code> > <code>threshold</code></li><li><code>threshold</code> sets the ROC AUC threshold</li><li>default threshold: 0.55</li><li><code>roc\_auc</code> values can be 0 to 1 (typically 0.5 to 1); a higher value means more confident drift detection</li></ul>                                                                                                                              |
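
The two text methods differ only in what the classifier's ROC AUC is compared against. The sketch below illustrates both decision rules in plain Python. It is a conceptual illustration, not Evidently's implementation: the helper names are hypothetical, and simulating random scores is an assumption about how the "possible ROC AUC of the random classifier" could be estimated.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def random_classifier_cutoff(labels, percentile=0.95, n_simulations=200, seed=0):
    """Estimate the given percentile of ROC AUC values reachable by random scores."""
    rng = random.Random(seed)
    aucs = sorted(
        roc_auc(labels, [rng.random() for _ in labels]) for _ in range(n_simulations)
    )
    index = min(int(percentile * n_simulations), n_simulations - 1)
    return aucs[index]


def perc_rule(observed_auc, labels, threshold=0.95):
    # Percentile-style rule: compare against what a random classifier could reach.
    return observed_auc > random_classifier_cutoff(labels, percentile=threshold)


def abs_rule(observed_auc, threshold=0.55):
    # Absolute rule: compare directly against a fixed ROC AUC threshold.
    return observed_auc > threshold


# 0 = "reference" text, 1 = "current" text (hypothetical hold-out labels).
labels = [0] * 50 + [1] * 50
print(perc_rule(0.9, labels))
print(abs_rule(0.6))
```

With few objects, a random classifier can reach a ROC AUC noticeably above 0.5 by chance, which is a plausible reason for using the percentile-based rule on small datasets.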

### Text descriptors drift

You can also check for distribution drift in text descriptors, such as text length.

To use this method, call a separate `TextDescriptorsDriftMetric()`. You can pass any of the tabular drift detection methods as a parameter.

```python
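from evidently.metrics import TextDescriptorsDriftMetric
from evidently.report import Report

# Check drift in the descriptors of the "Review_Text" column
report = Report(metrics=[
    TextDescriptorsDriftMetric(column_name="Review_Text"),
])

report.run(reference_data=reviews_ref, current_data=reviews_cur, column_mapping=column_mapping)
report  # renders the Report in a notebook cell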
```
