# Text evals with HuggingFace

{% hint style="info" %}
**You are looking at the old Evidently documentation**: this API is available with versions 0.6.7 or lower. Check the newer docs version [here](https://docs.evidentlyai.com/introduction).
{% endhint %}

**Pre-requisites**:

* You know how to generate Reports or Test Suites for text data using Descriptors.
* You know how to pass custom parameters for Reports or Test Suites.
* You know how to specify text data in column mapping.

You can use an external machine learning model to score text data. This lets you evaluate texts against whatever criteria the source model was trained on, e.g. classify them into a fixed set of labels.

The model you use must return a numerical score or a category for each text in a column. You will then be able to view scores, analyze their distribution or run conditional tests through the usual Descriptor interface.

Evidently supports using HuggingFace models: use the general `HuggingFaceModel()` descriptor to select models on your own or simplified interfaces like `HuggingFaceToxicityModel()`.

## Code example

You can refer to an end-to-end example with different Descriptors:

{% embed url="<https://github.com/evidentlyai/evidently/blob/ad71e132d59ac3a84fce6cf27bd50b12b10d9137/examples/how_to_questions/how_to_evaluate_llm_with_text_descriptors.ipynb>" %}

To import the Descriptors:

```python
from evidently.descriptors import HuggingFaceModel, HuggingFaceToxicityModel
```

To get a Report with a Toxicity score for the `response` column:

```python
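# Report and TextEvals imports for the legacy API (versions 0.6.7 or lower)
from evidently.report import Report
from evidently.metric_preset import TextEvals

report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        HuggingFaceToxicityModel(toxic_label="hate"),
    ])
])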
```

To get a Report with several different scores using the general `HuggingFaceModel()` descriptor:

```python
report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[
        HuggingFaceModel(model="DaNLP/da-electra-hatespeech-detection", display_name="Response Toxicity"),
        HuggingFaceModel(model="SamLowe/roberta-base-go_emotions", params={"label": "disappointment"},
                         display_name="Disappointments in Response"),
        HuggingFaceModel(model="SamLowe/roberta-base-go_emotions", params={"label": "optimism"},
                         display_name="Optimism in Response"),
    ])
])
```

You can do the same for Test Suites.

{% hint style="info" %}
**Which descriptors are there?** See the list of available built-in descriptors in the [All Metrics](https://docs-old.evidentlyai.com/reference/all-metrics) page.
{% endhint %}

## Sample models

Here are some example models you can call using the `HuggingFaceModel()` descriptor.

| Model                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Parameters                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <p><strong>Emotion classification</strong><br><code>SamLowe/roberta-base-go\_emotions</code></p><ul><li>Scores texts by 28 emotions.</li><li>Returns the predicted probability for the chosen emotion label.</li><li>Scale: 0 to 1.</li></ul><p><strong>Example use</strong>:<br><code>HuggingFaceModel(model="SamLowe/roberta-base-go\_emotions", params={"label": "disappointment"})</code><br><br><strong>Source</strong>: <a href="https://huggingface.co/SamLowe/roberta-base-go_emotions">HuggingFace Model</a></p> | <p><strong>Required</strong>:</p><ul><li><code>params={"label":"label"}</code></li></ul><p><strong>Available labels</strong>:</p><ul><li>admiration</li><li>amusement</li><li>anger</li><li>annoyance</li><li>approval</li><li>caring</li><li>confusion</li><li>curiosity</li><li>desire</li><li>disappointment</li><li>disapproval</li><li>disgust</li><li>embarrassment</li><li>excitement</li><li>fear</li><li>gratitude</li><li>grief</li><li>joy</li><li>love</li><li>nervousness</li><li>optimism</li><li>pride</li><li>realization</li><li>relief</li><li>remorse</li><li>sadness</li><li>surprise</li><li>neutral</li></ul><p><strong>Optional</strong>:</p><ul><li><code>display\_name="display name"</code></li></ul> |
| <p><strong>Toxicity detection</strong><br><code>facebook/roberta-hate-speech-dynabench-r4-target</code></p><ul><li>Detects hate speech.</li><li>Returns predicted probability for the “hate” label.</li><li>Scale: 0 to 1.</li></ul><p><strong>Example use</strong>:<br><code>HuggingFaceModel(model="facebook/roberta-hate-speech-dynabench-r4-target", display\_name="Toxicity")</code><br><br><strong>Source</strong>: <a href="https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target">HuggingFace Model</a></p>                                                                                                                                                                                                                                                                                                                                       | <p><strong>Optional</strong>:</p><ul><li><code>toxic\_label="hate"</code> (default)</li><li><code>display\_name="display name"</code></li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| <p><strong>Zero-shot classification</strong><br><code>MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli</code></p><ul><li>A natural language inference model.</li><li>Use it for zero-shot classification by user-provided topics.</li><li>List candidate topics as <code>labels</code>. You can provide one or several topics.</li><li>You can set a classification threshold: if the predicted probability is below it, an "unknown" label will be assigned.</li><li>Returns a label.</li></ul><p><strong>Example use</strong>:<br><code>HuggingFaceModel(model="MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli", params={"labels": \["HR", "finance"], "threshold":0.5}, display\_name="Topic")</code><br><br><strong>Source</strong>: <a href="https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli">HuggingFace Model</a></p> | <p><strong>Required</strong>:</p><ul><li><code>params={"labels": \["label"]}</code></li></ul><p><strong>Optional</strong>:</p><ul><li><code>params={"score\_threshold": 0.7}</code> (default: 0.5)</li><li><code>display\_name="display name"</code></li></ul> |
| <p><strong>GPT-2 text detection</strong><br><code>openai-community/roberta-base-openai-detector</code></p><ul><li>Predicts if a text is Real or Fake (generated by a GPT-2 model).</li><li>You can set a classification threshold: if the predicted probability is below it, an "unknown" label will be assigned.</li><li>Note that it is not usable as a detector for more advanced models like ChatGPT.</li><li>Returns a label.</li></ul><p><strong>Example use</strong>:<br><code>HuggingFaceModel(model="openai-community/roberta-base-openai-detector", params={"score\_threshold": 0.7})</code><br><br><strong>Source</strong>: <a href="https://huggingface.co/openai-community/roberta-base-openai-detector">HuggingFace Model</a></p> | <p><strong>Optional</strong>:</p><ul><li><code>params={"score\_threshold": 0.7}</code> (default: 0.5)</li><li><code>display\_name="display name"</code></li></ul> |

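The table describes two kinds of return values: a probability for one chosen label, and a label that falls back to "unknown" below a threshold. The sketch below illustrates that logic in plain Python. It is not Evidently's implementation: the function names, the sample scores, and the list-of-`{"label", "score"}` input format (modeled on what many Hugging Face text-classification pipelines return) are all assumptions made for illustration.

```python
from typing import Dict, List, Union

# Assumed input format: one {"label", "score"} entry per class.
Prediction = List[Dict[str, Union[str, float]]]


def score_for_label(prediction: Prediction, label: str) -> float:
    """Return the probability assigned to one chosen label (numerical output)."""
    for entry in prediction:
        if entry["label"] == label:
            return float(entry["score"])
    raise KeyError(f"label {label!r} not found in prediction")


def label_with_threshold(prediction: Prediction, threshold: float = 0.5) -> str:
    """Return the top label, or "unknown" if its probability is below the threshold."""
    best = max(prediction, key=lambda entry: entry["score"])
    return str(best["label"]) if best["score"] >= threshold else "unknown"


# Made-up scores, for illustration only.
emotions = [
    {"label": "optimism", "score": 0.72},
    {"label": "disappointment", "score": 0.05},
]
topics = [
    {"label": "HR", "score": 0.41},
    {"label": "finance", "score": 0.38},
]

print(score_for_label(emotions, "disappointment"))    # 0.05
print(label_with_threshold(topics, threshold=0.5))    # unknown
print(label_with_threshold(emotions, threshold=0.5))  # optimism
```
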
This list is not exhaustive, and the Descriptor may support other models published on Hugging Face. The implemented interface generally works for models that:

* Output a single number (e.g., predicted score for a label) or a label, **not** an array of values.
* Can process raw text input directly.
* Name labels using `label` or `labels` fields.
* Use methods named `predict` or `predict_proba` for scoring.

However, since each model is implemented differently, we cannot provide a complete list of models with a compatible interface. We suggest testing a candidate model yourself on a small sample of texts to see whether it works. If you discover useful models, feel free to share them with the community in Discord. You can also open an issue on GitHub to request support for a specific model.

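One way to structure such a trial is to check the first criterion above (a single number or a label per text, not an array) on a few sample texts before wiring a model into a Report. The helper below is a hypothetical sketch in plain Python. `find_incompatible_outputs` and the two stand-in scorers are invented for illustration and are not part of Evidently or Hugging Face; in practice you would pass a function that wraps your candidate model.

```python
from numbers import Real
from typing import Callable, Iterable, List


def find_incompatible_outputs(scorer: Callable[[str], object], texts: Iterable[str]) -> List[str]:
    """Return the texts for which the scorer did not produce a single number or label."""
    failures = []
    for text in texts:
        output = scorer(text)
        # bool is a numeric type in Python, so exclude it explicitly.
        is_number = isinstance(output, Real) and not isinstance(output, bool)
        is_label = isinstance(output, str)
        if not (is_number or is_label):
            failures.append(text)
    return failures


def single_score(text: str) -> float:
    # Stand-in scorer: returns one number per text.
    return min(len(text) / 100, 1.0)


def array_of_scores(text: str) -> list:
    # Stand-in scorer: returns one value per class, which does not fit the interface.
    return [0.2, 0.8]


samples = ["Thanks for your help!", "This is not what I asked for."]
print(find_incompatible_outputs(single_score, samples))     # []
print(find_incompatible_outputs(array_of_scores, samples))  # both sample texts
```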