You are looking at the old Evidently documentation: this API is available in versions 0.6.7 and lower. See the newer version of the documentation.
This quickstart shows how to evaluate text data, such as inputs and outputs from your LLM system.
It's best to run this example in Jupyter Notebook or Google Colab so that you can render HTML Reports directly in a notebook cell.
1. Installation
Install the Evidently library.
!pip install evidently[llm]
Import the required modules:
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import *
2. Create a toy dataset
Prepare your data as a pandas DataFrame with any text and metadata columns. Here’s a toy example with chatbot questions and answers, stored in the "question" and "answer" columns:
data = [
    ["What's the capital of France?", "The capital of France is Paris."],
    ["Who wrote 1984?", "George Orwell wrote 1984."],
    ["How does photosynthesis work?", "Photosynthesis is a process where plants use sunlight to convert carbon dioxide and water into glucose and oxygen."],
    ["Can you give me the recipe for making pancakes?", "Sure! Here's a simple recipe: mix 1 cup flour, 1 cup milk, 1 egg, and a pinch of salt. Cook on a hot griddle until golden brown."],
    ["What is the largest planet in our solar system?", "Jupiter is the largest planet in our solar system."],
    ["Tell me a joke.", "Why don't scientists trust atoms? Because they make up everything!"],
    ["Can you translate 'Hello' into Spanish?", "'Hello' in Spanish is 'Hola'."],
    ["What's the code to the universe?", "I'm sorry, I can't provide that information."],
    ["What's the distance between Earth and the Sun?", "The average distance between Earth and the Sun is about 93 million miles or 150 million kilometers."],
    ["How do I fix a flat tire?", "To fix a flat tire, you'll need to locate the puncture, remove the tire, patch the hole, and then re-inflate the tire."],
]
columns = ["question", "answer"]
eval_dataset = pd.DataFrame(data, columns=columns)
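Before running any evals, it can help to confirm that the text columns are present and non-empty. The following is a minimal sketch using only pandas; `check_text_columns` is a hypothetical helper written for this illustration, not part of Evidently, and the two-row `sample` frame is a shortened stand-in for the toy dataset so the snippet runs on its own.

```python
import pandas as pd

# Shortened stand-in for the toy dataset (two rows only).
sample = pd.DataFrame(
    [
        ["What's the capital of France?", "The capital of France is Paris."],
        ["Who wrote 1984?", "George Orwell wrote 1984."],
    ],
    columns=["question", "answer"],
)


def check_text_columns(df, required=("question", "answer")):
    """Hypothetical helper: return a list of problems; an empty list means the frame looks usable."""
    problems = []
    for col in required:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        if df[col].isna().any():
            problems.append(f"null values in: {col}")
        elif (df[col].astype(str).str.strip() == "").any():
            problems.append(f"empty strings in: {col}")
    return problems


problems = check_text_columns(sample)
print(sample.shape)  # (rows, columns)
print(problems)      # empty list when nothing is wrong
```

The same check can be applied to the full frame by calling `check_text_columns(eval_dataset)`.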
Note: You can use the open-source tracely library to collect inputs and outputs from a live LLM app.
3. Run your first eval
Run evaluations for the "answer" column:
Sentiment (from -1 for negative to 1 for positive)