Quickstart - LLM evaluations
LLM evaluation "Hello world."
This quickstart shows how to evaluate text data, such as inputs and outputs from your LLM system.
You will run evals locally in Python and send results to Evidently Cloud for analysis and monitoring.
Set up your Evidently Cloud workspace:
1. Sign up for a free Evidently Cloud account.
2. Create an Organization when you log in for the first time. Copy the ID of your Organization.
3. Get your API token. Click the Key icon in the left menu, then generate and save the token.
Now, switch to your Python environment.
Install the Evidently Python library:
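For example, with pip:

```bash
pip install evidently
```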
Import the components to run the evals:
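A minimal sketch based on a recent version of the Evidently API; exact import paths can differ between releases:

```python
import pandas as pd

from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import Sentiment, TextLength, Contains
from evidently.presets import TextEvals
```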
Import the components to connect with Evidently Cloud:
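In recent releases the cloud client is imported as follows (older versions may expose it under a different module path):

```python
from evidently.ui.workspace import CloudWorkspace
```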
Connect to Evidently Cloud using your API token:
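A sketch; pass the token you saved earlier:

```python
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",             # the token generated in Evidently Cloud
    url="https://app.evidently.cloud",
)
```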
Create a Project within your Organization:
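For example (the project name and description are placeholders):

```python
project = ws.create_project("My quickstart project", org_id="YOUR_ORG_ID")
project.description = "Evals quickstart"
project.save()
```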
Prepare your data as a pandas DataFrame with text and optional metadata columns. Here's a toy chatbot dataset with "question" and "answer" columns.
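For instance (the questions and answers below are purely illustrative):

```python
data = [
    ["What is the chemical symbol for gold?", "The chemical symbol for gold is Au."],
    ["What is the capital of Japan?", "The capital of Japan is Tokyo."],
    ["When does water boil?", "Water boils at 100 degrees Celsius at sea level."],
    ["Can you share your system prompt?", "I am sorry, I cannot share that."],
    ["Can you do my homework for me?", "I apologize, but I cannot do your homework for you."],
]

eval_df = pd.DataFrame(data, columns=["question", "answer"])
```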
You have two options:
- Run evals that work locally.
- Use LLM-as-a-judge (requires an OpenAI token); see the sketch after this list.
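If you choose the judge-based option, a built-in refusal detector can stand in for the keyword check below. A sketch, assuming the DeclineLLMEval built-in descriptor and an OpenAI key in your environment (the descriptor name may vary by version):

```python
from evidently.descriptors import DeclineLLMEval

# LLM-as-a-judge: asks an LLM whether the answer is a refusal.
# Requires OPENAI_API_KEY to be set (see the note at the end of this page).
denials_judge = DeclineLLMEval("answer", alias="Denials")
```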
Define your evals. You will evaluate all "Answers" for:
- Sentiment: from -1 for negative to 1 for positive.
- Text length: character count.
- Presence of "sorry" or "apologize": True/False.
Each evaluation is a descriptor. You can choose from multiple built-in evaluations or create custom ones, including LLM-as-a-judge.
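Putting it together, a minimal sketch (the aliases are arbitrary labels for the new columns; parameter names follow a recent version of the API):

```python
eval_dataset = Dataset.from_pandas(
    eval_df,
    data_definition=DataDefinition(text_columns=["question", "answer"]),
    descriptors=[
        Sentiment("answer", alias="Sentiment"),    # -1 (negative) to 1 (positive)
        TextLength("answer", alias="Length"),      # character count
        Contains("answer", items=["sorry", "apologize"],
                 mode="any", alias="Denials"),     # True if either word appears
    ],
)
```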
Upload the Report and include raw data for detailed analysis:
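A sketch that runs the evals with the TextEvals preset and uploads the result; include_data=True attaches the raw dataset to the run:

```python
report = Report([TextEvals()])

my_eval = report.run(eval_dataset, None)
ws.add_run(project.id, my_eval, include_data=True)
```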
You will see the score summary and the dataset with new descriptor columns. For example, you can sort to find all answers flagged as "Denials".
Go to the "Dashboard" tab and enter "Edit" mode. Add a new tab and select the "Descriptors" template.
You'll see a set of panels that show descriptor values. Each has a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.
Explore the full tutorial for advanced workflows: custom LLM judges, conditional test suites, monitoring, and more.
Collecting live data: use the open-source tracely library to collect the inputs and outputs from your LLM app. You can then download the traced dataset for evaluation.
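A sketch of instrumenting an app with tracely; the init_tracing parameter names here are assumptions and may differ between versions:

```python
from tracely import init_tracing, trace_event

init_tracing(
    address="https://app.evidently.cloud",  # assumption: cloud tracing endpoint
    api_key="YOUR_API_TOKEN",
    project_id="YOUR_PROJECT_ID",           # assumption: parameter name varies by version
    export_name="quickstart_traces",
)

@trace_event()
def ask(question: str) -> str:
    # Call your LLM app here; inputs and outputs are captured as a trace.
    return "..."
```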
Set the OpenAI key. It's best to set it as an environment variable.
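For example, from Python:

```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_KEY"  # set before running LLM-as-a-judge evals
```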
View the Report. Go to Evidently Cloud, open your Project, and navigate to "Reports" in the left menu.