Criteria
Align LLM judgements with human ratings to evaluate and improve your models.
Criteria are currently in beta. Talk to the OpenPipe team (hello@openpipe.ai) to get access.
Criteria are a simple way to reliably detect and correct mistakes in LLM output. Criteria can currently be used when defining LLM evaluations, and will soon be integrated into the dataset relabeling flow to improve data quality before training a model.
What is a Criterion?
A criterion is a combination of an LLM and a prompt that can be used to identify a specific issue with a model’s output. Criterion judgements are generated by passing the input and output of a single row, along with the criterion prompt, to the judge LLM, which then returns a binary PASS/FAIL judgement.
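As a rough illustration of that flow, the sketch below assembles a judge prompt from a criterion prompt and a single row’s input and output, then parses the judge model’s reply into a PASS/FAIL result. The helper names, message format, and example criterion are illustrative assumptions, not OpenPipe’s actual API; in practice the assembled messages would be sent to the judge model you configured.

```python
# Hypothetical sketch of generating a criterion judgement for one row.
# Helper names and the message format are illustrative, not OpenPipe's API.

def build_judge_messages(criterion_prompt, row_input, row_output):
    """Assemble the messages sent to the judge LLM for a single row."""
    return [
        {"role": "system", "content": criterion_prompt},
        {
            "role": "user",
            "content": (
                f"INPUT:\n{row_input}\n\n"
                f"OUTPUT:\n{row_output}\n\n"
                "Answer with exactly PASS or FAIL."
            ),
        },
    ]

def parse_judgement(raw_reply):
    """Interpret the judge model's reply as a binary PASS/FAIL judgement."""
    verdict = raw_reply.strip().upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected judgement: {raw_reply!r}")
    return verdict == "PASS"

# Example criterion: flag outputs that leak personal information.
messages = build_judge_messages(
    "FAIL the output if it reveals any personally identifiable information.",
    "Summarize this support ticket.",
    "The customer reported a billing error and a refund was issued.",
)

# In a real setup, `messages` would be sent to the judge LLM and its
# reply passed to parse_judgement:
print(parse_judgement("PASS"))  # → True
```

The judgement is deliberately constrained to two tokens so that results stay easy to aggregate and to compare against human ratings.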
To learn how to create your first criterion, read the Quick Start.