Importing an Alignment Set
You can import an alignment set from either an OpenPipe dataset or a JSONL file. Alignment sets can be added to an existing criterion or imported when a new criterion is created.
Importing from a Dataset
When importing from a dataset, you select a number of rows to be randomly sampled from the dataset of your choice to imported into the criterion alignment set. The inputs of each of these rows will be copied directly from the rows in the dataset without any changes. By default, the outputs will also be copied from the original dataset. However, if you set Output Source to be an LLM model, the outputs will be generated by the LLM model based on the dataset inputs.
Importing from a JSONL File
You can also import an alignment set from a JSONL file. Uploads are limited to 10MB in size, which should be plenty for an alignment set.
judgement
field for each row. judgement
can be either PASS
or FAIL
, depending on whether the row should pass or fail the criterion.
Example
Alignment Stats
Alignment stats are a simple way to understand how well your criterion is performing. As you refine your criterion prompt, you’re alignment stats will improve as well.
- Precision indicates the fraction of rows that the LLM judge labeled as failing that a human judge also labeled as failing. It’s an indicator of how reliable the LLM judge’s FAIL label is.
- Recall indicates the fraction of rows that a human judge labeled as failing that the LLM judge also labeled as failing. It’s an indicator of how reliable the LLM judge’s PASS label is.
- F1 Score is the harmonic mean of precision and recall. As either score improves, the F1 score will also improve.