# Criterion Evaluations

> Evaluate your LLM outputs using criteria.

<Note>
  Criterion evaluations are useful for evaluating your LLM outputs against a set of criteria. If you
  haven't defined any criteria yet, check out the criteria [Quick
  Start](/features/criteria/quick-start) guide.
</Note>

Criterion evaluations judge the quality of your LLM outputs against the criteria you've defined. For every entry in the evaluation dataset, each model's output is compared against those criteria.

<Frame>
  <img src="https://mintcdn.com/openpipe/yLyh_RHELnvU-7tP/images/features/evaluations/criterion-eval-settings.png?fit=max&auto=format&n=yLyh_RHELnvU-7tP&q=85&s=47b136c45f57a3e0cf03fbb407be629f" alt="" width="2548" height="1578" data-path="images/features/evaluations/criterion-eval-settings.png" />
</Frame>

<Info>
  A criterion evaluation is only as reliable as the criterion you've defined. To improve your
  criterion, check out the [alignment docs](/features/criteria/alignment-set).
</Info>

The evaluation judge compares each output in the evaluation dataset against the criterion and scores it as either `PASS` or `FAIL`.

<br />

<Frame>
  <img src="https://mintcdn.com/openpipe/yLyh_RHELnvU-7tP/images/features/evaluations/criterion-eval-results-table.png?fit=max&auto=format&n=yLyh_RHELnvU-7tP&q=85&s=9f4e723c5c8ed589c64c2432ce601d3a" alt="" width="2544" height="974" data-path="images/features/evaluations/criterion-eval-results-table.png" />
</Frame>
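
Because every output receives a binary `PASS` or `FAIL` verdict, one natural way to compare models is by the fraction of dataset entries each one passes. The sketch below illustrates that arithmetic only. The model names, entry IDs, and the `pass_rates` helper are hypothetical and are not part of OpenPipe's API or export format.

```python
from collections import defaultdict

# Hypothetical per-entry verdicts: (model name, dataset entry id, verdict).
# These values are made up for illustration.
results = [
    ("model-a", "entry-1", "PASS"),
    ("model-a", "entry-2", "FAIL"),
    ("model-a", "entry-3", "PASS"),
    ("model-b", "entry-1", "PASS"),
    ("model-b", "entry-2", "PASS"),
    ("model-b", "entry-3", "PASS"),
]


def pass_rates(rows):
    """Return the fraction of PASS verdicts for each model."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for model, _entry_id, verdict in rows:
        total[model] += 1
        if verdict == "PASS":
            passed[model] += 1
    return {model: passed[model] / total[model] for model in total}


rates = pass_rates(results)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.0%} of entries passed")
```

A pass rate only tells you which model did better overall. To understand why, you still need to look at the individual verdicts.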

<br />

To see why one model might be outperforming another, you can navigate back to the [evaluation table](https://app.openpipe.ai/p/BRZFEx50Pf/datasets/3e7e82c1-b066-476c-9f17-17fd85a2169b/evaluate) and click on a result pill to see the evaluation judge's reasoning.

<br />

<Frame>
  <img src="https://mintcdn.com/openpipe/yLyh_RHELnvU-7tP/images/features/evaluations/criterion-eval-explanation.png?fit=max&auto=format&n=yLyh_RHELnvU-7tP&q=85&s=691da15800edc8179dce1cb990e1988b" alt="" width="3022" height="1716" data-path="images/features/evaluations/criterion-eval-explanation.png" />
</Frame>

<br />

While criterion evaluations are powerful and flexible, they're much more expensive to run than pure code. If your models' outputs can be easily evaluated by code alone, consider using [code evaluations](/features/evaluations/code) instead.
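
As a rough illustration of an output that code alone can evaluate, the sketch below checks whether an output is a JSON object containing a set of required keys. It needs no judge model, so it is cheap and deterministic. The `has_required_keys` function and its signature are hypothetical and do not reflect OpenPipe's code evaluation interface. See the linked page for how code evaluations are actually defined.

```python
import json


def has_required_keys(output: str, required_keys: set) -> bool:
    """Return True if `output` is a JSON object containing every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        # Not valid JSON at all, so the check fails.
        return False
    # Valid JSON that is not an object (e.g. a list) also fails.
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


# Hypothetical model outputs, made up for illustration.
good = '{"name": "Ada", "age": 36}'
bad = "Sure! Here is the JSON you asked for."

print("PASS" if has_required_keys(good, {"name", "age"}) else "FAIL")
print("PASS" if has_required_keys(bad, {"name", "age"}) else "FAIL")
```

Checks like this cover structure and format. Qualities that are harder to express in code, such as tone or faithfulness to a source, are where a criterion evaluation is the better fit.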
