Write custom code to evaluate your LLM outputs.
Each code evaluation runs a grader function that you can customize. Here’s the basic structure:
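As a minimal sketch of that structure, assuming the grader is written in TypeScript and receives its arguments as a single object: the type names ChatMessage and GraderArgs and the field shapes are hypothetical placeholders rather than the platform's own definitions, and the scoring rule is only a stand-in.

```typescript
// Hypothetical shapes for illustration; the platform's real types may differ.
type ChatMessage = { role: string; content: string | null };

interface GraderArgs {
  messages: ChatMessage[]; // the messages sent to the LLM
  tools: unknown[]; // the tools available to the LLM
  toolChoice: unknown; // the tool choice specified for the LLM
  generatedOutput: ChatMessage; // the output being evaluated
  datasetOutput: ChatMessage; // the original dataset output for the row
}

// Returns a score between 0 and 1, where 1 means the output is perfect.
function grader({ generatedOutput }: GraderArgs): number {
  // Placeholder rule (an assumption): any non-empty output scores 1.
  const text = generatedOutput.content ?? "";
  return text.trim().length > 0 ? 1 : 0;
}
```

Replacing the placeholder rule with your own logic is the customization step; the only contract taken from the text is a numeric score in the range 0 to 1.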
The grader function takes a number of arguments and returns a score between 0 and 1, where 1 means the generated output is perfect. The available arguments are:
messages: The messages sent to the LLM.
tools: The tools available to the LLM.
toolChoice: The tool choice specified for the LLM.
generatedOutput: The output generated by the LLM which is being evaluated.
datasetOutput: The original dataset output associated with the row being evaluated.
You can use generatedOutput and datasetOutput to compare the output of the LLM to the dataset output.
Exact Match
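A minimal sketch of an exact-match grader, assuming both outputs carry their text in a content field (the AssistantOutput type is a hypothetical stand-in for whatever shape the platform passes in):

```typescript
// Hypothetical output shape; only the text content is used here.
type AssistantOutput = { content: string | null };

function grader(args: {
  generatedOutput: AssistantOutput;
  datasetOutput: AssistantOutput;
}): number {
  // Surrounding whitespace is ignored; this is a design choice, not a requirement.
  const generated = (args.generatedOutput.content ?? "").trim();
  const expected = (args.datasetOutput.content ?? "").trim();
  // Score 1 only when the two strings are identical, otherwise 0.
  return generated === expected ? 1 : 0;
}
```

Dropping the trim calls makes the comparison stricter, and lowercasing both strings first makes it more lenient.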
Argument Comparison
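A sketch of an argument-comparison grader that gives partial credit. It assumes both outputs follow the OpenAI-style tool call shape, with a tool_calls array whose entries hold a function name and a JSON string of arguments; that shape, the ToolCall and AssistantOutput types, and the parseArgs helper are assumptions made for illustration.

```typescript
// Hypothetical shapes modeled on OpenAI-style tool calls.
type ToolCall = { function: { name: string; arguments: string } };
type AssistantOutput = { tool_calls?: ToolCall[] };

// Parses a JSON argument string; returns null unless it is a plain object.
function parseArgs(raw: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

function grader(args: {
  generatedOutput: AssistantOutput;
  datasetOutput: AssistantOutput;
}): number {
  // Only the first tool call of each output is compared.
  const generatedCall = args.generatedOutput.tool_calls?.[0];
  const expectedCall = args.datasetOutput.tool_calls?.[0];
  if (!generatedCall || !expectedCall) return 0;
  if (generatedCall.function.name !== expectedCall.function.name) return 0;

  const generated = parseArgs(generatedCall.function.arguments);
  const expected = parseArgs(expectedCall.function.arguments);
  if (!generated || !expected) return 0;

  const keys = Object.keys(expected);
  if (keys.length === 0) return Object.keys(generated).length === 0 ? 1 : 0;

  // Values are compared by their JSON form, so nested key order matters.
  const matching = keys.filter(
    (key) => JSON.stringify(generated[key]) === JSON.stringify(expected[key]),
  ).length;
  return matching / keys.length; // fraction of expected arguments reproduced
}
```

The score is the fraction of expected arguments that the generated call reproduces. Extra arguments in the generated call are not penalized, which may or may not suit your use case.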