Direct Preference Optimization (DPO) fine-tuning uses preference data to train models on pairs of positive and negative examples. In OpenPipe, DPO can be used as a drop-in replacement for SFT fine-tuning or as a complement to it.
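For background, DPO optimizes a contrastive objective over each pair (Rafailov et al., 2023): it raises the likelihood of the preferred response and lowers that of the rejected one, relative to a frozen reference model. A standard way to write the loss is

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the preferred output, $y_l$ the rejected output, $\pi_{\text{ref}}$ the reference (typically SFT) model, and $\beta$ controls how far the tuned model may drift from the reference.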

Before you begin: Before training your first model with DPO, make sure you’ve created a dataset and collected at least 500 rows of training data on OpenPipe or another platform.

Step 1: Prepare your Dataset

To train a model with DPO, you need pairs of outputs containing preferred and rejected responses. You can prepare this data in one of two ways:

  1. Upload a JSONL file

    Add training rows to your dataset by uploading a JSONL file. Make sure to add a rejected_message field on each row that you’d like to use for preference tuning (an example row is sketched after this list).

  2. Track Rejected Outputs

    In the Data Pipeline view of your dataset, you can convert original outputs that have been overwritten by either an LLM (through an LLM Relabel node) or human (through a Human Relabel node) into rejected outputs. The original output will be treated as the negative example, and the replacement output will be treated as the positive example.

    LLM Relabel Node

    Human Relabel Node
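
For illustration, a preference-tuning row in an uploaded JSONL file might look roughly like the sketch below. The chat-style messages array and the exact shape of the rejected_message value are assumptions; only the presence of a rejected_message field comes from the step above, so check the dataset upload reference for the authoritative schema.

```jsonl
{"messages": [{"role": "system", "content": "You are a support assistant."}, {"role": "user", "content": "Summarize this ticket: ..."}, {"role": "assistant", "content": "Preferred, concise summary of the ticket."}], "rejected_message": {"role": "assistant", "content": "Rejected, rambling summary of the ticket."}}
```

Rows without a rejected_message field are still valid; as described below, they are simply used for supervised fine-tuning rather than preference tuning.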

Step 2: Configure Training Settings

Once your dataset is ready, training a DPO model is similar to training an SFT model.

  1. Select the dataset you prepared for preference tuning.
  2. Adjust the base model.
    • Currently, DPO is only supported on Llama 3.1 8B.
  3. Under Advanced Options, click the Enable Preference Tuning checkbox.

Step 3: Adjust Hyperparameters (optional)

You should now see the number of rows that will be used for supervised fine-tuning (SFT Row Count) and preference tuning (Preference Row Count). Rows in your dataset that only include a preferred output will be used for supervised fine-tuning, while rows with both preferred and rejected outputs will be used for preference tuning.
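
If you prepare your data locally, a small script like the sketch below can estimate that split before uploading. It assumes the JSONL layout sketched earlier, where preference rows carry a rejected_message field; the file name train.jsonl is a placeholder.

```python
import json


def count_split(path: str) -> tuple[int, int]:
    """Estimate SFT vs. preference row counts for a local JSONL file.

    Assumes every row has a preferred assistant output, and that rows
    intended for preference tuning also carry a "rejected_message" field.
    """
    sft_rows = 0
    preference_rows = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            row = json.loads(line)
            if row.get("rejected_message"):
                preference_rows += 1
            else:
                sft_rows += 1
    return sft_rows, preference_rows


if __name__ == "__main__":
    sft, pref = count_split("train.jsonl")  # placeholder path
    print(f"SFT Row Count: {sft}")
    print(f"Preference Row Count: {pref}")
```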

Adjust the training job’s hyperparameters if needed. We recommend using the default values if you’re unsure.

Step 4: Start Training

Finally, kick off a training job by clicking the Start Training button.