!(function () {
  var reb2b = (window.reb2b = window.reb2b || []);
  if (reb2b.invoked) return;
  reb2b.invoked = true;
  reb2b.methods = ["identify", "collect"];
  reb2b.factory = function (method) {
    return function () {
      var args = Array.prototype.slice.call(arguments);
      args.unshift(method);
      reb2b.push(args);
      return reb2b;
    };
  };
  for (var i = 0; i < reb2b.methods.length; i++) {
    var key = reb2b.methods[i];
    reb2b[key] = reb2b.factory(key);
  }
  reb2b.load = function (key) {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.async = true;
    script.src = "https://s3-us-west-2.amazonaws.com/b2bjsstore/b/" + key + "/reb2b.js.gz";
    var first = document.getElementsByTagName("script")[0];
    first.parentNode.insertBefore(script, first);
  };
  reb2b.SNIPPET_VERSION = "1.0.1";
  reb2b.load("4O7Z0HMXYWNX");
})();


Writing an Evaluation

Create your first head to head evaluation.

Quick Start

Evaluations Quick Start

OpenPipe

 Software engineers and data scientists use OpenPipe's intuitive fine-tuning and monitoring services to decrease the cost and latency of their LLM operations.  You can use OpenPipe to collect and analyze LLM logs, create fine-tuned models, and compare output from multiple models given the same input.

Introduction

OpenPipe Documentation

OpenPipe is a streamlined platform designed to help product-focused teams train specialized LLM models as replacements for slow and expensive prompts.

Overview

Train and compare across a range of the most powerful base models.

Base Models

Get started with OpenPipe in a few quick steps.

Installing the SDK

 Record production data to train and improve your models' performance.

Logging Requests

Logging Anthropic Requests

 Export your past requests as a JSONL file in their raw form.

Exporting Logs

Collect, evaluate, and refine your training data.

Datasets

Create your first dataset and import training data.

Datasets Quick Start

 Search and filter your past LLM requests to inspect your responses and build a training dataset.

Importing Request Logs

 Upload external data to kickstart your fine-tuning process. Use the OpenAI chat fine-tuning format.

Uploading Data

Use powerful models to generate new outputs for your data before training.

Relabeling Data

Exporting Data

Train your first fine-tuned model with OpenPipe.

Fine-Tuning Quick Start

 Fine tune your models on filtered logs or uploaded datasets. Filter by prompt id and exclude requests with an undesirable output.

Webapp

Fine Tuning via Webapp

 Fine tune your models programmatically through our API.

Fine Tuning via API

 Train reward models to judge the quality of LLM responses based on preference data.

Reward Models (Beta)

Direct Preference Optimization (DPO)

Train your first DPO fine-tuned model with OpenPipe.

DPO Quick Start

 Evaluate the quality of your LLMs against one another or independently.

Evaluations

 Write custom code to evaluate your LLM outputs. 

Code Evals

Code Evaluations

 Evaluate your LLM outputs using criteria. 

Criterion Evals

Criterion Evaluations

 Evaluate your LLM outputs against one another using head-to-head evaluations. 

Head-to-Head Evals

Head-to-Head Evaluations

Align LLM judgements with human ratings to evaluate and improve your models.

Criteria

Criteria Quick Start

Use alignment sets to test and improve your criteria.

Alignment Sets

Criterion Alignment Sets

Use the Criteria API for runtime evaluation and offline testing.

API Endpoints

Chat Completions

Proxying to External Models

Anthropic Proxy

Gemini Proxy

 Improve performance and reduce costs by caching previously generated responses.

Caching

Updating Metadata Tags

Decrease input token counts by pruning out chunks of static text.

Pruning Rules

 Safeguard your application against potential failures, timeouts, or instabilities that may occur when using experimental or newly released models.

Fallback

Fallback options

 Learn about serverless, hourly, and dedicated deployments.

Deployments

Deployment Types

External Models

Report

Record request logs from Anthropic models

Report Anthropic

Update tags metadata for logged calls matching the provided filters.

Update Metadata

OpenAI-compatible route for generating inference and optionally logging the request.

Welcome

Getting Started

Features

API Reference

Pricing

Evaluations Quick Start

Writing an Evaluation

Welcome

Getting Started

Features

API Reference

Pricing

​Writing an Evaluation

Writing an Evaluation