# Get Model get /models/{model} Get a model by ID. Consult the OpenPipe team before using. # List Models get /models List all models for a project. Consult the OpenPipe team before using. # Chat Completions post /chat/completions OpenAI-compatible route for generating inference and optionally logging the request. # Judge Criteria post /criteria/judge Get a judgement of a completion against the specified criterion # Report post /report Record request logs from OpenAI models # Report Anthropic post /report-anthropic Record request logs from Anthropic models # Update Log Metadata post /logs/update-metadata Update tags metadata for logged calls matching the provided filters. # Base Models Train and compare across a range of the most powerful base models. We regularly evaluate new models to see how they compare against our existing suite. If you'd like us to check out a base model you're particularly excited about, send an email to [hello@openpipe.ai](mailto:hello@openpipe.ai). ## Current Base Models ### Open Source * [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) * [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) * [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) * [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) * [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) * [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) * [mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) * [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) * [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) ### OpenAI * [gpt-4o-mini-2024-07-18](https://platform.openai.com/docs/models/gpt-4o-mini) * [gpt-4o-2024-08-06](https://platform.openai.com/docs/models/gpt-4o) * [gpt-3.5-turbo-1106](https://platform.openai.com/docs/models/gpt-3-5-turbo) * [gpt-3.5-turbo-0125](https://platform.openai.com/docs/models/gpt-3-5-turbo) ### Google Gemini * [gemini-1.0-pro-001](https://deepmind.google/technologies/gemini/pro/) * [gemini-1.5-flash-001](https://deepmind.google/technologies/gemini/flash/) ## Enterprise models These models are currently available for enterprise customers only. If you're interested in exploring these models, we'd be happy to discuss further. Please reach out to us at [hello@openpipe.ai](mailto:hello@openpipe.ai) to learn more. ### AWS Bedrock * [cohere.command-text-v14](https://docs.aws.amazon.com/bedrock/latest/userguide/cm-hp-cohere-command.html) * [cohere.command-light-text-v14](https://docs.aws.amazon.com/bedrock/latest/userguide/cm-hp-cohere-command.html) * [anthropic.claude-3-haiku-20240307-v1:0](https://docs.aws.amazon.com/bedrock/latest/userguide/cm-hp-anth-claude-3.html) # Caching Improve performance and reduce costs by caching previously generated responses. When caching is enabled, our service stores the responses generated for each unique request. If an identical request is made in the future, instead of processing the request again, the cached response is instantly returned. This eliminates the need for redundant computations, resulting in faster response times and reduced API usage costs. Caching is currently in a free beta preview. ## Enabling Caching To enable caching for your requests, you can set the `cache` property of the openpipe object to `true`. 
If you are making requests through our proxy, add the `op-cache` header to your requests: ```bash curl --request POST \ --url https://api.openpipe.ai/api/v1/chat/completions \ --header "Authorization: Bearer YOUR_OPENPIPE_API_KEY" \ --header 'Content-Type: application/json' \ --header 'op-cache: true' \ --data '{ "model": "openpipe:your-fine-tuned-model-id", "messages": [ { "role": "system", "content": "count to 5" } ] }' ``` ```python from openpipe import OpenAI client = OpenAI() completion = client.chat.completions.create( model="openpipe:your-fine-tuned-model-id", messages=[{"role": "system", "content": "count to 5"}], openpipe={ "cache": True }, ) ``` ```typescript import OpenAI from "openpipe/openai"; const openai = new OpenAI(); const completion = await openai.chat.completions.create({ messages: [{ role: "user", content: "count to 5" }], model: "openpipe:your-fine-tuned-model-id", openpipe: { cache: true, }, }); ``` # Anthropic Proxy If you'd like to make chat completion requests to Anthropic models without modifying your prompt schema, you can proxy OpenAI-compatible requests through OpenPipe, and we'll handle the translation for you. To proxy requests to Anthropic models, first add your Anthropic API Key to your project settings. Then, adjust the **model** parameter of your requests to be the name of the model you wish to query, prepended with the string `anthropic:`. For example, to make a request to `claude-3-5-sonnet-20241022`, use the following code: ```python from openpipe import OpenAI # Find the config values in "Installing the SDK" client = OpenAI() completion = client.chat.completions.create( model="anthropic:claude-3-5-sonnet-20241022", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` ```typescript import OpenAI from "openpipe/openai"; // Find the config values in "Installing the SDK" const client = new OpenAI(); const completion = await client.chat.completions.create({ model: "anthropic:claude-3-5-sonnet-20241022", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", any_key: "any_value", }, }); ``` For your reference, here is a list of the most commonly used Anthropic models formatted for the OpenPipe proxy: * `anthropic:claude-3-5-sonnet-20241022` * `anthropic:claude-3-opus-20240229` * `anthropic:claude-3-sonnet-20240229` * `anthropic:claude-3-haiku-20240307` Additionally, you can always stay on the latest version of the model by using an abbreviated model name: * `anthropic:claude-3-5-sonnet` * `anthropic:claude-3-opus` * `anthropic:claude-3-sonnet` * `anthropic:claude-3-haiku` If you'd like to make requests directly to Anthropic models, you can do that externally using the Anthropic SDK, and report your logs using the asynchronous [reporting API](/features/request-logs/reporting-anthropic). # Custom External Models Some developers have found it useful to proxy requests to arbitrary external models through OpenPipe. This is useful if you have a custom model that you've deployed to Azure, or an external model that you've deployed to another cloud provider. Adding custom external models is not required to proxy requests to OpenAI or Anthropic models. See our docs on proxying to [OpenAI](/features/request-logs/logging-requests#proxy) or [Anthropic](/features/chat-completions/anthropic) for more information. Proxying chat completions to a custom external model requires a few short steps.
* Create an external model provider * Add a model to the external model provider * Adjust the `model` parameter in your chat completion request ### Create an external model provider Find the **External Model Providers** section of your project settings, and click the Add Provider button. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/external-models/add-provider-button.png) Give your custom provider a slug, API key, and add a custom base URL if necessary. The slug should be unique, and will be used when we proxy requests to models associated with this provider. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/external-models/add-provider-modal.png) ### Add a model to the external model provider To add a model to the provider you're creating, click the Add model button. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/external-models/add-model-button.png) Give the model a slug that matches the model you'd like to call on your external provider. To call gpt-4o-2024-08-06 on Azure, for instance, the slug should be `gpt-4o-2024-08-06`. Setting input cost and output cost is optional, but can be helpful for showing relative costs on the [evals](/features/evaluations) page. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/external-models/add-model-row.png) ### Update the `model` parameter in your chat completion request Almost done! The last step is to set the `model` parameter in your requests to match this format: `openpipe:<provider-slug>/<model-slug>`. For example, if you're calling gpt-4o-2024-08-06 on Azure, the model parameter should be `openpipe:custom-azure-provider/gpt-4o-2024-08-06`. ```python from openpipe import OpenAI # Find the config values in "Installing the SDK" client = OpenAI() completion = client.chat.completions.create( model="openpipe:custom-azure-provider/gpt-4o-2024-08-06", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` ```typescript import OpenAI from "openpipe/openai"; // Find the config values in "Installing the SDK" const client = new OpenAI(); const completion = await client.chat.completions.create({ model: "openpipe:custom-azure-provider/gpt-4o-2024-08-06", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", any_key: "any_value", }, }); ``` External models can also be used for filtering and relabeling your data. We currently support custom external models for providers with OpenAI- and Azure-compatible endpoints. If you'd like support for an external provider with a different API format, send a request to [hello@openpipe.ai](mailto:hello@openpipe.ai). # Mixture of Agents Chat Completions In some cases, completions produced by GPT-4 or other SOTA models aren't good enough to be used in production. To improve quality beyond the limit of SOTA models, we've developed a Mixture of Agents (MoA) technique that enhances quality but also increases cost and latency. To use MoA models, set the **model** parameter to be one of the following: * `openpipe:moa-gpt-4o-v1` * `openpipe:moa-gpt-4-turbo-v1` * `openpipe:moa-gpt-4-v1` To get the highest quality completions, use the MoA model that corresponds to the SOTA model you're currently using. For instance, if your original model was `gpt-4-turbo-2024-04-09`, try switching to `openpipe:moa-gpt-4-turbo-v1`. Make sure to set your `OpenAI API Key` in the `Project Settings` page to enable MoA completions!
```python from openpipe import OpenAI # Find the config values in "Installing the SDK" client = OpenAI() completion = client.chat.completions.create( # model="gpt-4-turbo-2024-04-09", - original model model="openpipe:moa-gpt-4-turbo-v1", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` ```typescript import OpenAI from "openpipe/openai"; // Find the config values in "Installing the SDK" const client = new OpenAI(); const completion = await client.chat.completions.create({ // model: "gpt-4-turbo-2024-04-09", - original model model: "openpipe:moa-gpt-4-turbo-v1", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", any_key: "any_value", }, }); ``` To learn more, visit the [Mixture of Agents](/features/mixture-of-agents) page. # Chat Completions Once your fine-tuned model is deployed, you're ready to start generating chat completions. First, make sure you've set up the SDK properly. See the [OpenPipe SDK](/getting-started/openpipe-sdk) section for more details. Once the SDK is installed and you've added the right `OPENPIPE_API_KEY` to your environment variables, you're almost done. The last step is to update the model that you're querying to match the ID of your new fine-tuned model. ```python from openpipe import OpenAI # Find the config values in "Installing the SDK" client = OpenAI() completion = client.chat.completions.create( # model="gpt-3.5-turbo", - original model model="openpipe:your-fine-tuned-model-id", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` ```typescript import OpenAI from "openpipe/openai"; // Find the config values in "Installing the SDK" const client = new OpenAI(); const completion = await client.chat.completions.create({ // model: "gpt-3.5-turbo", - original model model: "openpipe:your-fine-tuned-model-id", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", any_key: "any_value", }, }); ``` Queries to your fine-tuned models will now be shown in the [Request Logs](/features/request-logs) panel. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/running-inference-logs.png) Feel free to run some sample inference on the [PII Redaction model](https://app.openpipe.ai/p/BRZFEx50Pf/fine-tunes/6076ad69-cce5-4892-ae54-e0549bbe107f/general) in our public project. # Prompt Prefilling Use Prompt Prefilling to control the initial output of the completion. Prompt prefilling is a powerful feature that allows you to control the initial output of your models. This can be particularly useful for maintaining context, structuring outputs, or continuing previous dialogues. ## How It Works To use prompt prefilling, include an assistant message at the end of your input with the following characteristics: * Set the `role` to "assistant" * Set the `name` to "prefill" * Include your desired prefill content in the `content` field The model will pick up from the prefilled content, effectively "continuing" from where you left off. ## Example Usage ### Basic Prefilling ```typescript const input = { messages: [ { role: "user", content: "Write a story about a brave knight." }, { role: "assistant", name: "prefill", content: "Once upon a time, in a kingdom far away, there lived a brave knight named", }, ], }; // Model output: // " Lancelot. He rode through the countryside seeking adventures, and wherever he went he..." 
``` The response will continue the story from there. ### Structured Output Prefilling can be used to enforce specific output structures: ```typescript const input = { messages: [ { role: "user", content: "List three benefits of exercise." }, { role: "assistant", name: "prefill", content: "Here are three key benefits of regular exercise:\n\n1.", }, ], }; // Model output: // "Improved cardiovascular health\n\n2. Increased muscle strength and endurance\n\n3. Improved mental health and mood" ``` This ensures the response starts with the desired format. ### Maintaining Character in Roleplays For roleplay scenarios, prefilling can help maintain character consistency: ```typescript const input = { messages: [ { role: "system", content: "You are a pirate captain from the 18th century." }, { role: "user", content: "What's our next destination?" }, { role: "assistant", name: "prefill", content: "Arr, me hearty! Our next destination be", }, ], }; // Model output: // " the Caribbean Sea, me hearty! Let's set sail!" ``` ## Notes * Prefilling only works when interacting with our OpenPipe fine-tuned models. * You can use this feature while fine-tuning as well, maintaining the same characteristics for the assistant message. By leveraging prompt prefilling, you can create more controlled, consistent, and context-aware interactions in your applications. # Criterion Alignment Sets Use alignment sets to test and improve your criteria. Alignment sets are collections of LLM input/output pairs that are judged by both the criterion LLM judge and a human. The performance of the criterion LLM judge is then measured by how well it matches the judgements of the human judge. We recommend importing and judging at least 30 rows to ensure the alignment stats are meaningful. ## Importing an Alignment Set You can import an alignment set from either an OpenPipe dataset or a JSONL file. Alignment sets can be added to an existing criterion or imported when a new criterion is created. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/alignment-set/import-alignment-set.png) ### Importing from a Dataset When importing from a dataset, you select a number of rows to be randomly sampled from the dataset of your choice and imported into the criterion alignment set. The inputs of each of these rows will be copied directly from the rows in the dataset without any changes. By default, the outputs will also be copied from the original dataset. However, if you set **Output Source** to be an LLM model, the outputs will be generated by the LLM model based on the dataset inputs. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/alignment-set/import-from-dataset.png) ### Importing from a JSONL File You can also import an alignment set from a JSONL file. Uploads are limited to 10MB in size, which should be plenty for an alignment set. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/alignment-set/import-from-upload.png) The schema of the JSONL file is exactly the same as an OpenAI-compatible [JSONL fine-tuning file](/features/datasets/uploading-data#openai-fields), but also supports an optional `judgement` field for each row. `judgement` can be either `PASS` or `FAIL`, depending on whether the row should pass or fail the criterion. #### Example ```jsonl ... 
{"judgement": "PASS", "messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Tasmania?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Hobart\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} {"judgement": "FAIL", "messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Sweden?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Beijing\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} {"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Sweden?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Stockholm\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} ... ``` ## Alignment Stats Alignment stats are a simple way to understand how well your criterion is performing. As you refine your criterion prompt, you're alignment stats will improve as well. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/alignment-set/alignment-stats.png) * **Precision** indicates the fraction of rows that the LLM judge labeled as failing that a human judge also labeled as failing. It's an indicator of how reliable the LLM judge's FAIL label is. * **Recall** indicates the fraction of rows that a human judge labeled as failing that the LLM judge also labeled as failing. It's an indicator of how reliable the LLM judge's PASS label is. * **F1 Score** is the harmonic mean of precision and recall. As either score improves, the F1 score will also improve. To ensure your alignment stats are meaningful, we recommend labeling at least 30 rows, but in some cases you may need to label more in order to get a reliable statistic. # API Endpoints Use the Criteria API for runtime evaluation and offline testing. After you've defined and aligned your judge criteria, you can access them via API endpoints for both runtime evaluation (**Best of N** sampling) and offline testing. ### Runtime Evaluation See the Chat Completion [docs](/features/chat-completions/overview) and [API Reference](/api-reference/post-chatcompletions) for more information on making chat completions with OpenPipe. When making a request to the `/chat/completions` endpoint, you can specify a list of criteria to run immediately after a completion is generated. We recommend generating multiple responses from the same prompt, each of which will be scored by the specified criteria. The responses will be sorted by their combined score across all criteria, from highest to lowest. This technique is known as **[Best of N](https://huggingface.co/docs/trl/en/best_of_n)** sampling. 
To invoke criteria, add an `op-criteria` header to your request with a list of criterion IDs, like so: ```python from openpipe import OpenAI # Find the config values in "Installing the SDK" client = OpenAI() completion = client.chat.completions.create( model="openai:gpt-4o-mini", messages=[{"role": "system", "content": "count to 10"}], metadata={ "prompt_id": "counting", "any_key": "any_value", }, n=5, extra_headers={"op-criteria": '["criterion-1@v1", "criterion-2"]'}, ) best_response = completion.choices[0] ``` ```typescript import OpenAI from "openpipe/openai"; // Find the config values in "Installing the SDK" const client = new OpenAI(); const completion = await client.chat.completions.create({ model: "openai:gpt-4o-mini", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", any_key: "any_value", }, n: 5, headers: { "op-criteria": '["criterion-1@v1", "criterion-2"]', }, }); const bestResponse = completion.choices[0]; ``` ```bash curl --request POST \ --url https://app.openpipe.ai/api/v1/chat/completions \ --header "Authorization: Bearer $OPENPIPE_API_KEY" \ --header 'Content-Type: application/json' \ --header 'op-criteria: ["criterion-1@v1", "criterion-2"]' \ --data '{ "model": "openai:gpt-4o-mini", "messages": [ { "role": "user", "content": "Count to 10" } ], "store": true, "n": 5, "metadata": { "prompt_id": "counting", "any_key": "any_value" } }' ``` Specified criteria can either be versioned, like `criterion-1@v1`, or default to the latest criterion version, like `criterion-2`. In addition to the usual fields, each chat completion choice will now include a `criteria_results` object, which contains the judgements of the specified criteria. The array of completion choices will take the following form: ```json [ { "finish_reason": "stop", "index": 0, "message": { "content": "1, 2, 3.", "refusal": null, "role": "assistant" }, "logprobs": null, "criteria_results": { "criterion-1": { "status": "success", "score": 1, "explanation": "..." }, "criterion-2": { "status": "success", "score": 0.6, "explanation": "..." } } }, { ... } ] ``` ### Offline Testing See the [API Reference](/api-reference/post-criteriajudge) for more details. To check the quality of a previously generated output against a specific criterion, use the `/criteria/judge` endpoint. You can request judgements using either the TypeScript or Python SDKs, or through a cURL request. ```python from openpipe.client import OpenPipe op_client = OpenPipe() result = op_client.get_criterion_judgement( criterion_id="criterion-1@v1", # if no version is specified, the latest version is used input={"messages": messages}, output=output, ) ``` ```typescript import OpenPipe from "openpipe/client"; const opClient = new OpenPipe(); const result = await opClient.getCriterionJudgement({ criterion_id: "criterion-1@v1", // if no version is specified, the latest version is used input: { messages, }, output: { role: "assistant", content: "1, 2, 3" }, }); ``` # Criteria Align LLM judgements with human ratings to evaluate and improve your models. For questions about criteria or to unlock beta features for your organization, reach out to [support@openpipe.ai](mailto:support@openpipe.ai). Criteria are a simple way to reliably detect and correct mistakes in LLM output. 
Criteria can currently be used for the following purposes: * Defining LLM evaluations * Improving dataset quality * Runtime evaluation when generating [best of N](/features/criteria/api#runtime-evaluation) samples * [Offline testing](/features/criteria/api#offline-testing) of previously generated outputs ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/overview.png) ## What is a Criterion? A criterion is a combination of an LLM model and prompt that can be used to identify a specific issue with a model's output. Criterion judgements are generated by passing the input and output of a single row along with the criterion prompt to an LLM model, which then returns a binary `PASS`/`FAIL` judgement. To learn how to create your first criterion, read the [Quick Start](/features/criteria/quick-start). # Criteria Quick Start Create and align your first criterion. Criteria are a reliable way to detect and correct mistakes in LLM output. Criteria can be used when defining LLM evaluations, improving data quality, and for [runtime evaluation](/features/criteria/api#runtime-evaluation) when generating **best of N** samples. This tutorial will walk you through creating and aligning your first criterion. Before you begin: Before creating your first criterion, you should identify an issue with your model's output that you want to detect and correct. You should also have either an OpenPipe [dataset](/features/datasets/overview) or a [JSONL file](/features/criteria/alignment-set#importing-from-a-jsonl-file) containing several rows of data that exhibit the issue, and several that don't. ### Creating a Criterion Navigate to the **Criteria** tab and click the **New Criterion** button. The creation modal will open with a default prompt and judge model. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/create-criterion.png) By default, each of the following fields will be templated into the criterion's prompt when assigning a judgement to an output: * `messages` *(optional):* The messages used to generate the output * `tools` *(optional):* The tools used to generate the output * `tool_choice` *(optional):* The tool choice used to generate the output * `output` *(required):* The chat completion object to be judged Many criteria do not require all of the input fields, and some may judge based solely on the `output`. You can exclude fields by removing them from the **Templated Variables** section. Write an initial LLM prompt with basic instructions for identifying rows containing the issue you want to detect and correct. Don't worry about engineering a perfect prompt; you'll have a chance to improve it during the alignment process. As an example, if you want to detect rows in which the model's output is in a different language than the input, you might write a prompt like this: ``` Mark the criteria as passed if the input and output are the same language. Mark it as failed if they are in different languages. ``` Make sure to use the terms `input`, `output`, `passed`, and `failed` in your prompt to match our internal templating. Finally, import a few rows (we recommend at least 30) into an alignment set for the criterion. Click **Create** to create the criterion and run the initial prompt against the imported alignment set. You'll be redirected to the criterion's alignment page. 
![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/overview.png) ### Aligning a Criterion Ensuring your criterion's judgements are reliable involves two simple processes: * Manually labeling outputs * Refining the criterion In order to know whether you agree with your criterion's judgements, you'll need to label some data yourself. Use the Alignment UI to manually label each output with `PASS` or `FAIL` based on the criterion. Feel free to `SKIP` outputs you aren't sure about and come back to them later. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/manually-label.png) Try to label at least 30 rows to provide a reliable estimate of the LLM's precision and recall. As you record your own judgements, alter the criterion's prompt and judge model to align its judgements with your own. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/edit-criterion.png) Investing time in a good prompt and selecting the best judge model pays dividends. High-quality LLM judgements help you quickly identify rows that fail the criterion, speeding up the process of manually labeling rows. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/llm-judgement.png) As you improve your criterion prompt, you'll notice your [alignment stats](/features/criteria/alignment-set#alignment-stats) improving. Once you've labeled at least 30 rows and are satisfied with the precision and recall of your LLM judge, the criterion is ready to be deployed! ### Deploying a Criterion The simplest way to deploy a criterion is to create a criterion eval. Unlike head-to-head evals, criterion evals are not pairwise comparisons. Instead, they evaluate the quality of one or more models' outputs according to a specific criterion. First, navigate to the Evals tab and click **New Evaluation** -> **Add criterion eval**. Pick the models to evaluate and the test dataset on which to evaluate them. Next, select the criterion you would like to judge your models against. The judge model and prompt you defined when creating the criterion will be used to judge individual outputs from your models. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/create-criterion-eval.png) Finally, click **Create** to run the evaluation. Just like that, you'll be able to view evaluation results based on aligned LLM judgements! ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/criteria/criterion-eval-results.png) # Exporting Data Export your dataset entries as a JSONL file. ## Dataset export After you've collected, filtered, and transformed your dataset entries for fine-tuning, you can export them as a JSONL file. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/datasets/exporting-dataset-entries.png) ### Fields * **`messages`:** The complete chat history. * **`tools`:** The tools provided to the model. * **`tool_choice`:** The tool the model was required to use. * **`split`:** The train/test split to which the entry belongs. 
### Example ```jsonl {"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Tasmania?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Hobart\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} {"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Sweden?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Stockholm\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} ``` # Importing Request Logs Search and filter your past LLM requests to inspect your responses and build a training dataset. Logged requests will be visible on your project's [Request Logs](https://app.openpipe.ai/p/BRZFEx50Pf/request-logs?filterData=%7B%22shown%22%3Atrue%2C%22filters%22%3A%5B%7B%22id%22%3A%221706912835890%22%2C%22field%22%3A%22request%22%2C%22comparator%22%3A%22CONTAINS%22%2C%22value%22%3A%22You+are+an+expert%22%7D%2C%7B%22id%22%3A%221706912850914%22%2C%22field%22%3A%22response%22%2C%22comparator%22%3A%22NOT_CONTAINS%22%2C%22value%22%3A%22As+an+AI+language+model%22%7D%2C%7B%22id%22%3A%221706912861496%22%2C%22field%22%3A%22model%22%2C%22comparator%22%3A%22%3D%22%2C%22value%22%3A%22gpt-4-0613%22%7D%2C%7B%22id%22%3A%221706912870230%22%2C%22field%22%3A%22tags.prompt_id%22%2C%22comparator%22%3A%22CONTAINS%22%2C%22value%22%3A%22redaction%22%7D%5D%7D) page. You can filter your logs by completionId, model, custom tags, and more to narrow down your results. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/log-filters.png) Once you've found a set of data that you'd like to train on, import those logs into the dataset of your choice. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/importing-logs.png) After your data has been saved to your dataset, [kicking off a training job](/features/fine-tuning) is straightforward. # Datasets Collect, evaluate, and refine your training data. Datasets are the raw material for training models. They can be scraped from your request logs or uploaded from your local machine. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/datasets/overview.png) To learn how to create a dataset, check out the [Quick Start](/features/datasets/quick-start) guide. # Datasets Quick Start Create your first dataset and import training data. Datasets are the raw material for training models. They're where you'll go to collect, evaluate, and refine your training data. To create a dataset, navigate to the **Datasets** tab and click **New Dataset**. Your dataset will be given a default name including the time at which it was created. We suggest editing the name to something more descriptive. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/datasets/editing-dataset-name.png) Now that you have a shiny new dataset, you need to somehow import data into it. This can be done in one of two ways: 1. [Importing request logs](/features/datasets/importing-logs) 2. [Uploading a file from your machine](/features/datasets/uploading-data) Click the links to learn more about each method. 
# Relabeling Data Use powerful models to generate new outputs for your data before training. After importing rows from request logs or uploading a JSONL file, you can optionally relabel each row by sending its messages, tools, and other input parameters to a more powerful model, which will generate an output to replace your row's existing output. If time or cost constraints prevent you from using the most powerful model available in production, relabeling offers an opportunity to optimize the quality of your training data before kicking off a job. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/relabeled-output.png) We currently include the following relabeling options: * gpt-4-turbo-2024-04-09 * gpt-4o-2024-08-06 * gpt-4-0125-preview * gpt-4-1106-preview * gpt-4-0613 * moa-gpt-4o-v1 (Mixture of Agents) * moa-gpt-4-turbo-v1 (Mixture of Agents) * moa-gpt-4-v1 (Mixture of Agents) Learn more about Mixture of Agents, a powerful technique for optimizing quality at the cost of speed and price, on the [Mixture of Agents](/features/mixture-of-agents) page. # Uploading Data Upload external data to kickstart your fine-tuning process. Use the OpenAI chat fine-tuning format. Upload a JSONL file populated with a list of training examples. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/uploading-data.png) Each line of the file should be compatible with the OpenAI [chat format](https://platform.openai.com/docs/api-reference/chat/object), with additional optional fields. ### OpenAI Fields * **`messages`: Required** - Formatted as a list of OpenAI [chat completion messages](https://platform.openai.com/docs/guides/gpt/chat-completions-api). The list should end with an assistant message. * **`tools`: Optional** - An array of tools (functions) available for the model to call. For more information read OpenAI's [function calling docs](https://platform.openai.com/docs/guides/function-calling). * **`tool_choice`: Optional** - You can set this to indicate that the model should be required to call the given tool. For more information read OpenAI's [function calling docs](https://platform.openai.com/docs/guides/function-calling). #### Deprecated * **`functions`: Deprecated | Optional** - An array of functions available for the model to call. * **`function_call`: Deprecated | Optional** - You can set this to indicate that the model should be required to call the given function. You can include other parameters from the OpenAI chat completion input format (eg. temperature), but they will be ignored since they aren't relevant for training. ### Additional Fields * **`split`: Optional** - One of "TRAIN" or "TEST". If you don't set this field we'll automatically divide your inputs into train and test splits with a target ratio of 90:10. * **`rejected_message`: Optional** - Add a rejected output for entries on which you want to perform direct preference optimization (DPO). You can find more information about that here: [Direct Preference Optimization](/features/dpo/Overview) ### Example ```jsonl ... 
{"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Tasmania?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Hobart\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} {"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is the capital of Sweden?"},{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"identify_capital","arguments":"{\"capital\":\"Stockholm\"}"}}]}],"tools":[{"type":"function","function":{"name":"identify_capital","parameters":{"type":"object","properties":{"capital":{"type":"string"}}}}}]} ... ``` # Direct Preference Optimization (DPO) DPO is much harder to get right than supervised fine-tuning, and the results may not always be better. To get the most out of DPO, we recommend familiarizing yourself with your specific use case, your dataset, and the technique itself. Direct Preference Optimization (DPO), introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/abs/2106.13358), is an algorithm used to fine-tune LLMs based on preference feedback. It focuses on aligning model outputs with specific human preferences or desired behaviors. Unlike traditional supervised fine-tuning, which relies solely on input-output pairs, DPO leverages preference data—information about which of two outputs is preferred in a given context. DPO works by directly optimizing a model to produce preferred outputs over non-preferred ones, without the need for complex reward modeling or reinforcement learning techniques. It uses paired data samples, where each pair consists of a preferred and a non-preferred response to a given prompt. This method allows the model to learn nuanced distinctions that are difficult to capture with explicit labels alone. By directly optimizing for preferences, DPO enables the creation of models that produce more aligned, contextually appropriate, and user-satisfying responses. ## Gathering Preference Data DPO is useful when you have a source of preference data that you can exploit. There are many possible sources of preference data, depending on your use case: 1. **Expert Feedback**: you may have a team of experts who can evaluate your model's outputs and edit them to make them better. You can use the original and edited outputs as rejected and preferred outputs respectively. DPO can be effective with just a few preference pairs. 2. **Criteria Feedback**: if you use [OpenPipe criteria](/features/criteria/overview) or another evaluation framework that assigns a score or pass/fail to an output based on how well it meets certain criteria, you can run several generations and use the highest and lowest scoring outputs as preferred and non-preferred outputs respectively. 3. **User Choice**: if you have a chatbot-style interface where users can select their preferred response from a list of generated outputs, you can use the selected and rejected outputs as preference data. 4. **User Regenerations**: if a user is able to regenerate an action multiple times and then eventually accepts one of the outputs, you can use the first output they rejected as a non-preferred output and the accepted output as a preferred output. 5. 
**User Edits**: if your model creates a draft output and the user is able to edit it and then save, you can use the original draft as a non-preferred output and the edited draft as a preferred output. ## Example Use Cases Initial tests with DPO on OpenPipe have shown promising results. DPO, when used with [user-defined criteria](https://docs.openpipe.ai/features/criteria/overview), allows you to fine-tune models that more consistently respect even very nuanced preferences. ![SFT vs DPO](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/dpo/sft-vs-dpo-for-criteria-chart.png) The following are all real results on customer tasks: * **Word Limit**: for a summarization task with an explicit word limit given in the prompt, DPO was able to cut the number of responses exceeding the limit from 31% to 7%, a **77%** decrease. * **Highlight Format**: for a content formatting task, DPO was able to drop the percentage of times the wrong word or phrase was highlighted from 17.3% to 1.7%, a **90%** decrease. * **Hallucination**: for an information extraction task, DPO was able to drop the fraction of outputs with hallucinated information from 12.7% to 3.0%, a **76%** decrease. * **Result Relevance**: for a classification task determining whether a result was relevant to a query, DPO was able to drop the misclassification rate from 4.7% to 1.3%, a **72%** decrease. We're excited to see how you'll leverage DPO to create even more powerful and tailored models for your specific needs! # DPO Quick Start Train your first DPO fine-tuned model with OpenPipe. DPO fine-tuning uses preference data to train models on positive and negative examples. In OpenPipe, DPO can be used as a drop-in replacement for SFT fine-tuning or as a complement to it. Before you begin: Before training your first model with DPO, make sure you've [created a dataset](/features/datasets/quick-start) and have collected at least 500 rows of training data on OpenPipe or another platform. To train a model with DPO, you need pairs of outputs containing preferred and rejected responses. You can prepare this data in one of two ways: 1. **Upload a JSONL file** Add training rows to your dataset by [uploading a JSONL file](/features/datasets/uploading-data). Make sure to add a `rejected_message` field on each row that you'd like to use for preference tuning (a sketch of such a row appears below the relabel node screenshots). 2. **Track Rejected Outputs** In the **Data Pipeline** view of your dataset, you can convert original outputs that have been overwritten by either an LLM (through an LLM Relabel node) or a human (through a Human Relabel node) into rejected outputs. The original output will be treated as the negative example, and the replacement output will be treated as the positive example. LLM Relabel Node ![LLM Relabel Node](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/dpo/llm-relabel-track-rejected-op.png)
Human Relabel Node ![Human Relabel Node](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/dpo/human-relabel-track-rejected-op.png)
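If you go the JSONL upload route (option 1 above), a preference-tuning row is just a standard fine-tuning row plus a `rejected_message`. Below is a minimal sketch of writing one such row to an upload file; the message contents are hypothetical placeholders, and the assumption that `rejected_message` takes the shape of a chat message (role plus content) should be checked against the [uploading data](/features/datasets/uploading-data) docs.

```python
# Minimal sketch: append one preference-tuning row to a JSONL upload file.
# The final assistant message is the preferred output; `rejected_message`
# holds the non-preferred output (shape assumed to be a chat message).
import json

row = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Summarize the report in under 50 words."},
        # Preferred output (the one the model should learn to produce).
        {"role": "assistant", "content": "The report finds revenue grew 12%..."},
    ],
    # Non-preferred output used as the rejected example for DPO.
    "rejected_message": {
        "role": "assistant",
        "content": "Here is a very long summary that ignores the word limit...",
    },
    "split": "TRAIN",  # optional; omit to use the automatic 90:10 split
}

with open("dpo-upload.jsonl", "a") as f:
    f.write(json.dumps(row) + "\n")
```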
Once your dataset is ready, training a DPO model is similar to training an SFT model. 1. Select the dataset you prepared for preference tuning. 2. Adjust the base model. * Currently, DPO is only supported on Llama 3.1 8B. 3. Under Advanced Options, click the Enable Preference Tuning checkbox. ![Enable Preference Tuning](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/dpo/enable-pt.png) You should now see the number of rows that will be used for supervised fine tuning (SFT Row Count) and preference tuning (Preference Row Count). Rows in your dataset that only include a preferred output will be used for supervised fine tuning, while rows with both preferred and rejected outputs will be used for preference tuning. Adjust the training job's hyperparameters if needed. We recommend using the default values if you're unsure. ![DPO Hyperparameters](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/dpo/dpo-hyperparams.png) Finally, kick off a training job by clicking the **Start Training** button.
# Evaluations Evaluate your fine-tuned models against comparison LLMs like GPT-4 and GPT-4-Turbo. Add and remove models from the evaluation, and customize the evaluation criteria. Once your model is trained, the next thing you want to know is how well it performs. OpenPipe's built-in evaluation framework makes it easy to compare new models you train against previous models as well as generic OpenAI models. When you train a model, 10% of the dataset entries you provide will be withheld from training. These entries form your test set. For each entry in the test set, your new model will produce an output that will be shown in the [evaluation table](https://app.openpipe.ai/p/BRZFEx50Pf/datasets/0aa75f72-3fe5-4294-a94e-94c9236befa6/evaluate).
![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/evals-table.png)
While this table makes it really easy to compare model output for a given input side by side, it doesn't actually let you know which model is doing better in general. For that, we need custom evaluations. Evaluations allow you to compare model outputs across a variety of inputs to determine which model is doing a better job. On the backend, we use GPT-4 as a judge to determine which output is a better fit for the test dataset entry. You can configure the exact judgement criteria, which models will be judged, and how many dataset entries will be included in the evaluation from the evaluation's [Settings](https://app.openpipe.ai/p/BRZFEx50Pf/evals/f424c301-a45e-460e-920c-e87a5f121049/settings) page.
![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/eval-settings.png)
Results are shown in both a table and a head-to-head comparison view on the [Results](https://app.openpipe.ai/p/BRZFEx50Pf/evals/f424c301-a45e-460e-920c-e87a5f121049/results) page.
![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/eval-results.png)
To see the whole thing in action, check out the [Evaluate](https://app.openpipe.ai/p/BRZFEx50Pf/datasets/0aa75f72-3fe5-4294-a94e-94c9236befa6/evaluate) tab in our public Bullet Point Generator dataset. Feel free to play around with the display settings to get a feel for how individual models compare against one another! ## Evaluation models We provide OpenAI LLMs like GPT-4 and GPT-4-Turbo for evaluations by default. These models serve as a solid benchmark for comparing the performance of your fine-tuned models. In addition to the OpenAI models, you can add any hosted model with an OpenAI-compatible API to compare outputs with your fine-tuned models. To add an external model for evaluation, navigate to the [Project settings](https://app.openpipe.ai/p/BRZFEx50Pf/settings) page, where you'll find the option to include additional models in your evaluations. # Evaluations Quick Start Create your first head-to-head evaluation. Head-to-head evaluations allow you to compare two or more models against each other using an LLM judge guided by custom instructions. Before you begin: Before writing your first eval, make sure you've [created a dataset](/features/datasets/quick-start) with one or more test entries. Also, make sure to add your OpenAI or Anthropic API key in your project settings page to allow the judge LLM to run. ### Writing an Evaluation To create an eval, navigate to the dataset with the test entries you'd like to evaluate your models based on. Find the **Evaluate** tab and click the **+** button to the right of the **Evals** dropdown list. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/eval-button.png) A configuration modal will appear. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/create-h2h-eval.png) Customize the judge LLM instructions. The outputs of each model will be compared against one another pairwise, and a score of WIN, LOSS, or TIE will be assigned to each model's output based on the judge's instructions. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/edit-judge-instructions.png) Choose a judge model from the dropdown list. If you'd like to use a judge model that isn't supported by default, add it as an [external model](/features/chat-completions/external-models) in your project settings page. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/select-judge-model.png) Choose the models you'd like to evaluate against one another. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/choose-evaluated-models.png) Click **Create** to start running the eval. Once the eval is complete, you can see model performance in the evaluation's **Results** tab. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/evaluations/quick-start-results.png) To learn more about customizing the judge LLM instructions and viewing evaluation judgements in greater detail, see the [Evaluations Overview](/features/evaluations/overview) page. # Fallback options Safeguard your application against potential failures, timeouts, or instabilities that may occur when using experimental or newly released models. Fallback is a feature that ensures a seamless experience and guarantees 100% uptime when working with new or unstable models. When fallback is enabled, any failed API calls will be automatically retried using OpenAI or any OpenAI-compatible client. 
## Fallback to OpenAI To enable fallback to OpenAI, you can simply pass the `fallback` option to the `openpipe` object with the `model` property set to the OpenAI model you want to fall back to. ```python from openpipe import OpenAI client = OpenAI() completion = client.chat.completions.create( model="openpipe:my-ft-model", messages=[{"role": "system", "content": "count to 10"}], openpipe={ "fallback": { "model": "gpt-4-turbo" } }, ) ``` ```typescript import OpenAI from "openpipe/openai"; const openai = new OpenAI(); const completion = await openai.chat.completions.create({ messages: [{ role: "user", content: "Count to 10" }], model: "openpipe:my-ft-model", openpipe: { fallback: { model: "gpt-4-turbo" }, }, }); ``` ## Timeout Fallback If a request takes too long to execute, you can set a timeout for the fallback. In the example below, the request will fall back to OpenAI after 10 seconds. ```python from openpipe import OpenAI client = OpenAI(timeout=10) # initial OpenPipe call timeout in seconds completion = client.chat.completions.create( model="openpipe:my-ft-model", messages=[{"role": "system", "content": "count to 10"}], openpipe={ "fallback": { "model": "gpt-4-turbo", # optional fallback timeout. Defaults to the timeout specified in the client, or OpenAI default timeout if not set. "timeout": 20 # seconds } }, ) ``` ```typescript import OpenAI from "openpipe/openai"; const openai = new OpenAI(); const completion = await openai.chat.completions.create( { messages: [{ role: "user", content: "Count to 10" }], model: "openpipe:my-ft-model", openpipe: { fallback: { model: "gpt-4-turbo", // optional fallback timeout. Defaults to the timeout specified in client options, or OpenAI default timeout if not set. timeout: 20 * 1000, // milliseconds }, }, }, { timeout: 10 * 1000, // initial OpenPipe call timeout in milliseconds }, ); ``` ## Fallback to Custom OpenAI Compatible Client If you want to use another OpenAI-compatible fallback client, you can pass a `fallback_client` to the `openpipe` object. ```python from openpipe import OpenAI client = OpenAI( openpipe={ "fallback_client": OpenAICompatibleClient(api_key="client api key") } ); completion = client.chat.completions.create( model="openpipe:my-ft-model", messages=[{"role": "system", "content": "count to 10"}], openpipe={ "fallback": { "model": "gpt-4-turbo" } }, ) ``` ```typescript import OpenAI from "openpipe/openai"; const openai = new OpenAI({ openpipe: { fallbackClient: new OpenAICompatibleClient({ apiKey: "client api key" }), }, }); const completion = await openai.chat.completions.create({ messages: [{ role: "user", content: "Count to 10" }], model: "openpipe:my-ft-model", openpipe: { fallback: { model: "gpt-4-turbo" }, }, }); ``` # Fine Tuning via API (Beta) Fine tune your models programmatically through our API. We've made fine-tuning via API available through unstable routes that are subject to change. For most users, we highly recommend fine-tuning through the Webapp to achieve optimal performance with a smooth experience. However, some users may prefer to fine-tune via API for custom use cases. The following base models are supported for general access: * `OpenPipe/Hermes-2-Theta-Llama-3-8B-32k` * `meta-llama/Meta-Llama-3-8B-Instruct` * `meta-llama/Meta-Llama-3-70B-Instruct` * `OpenPipe/mistral-ft-optimized-1227` * `mistralai/Mixtral-8x7B-Instruct-v0.1` Learn more about fine-tuning via API on the [route page](/api-reference/post-unstablefinetunecreate). 
Please contact us at [hello@openpipe.ai](mailto:hello@openpipe.ai) if you would like help getting set up. # Fine-Tuning Quick Start Train your first fine-tuned model with OpenPipe. Fine-tuning open and closed models with custom hyperparameters only takes a few clicks. Before you begin: Before training your first model, make sure you've [created a dataset](/features/datasets/quick-start) and imported at least 10 training entries. ### Training a Model To train a model, navigate to the dataset you'd like to train your model on. Click the **Fine Tune** button in the top right corner of the **General** tab. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/fine-tuning/fine-tune-modal.png) Choose a descriptive name for your new model. This name will be used as the `model` parameter when querying it in code. You can always rename your model later. Select the base model you'd like to fine-tune on. We recommend starting with Llama 3.1 8B if you aren't sure which to choose. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/fine-tuning/select-base-model.png) Under **Advanced Options**, you can optionally adjust the hyperparameters to fine-tune your model. You can leave these at their default values if you aren't sure which to choose. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/fine-tuning/adjust-hyperparameters.png) Click **Start Training** to begin the training process. The training job may take a few minutes or a few hours to complete, depending on the amount of training data, the base model, and the hyperparameters you choose. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/fine-tuning/trained-model.png) To learn more about fine-tuning through the webapp, check out the [Fine-Tuning via Webapp](/features/fine-tuning/overview) page. To learn about fine-tuning via API, see our [Fine Tuning via API](/api-reference/fine-tuning) page. # Fine Tuning via Webapp Fine tune your models on filtered logs or uploaded datasets. Filter by prompt id and exclude requests with an undesirable output. OpenPipe allows you to train, evaluate, and deploy your models all in the same place. We recommend training your models through the webapp, which provides more flexibility and a smoother experience than the API. To fine-tune a new model, follow these steps: 1. Create a new dataset or navigate to an existing one. 2. Click "Fine Tune" in the top right. 3. Select a base model. 4. (Optional) Set custom hyperparameters and configure [pruning rules](/features/pruning-rules). 5. Click "Start Training" to kick off the job. Once started, your model's training job will take at least a few minutes and potentially several hours, depending on the size of the model and the amount of data. You can check your model's status by navigating to the Fine Tunes page and selecting your model. For an example of how an OpenPipe model looks once it's trained, see our public [PII Redaction](https://app.openpipe.ai/p/BRZFEx50Pf/fine-tunes/6076ad69-cce5-4892-ae54-e0549bbe107f/general) model. Feel free to hit it with some sample queries! ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/fine-tuning.png) # Mixture of Agents Use Mixture of Agents to increase quality beyond SOTA models. We’re currently beta-testing a novel completion generating technique we’re calling “Mixture of Agents,” which we’ll document more formally soon. 
The basic idea is that instead of simply asking GPT-4 to generate a completion for your prompt directly, we use a series of GPT-4 prompts to iteratively improve the completion. The steps our “mixture of agents” model takes are as follows: * **Prompt 1** generates 3 candidate completions in parallel by calling the chosen base model with `n=3` and a high temperature to promote output diversity. * **Prompt 2** again calls the base model. It passes in the original input again, along with the 3 candidate completions generated by prompt 1. It then asks the LLM to review the candidate completions and critique them. * **Prompt 3** again passes the original input, the 3 candidate completions, and their critiques. Using this information, the base model generates a final completion that incorporates the best of all 3 candidates. We’ve iterated on this process extensively and found that completions generated in this way tend to be significantly higher quality than those generated by GPT-4 in a single step, and lead to much stronger downstream fine-tuned models as well. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/moa/llm-judge-moa-wr.png) ## Using MoA in Production To use MoA models at inference time, make requests to the `/chat/completions` endpoint with a MoA model. See [instructions](/features/chat-completions/moa). ## Using the MoA Relabeling Flow The following instructions explain how to copy an existing dataset and relabel it with the mixture-of-agents flow, which will let you train models on the higher-quality outputs. 1. **Export the original dataset** Navigate to your existing OpenPipe dataset and click the “Export” button in the upper right. Keep the “Include split” checkbox checked. You’ll download a .jsonl file with the contents of your dataset (this may take a few minutes). ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/moa/export-arrow.png) 2. **Re-import the dataset** Create a new dataset in your project. Import the file you exported from step (1). Once the import finishes, your new dataset should contain a copy of the same data as the old one. 3. **Open the Data Pipeline view** Navigate to the **Data Pipeline** tab in the new dataset, then expand the Data Pipeline view by hovering over and clicking the data pipeline preview. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/moa/data-lineage-preview.png) 4. **Choose the relabeling model** Select the “LLM Relabel” node for the file you just uploaded. Then in the sidebar, choose one of `moa-gpt-4-v1`, `moa-gpt-4-turbo-v1`, or `moa-gpt-4o-v1`, depending on which model you’d like to use as your MoA base. **Note:** we use your API key for relabeling, so you’ll need to have entered a valid OpenAI API key in your project settings for this to work. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/moa/data-lineage-relabeling.png) 5. **Wait for relabeling to finish** Depending on your dataset size, relabeling may take quite a while. Behind the scenes, we run 4 relabeling jobs at a time. You’ll know relabeling has finished when the “Processing entries” status disappears at the top right of the dataset view. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/moa/processing-entries.png) 6. **Train a model on the new dataset** Train the base model of your choice on the new dataset. 7. 
**(Optional) Evaluate your new model against your old one** If you have an existing head-to-head evaluation on the platform, you can easily add your new model to it to see how it compares. Simply open your existing eval and add your newly-trained model as another model to compare! ## Costs We aren’t charging for the MoA relabeling flow while it is in beta. However, you will pay for the actual calls to the OpenAI API. The exact cost varies depending on your input vs output mix but as a rule of thumb our MoA approach uses 3x-4x as many tokens as running the same completion in a non-MoA context. # Pruning Rules Decrease input token counts by pruning out chunks of static text. Some prompts have large chunks of unchanging text, like system messages that don't differ from one request to the next. By removing this static text and fine-tuning a model on the compacted data, we can reduce the size of incoming requests and save you money on inference. You can add pruning rules to your dataset in the Settings tab, as shown below and in our [demo dataset](https://app.openpipe.ai/p/BRZFEx50Pf/datasets/109bbb87-399e-4d13-ad74-55ae5d5d43eb/settings). ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/pruning-rules.png) You can also see what your input looks like with the pruning rules applied in the Dataset Entry drawer (see [demo model](https://app.openpipe.ai/p/BRZFEx50Pf/fine-tunes/6076ad69-cce5-4892-ae54-e0549bbe107f/general)): ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/applied-pruning-rule.png) A fine-tuned model automatically inherits all pruning rules applied to the dataset on which it is trained. These rules will automatically prune static text out of any incoming requests sent to that model. Pruning rules that are added after a fine-tuned model was trained will not be associated with that model, so you don't need to worry about backwards compatibility. ## Warning: can affect quality! We’ve found that while pruning rules always decrease latency and costs, they can also negatively affect response quality, especially with smaller datasets. We recommend enabling pruning rules on datasets with 10K+ training examples, as smaller datasets may not provide enough guidance for the model to fully learn the task. # Exporting Logs Export your past requests as a JSONL file in their raw form. ## Request logs export Once your request logs are recorded, you can export them at any time. The exported jsonl contains all the data that we've collected from your logged calls, including tags and errors. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/features/request-logs/exporting-logs.png) ### Fields * **`Input`:** The complete chat creation request. * **`Output`:** Whatever output was generated, including errors. * **`Tags`:** Any metadata tags that you included when making the request. ### Example ```jsonl {"input":{"model":"openpipe:test-tool-calls-ft","tools":[{"type":"function","function":{"name":"get_current_weather","parameters":{"type":"object","required":["location"],"properties":{"unit":{"enum":["celsius","fahrenheit"],"type":"string"},"location":{"type":"string","description":"The city and state, e.g. 
San Francisco, CA"}}},"description":"Get the current weather in a given location"}}],"messages":[{"role":"system","content":"tell me the weather in SF and Orlando"}]},"output":{"id":"c7670af0d71648b0bd829fa1901ac6c5","model":"openpipe:test-tool-calls-ft","usage":{"total_tokens":106,"prompt_tokens":47,"completion_tokens":59},"object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":null,"tool_calls":[{"id":"","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\": \"San Francisco, CA\", \"unit\": \"celsius\"}"}},{"id":"","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\": \"Orlando, FL\", \"unit\": \"celsius\"}"}}]},"finish_reason":"stop"}],"created":1702666185703},"tags":{"prompt_id":"test_sync_tool_calls_ft","$sdk":"python","$sdk.version":"4.1.0"}} {"input":{"model":"openpipe:test-content-ft","messages":[{"role":"system","content":"count to 3"}]},"output":{"id":"47116eaa9dad4238bf12e32135f9c147","model":"openpipe:test-content-ft","usage":{"total_tokens":38,"prompt_tokens":29,"completion_tokens":9},"object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"1, 2, 3"},"finish_reason":"stop"}],"created":1702666036923},"tags":{"prompt_id":"test_sync_content_ft","$sdk":"python","$sdk.version":"4.1.0"}} ``` If you'd like to see how it works, try exporting some logs from our [public demo](https://app.openpipe.ai/p/BRZFEx50Pf/request-logs). # Logging Requests Record production data to train and improve your models' performance. Request logs are a great way to get to know your data. More importantly, you can import recorded logs directly into your training datasets. That means it's really easy to train on data you've collected in production. We recommend collecting request logs for both base and fine-tuned models. We provide several options for recording your requests. ### SDK The simplest way to start ingesting request logs into OpenPipe is by installing our Python or TypeScript SDK. Requests to both OpenAI and OpenPipe models will automatically be recorded. Logging doesn't add any latency to your requests, because our SDK calls the OpenAI server directly and returns your completion before kicking off the request to record it in your project. We provide a drop-in replacement for the OpenAI SDK, so the only code you need to update is your import statement: ```python # from openai import OpenAI from openpipe import OpenAI # Nothing else changes client = OpenAI() completion = client.chat.completions.create( model="gpt-3.5-turbo", messages=[{"role": "system", "content": "count to 10"}], # searchable metadata tags are highly recommended metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` ```typescript // import OpenAI from "openai" import OpenAI from "openpipe/openai"; // Nothing else changes const client = new OpenAI(); const completion = await client.chat.completions.create({ model: "gpt-3.5-turbo", messages: [{ role: "user", content: "Count to 10" }], // searchable metadata tags are highly recommended metadata: { prompt_id: "counting", any_key: "any_value", }, }); ``` See [Installing the SDK](/getting-started/openpipe-sdk) for a quick guide on how to get started. ### Proxy If you're developing in a language other than Python or TypeScript, the best way to ingest data into OpenPipe is through our proxy. 
We provide a `/chat/completions` endpoint that is fully compatible with OpenAI, so you can continue using the latest features like tool calls and streaming without a hitch. Integrating the Proxy and logging requests requires a couple steps. 1. Add an OpenAI key to your project in the [project settings](https://app.openpipe.ai/settings) page. 2. Set the authorization token of your request to be your OpenPipe API key. 3. Set the destination url of your request to be `https://api.openpipe.ai/api/v1/chat/completions`. 4. When making any request that you’d like to record, include the `"store": true` parameter in the request body. We also recommend that you add custom metadata tags to your request to distinguish data collected from different prompts. Here's an example of steps 2-4 put together in both a raw cURL request and with the Python SDK: ```bash curl --request POST \ --url https://api.openpipe.ai/api/v1/chat/completions \ --header "Authorization: Bearer YOUR_OPENPIPE_API_KEY" \ --header 'Content-Type: application/json' \ --data '{ "model": "gpt-4-0613", "messages": [ { "role": "system", "content": "count to 5" } ], "max_tokens": 100, "temperature": 0, "store": true, "metadata": { "prompt_id": "first_prompt" } }' ``` ```python from openai import OpenAI # Find your API key in https://app.openpipe.ai/settings client = OpenAI( base_url="https://api.openpipe.ai/api/v1", api_key="YOUR_OPENPIPE_API_KEY" ) completion = client.chat.completions.create( model="gpt-4-0613", messages=[{"role": "system", "content": "count to 5"}], stream=True, store=True, metadata={"prompt_id": "first_prompt"}, ) ``` ```typescript import OpenAI from "openai"; // Find your API key in https://app.openpipe.ai/settings const client = new OpenAI({ baseURL: "https://api.openpipe.ai/api/v1", apiKey: "YOUR_OPENPIPE_API_KEY", }); const completion = await client.chat.completions.create({ model: "gpt-4-0613", messages: [{ role: "system", content: "count to 5" }], store: true, metadata: { prompt_id: "first_prompt" }, }); ``` ### Reporting If you need more flexibility in how you log requests, you can use the `report` endpoint. This gives you full control over when and how to create request logs. ```python import time from openai import OpenAI from openpipe.client import OpenPipe client = OpenAI() op_client = OpenPipe() payload = { "model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Count to 10"}], } completion = client.chat.completions.create(**payload) op_client.report( requested_at=int(time.time() * 1000), received_at=int(time.time() * 1000), req_payload=payload, resp_payload=completion, status_code=200, metadata={"prompt_id": "My prompt id"}, ) ``` ```typescript import OpenAI from "openai"; import { ChatCompletionCreateParams } from "openai/resources"; import OpenPipe from "openpipe/client"; const client = new OpenAI(); const opClient = new OpenPipe(); const payload: ChatCompletionCreateParams = { model: "gpt-3.5-turbo", messages: [{ role: "user", content: "Count to 10" }], }; const completion = await client.chat.completions.create(payload); await opClient.report({ requestedAt: Date.now(), receivedAt: Date.now(), reqPayload: payload, respPayload: completion, statusCode: 200, metadata: { prompt_id: "My prompt id" }, }); ``` If you’re developing in a language other than Python or TypeScript, you can also make a raw HTTP request to the [report](/api-reference/post-report) endpoint. Once you've set up logging, you will see the data on the Request Logs page. 
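As mentioned above, you can also hit the report endpoint directly over HTTP from any language. Here is a rough sketch of such a call, shown with Python's `requests` for brevity; the field names are assumed to mirror the SDK parameters shown earlier, so treat the [report](/api-reference/post-report) route reference as the source of truth for the exact schema.

```python
import os
import time
import requests

# Hypothetical raw call to the report endpoint; field names are assumptions
# based on the SDK parameters above.
req_payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Count to 10"}],
}
# Normally this would be the completion object returned by your provider.
resp_payload = {"choices": [{"message": {"role": "assistant", "content": "1 2 3 4 5 6 7 8 9 10"}}]}

requests.post(
    "https://api.openpipe.ai/api/v1/report",
    headers={"Authorization": f"Bearer {os.environ['OPENPIPE_API_KEY']}"},
    json={
        "requestedAt": int(time.time() * 1000),
        "receivedAt": int(time.time() * 1000),
        "reqPayload": req_payload,
        "respPayload": resp_payload,
        "statusCode": 200,
        "metadata": {"prompt_id": "My prompt id"},
    },
)
```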
From there, you'll be able to search through your requests and train your models. See [Training on Logs](/features/datasets/importing-logs) to learn more. # Logging Anthropic Requests Anthropic's language models have a different API structure than those of OpenAI. To record requests made to Anthropic's models, follow the examples below: ```python import time from anthropic import Anthropic from openpipe.client import OpenPipe anthropic = Anthropic() op_client = OpenPipe() payload = { "model": "claude-3-opus-20240229", "messages": [{"role": "user", "content": "Hello, Claude"}], "max_tokens": 100, } message = anthropic.messages.create(**payload) op_client.report_anthropic( requested_at=int(time.time() * 1000), received_at=int(time.time() * 1000), req_payload=payload, resp_payload=message, status_code=200, metadata={ "prompt_id": "My prompt id", }, ) ``` ```typescript import Anthropic from "@anthropic-ai/sdk"; import { Message, MessageCreateParams } from "@anthropic-ai/sdk/resources"; import OpenPipe from "openpipe/client"; const anthropic = new Anthropic(); const opClient = new OpenPipe(); const payload: MessageCreateParams = { model: "claude-3-opus-20240229", messages: [{ role: "user", content: "Hello, Claude" }], max_tokens: 1024, }; const message: Message = await anthropic.messages.create(payload); await opClient.reportAnthropic({ requestedAt: Date.now(), receivedAt: Date.now(), reqPayload: payload, respPayload: message, statusCode: 200, metadata: { prompt_id: "My prompt id", }, }); ``` If you're using a different programming language, you can make a raw HTTP request to the [report-anthropic](/api-reference/post-report-anthropic) endpoint. # Updating Metadata Tags You may want to update the metadata tags on a request log after it's already been reported. For instance, if you notice that a certain completion from your fine-tuned model was flawed, you can mark it to be imported into one of your datasets and relabeled with GPT-4 for future training. 
```python import os from openpipe import OpenPipe, OpenAI from openpipe.client import UpdateLogTagsRequestFiltersItem # Find the config values in "Installing the SDK" client = OpenAI() op_client = OpenPipe( # defaults to os.environ["OPENPIPE_API_KEY"] api_key="YOUR_API_KEY" ) completion = client.chat.completions.create( model="openpipe:your-fine-tuned-model-id", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "tag_to_remove": "some value"}, ) resp = op_client.update_log_metadata( filters=[ UpdateLogTagsRequestFiltersItem( field="completionId", equals=completion.id, ), # completionId is the only filter necessary in this case, but let's add a couple more examples UpdateLogTagsRequestFiltersItem( field="model", equals="openpipe:your-fine-tuned-model-id", ), UpdateLogTagsRequestFiltersItem( field="metadata.prompt_id", equals="counting", ), ], metadata={ "relabel": "true", "tag_to_remove": None # this will remove the tag_to_remove tag from the request log we just created }, ) assert resp.matched_logs == 1 ``` ```typescript import OpenAI from "openpipe/openai"; import OpenPipe from "openpipe/client"; // Find the config values in "Installing the SDK" const client = OpenAI(); const opClient = OpenPipe({ // defaults to process.env.OPENPIPE_API_KEY apiKey: "YOUR_API_KEY", }); const completion = await client.chat.completions.create({ model: "openpipe:your-fine-tuned-model-id", messages: [{ role: "user", content: "Count to 10" }], metadata: { prompt_id: "counting", tag_to_remove: "some value", }, }); const resp = await opClient.updateLogTags({ filters: [ { field: "completionId", equals: completion.id }, // completionId is the only filter necessary in this case, but let's add a couple more examples { field: "model", equals: "openpipe:your-fine-tuned-model-id" }, { field: "metadata.prompt_id", equals: "counting" }, ], metadata: { relabel: "true", tag_to_remove: null, // this will remove the tag_to_remove tag from the request log we just created }, }); expect(resp.matchedLogs).toEqual(1); ``` To update your metadata, you'll need to provide two fields: `filters` and `metadata`. ### Filters Use filters to determine which request logs should be updated. Each filter contains two fields, `field` and `equals`. * **`field`: Required** - Indicates the field on a request log that should be checked. Valid options include `model`, `completionId`, and `tags.your_tag_name`. * **`equals`: Required** - The value that the field should equal. Keep in mind that filters are cumulative, so only request logs that match all of the filters you provide will be updated. ### Metadata Provide one or more metadata tags in a json object. The key should be the name of the tag you'd like to add, update, or delete. The value should be the new value of the tag. If you'd like to delete a tag, provide a value of `None` or `null`. Updated metadata tags will be searchable in the [Request Logs](/features/request-logs) panel. # Installing the SDK Use the OpenPipe SDK as a drop-in replacement for the generic OpenAI package. Calls sent through the OpenPipe SDK will be recorded by default for later training. You'll use this same SDK to call your own fine-tuned models once they're deployed. Find the SDK at [https://pypi.org/project/openpipe/](https://pypi.org/project/openpipe/) ## Installation ```bash pip install openpipe ``` ## Simple Integration Add `OPENPIPE_API_KEY` to your environment variables. 
```bash export OPENPIPE_API_KEY=opk- # Or you can set it in your code, see "Complete Example" below ``` Replace this line ```python from openai import OpenAI ``` with this one ```python from openpipe import OpenAI ``` ## Adding Searchable Metadata Tags OpenPipe follows OpenAI’s concept of metadata tagging for requests. You can use metadata tags in the [Request Logs](/features/request-logs) view to narrow down the data your model will train on. We recommend assigning a unique metadata tag to each of your prompts. These tags will help you find all the input/output pairs associated with a certain prompt and fine-tune a model to replace it. Here's how you can use the tagging feature: ## Complete Example ```python from openpipe import OpenAI import os client = OpenAI( # defaults to os.environ.get("OPENAI_API_KEY") api_key="My API Key", openpipe={ # defaults to os.environ.get("OPENPIPE_API_KEY") "api_key": "My OpenPipe API Key", # optional, defaults to os.environ.get("OPENPIPE_BASE_URL") or https://api.openpipe.ai/api/v1 if not set "base_url": "My URL", } ) completion = client.chat.completions.create( model="gpt-3.5-turbo", messages=[{"role": "system", "content": "count to 10"}], metadata={"prompt_id": "counting", "any_key": "any_value"}, ) ``` Find the SDK at [https://www.npmjs.com/package/openpipe](https://www.npmjs.com/package/openpipe) ## Installation ```bash npm install --save openpipe # or yarn add openpipe ``` ## Simple Integration Add `OPENPIPE_API_KEY` to your environment variables. ```bash export OPENPIPE_API_KEY=opk- # Or you can set it in your code, see "Complete Example" below ``` Replace this line ```typescript import OpenAI from "openai"; ``` with this one ```typescript import OpenAI from "openpipe/openai"; ``` ## Adding Searchable Metadata Tags OpenPipe follows OpenAI’s concept of metadata tagging for requests. You can use metadata tags in the [Request Logs](/features/request-logs) view to narrow down the data your model will train on. We recommend assigning a unique metadata tag to each of your prompts. These tags will help you find all the input/output pairs associated with a certain prompt and fine-tune a model to replace it. Here's how you can use the tagging feature: ## Complete Example ```typescript import OpenAI from "openpipe/openai"; // Fully compatible with original OpenAI initialization const openai = new OpenAI({ apiKey: "my api key", // defaults to process.env["OPENAI_API_KEY"] // openpipe key is optional openpipe: { apiKey: "my api key", // defaults to process.env["OPENPIPE_API_KEY"] baseUrl: "my url", // defaults to process.env["OPENPIPE_BASE_URL"] or https://api.openpipe.ai/api/v1 if not set }, }); const completion = await openai.chat.completions.create({ messages: [{ role: "user", content: "Count to 10" }], model: "gpt-3.5-turbo", // optional metadata: { prompt_id: "counting", any_key: "any_value", }, store: true, // Enable/disable data collection. Defaults to true. }); ``` Find the SDK at [https://www.npmjs.com/package/openpipe](https://www.npmjs.com/package/openpipe) ## Installation ```bash npm install --save openpipe # or yarn add openpipe ``` ## Simple Integration Add `OPENPIPE_API_KEY` to your environment variables. 
```bash export OPENPIPE_API_KEY=opk- # Or you can set it in your code, see "Complete Example" below ``` Replace this line ```typescript const OpenAI = require("openai"); ``` with this one ```typescript const OpenAI = require("openpipe/openai").default; ``` ## Adding Searchable Metadata Tags OpenPipe follows OpenAI’s concept of metadata tagging for requests. You can use metadata tags in the [Request Logs](/features/request-logs) view to narrow down the data your model will train on. We recommend assigning a unique metadata tag to each of your prompts. These tags will help you find all the input/output pairs associated with a certain prompt and fine-tune a model to replace it. Here's how you can use the tagging feature: ## Complete Example ```typescript import OpenAI from "openpipe/openai"; // Fully compatible with original OpenAI initialization const openai = new OpenAI({ apiKey: "my api key", // defaults to process.env["OPENAI_API_KEY"] // openpipe key is optional openpipe: { apiKey: "my api key", // defaults to process.env["OPENPIPE_API_KEY"] baseUrl: "my url", // defaults to process.env["OPENPIPE_BASE_URL"] or https://api.openpipe.ai/api/v1 if not set }, }); const completion = await openai.chat.completions.create({ messages: [{ role: "user", content: "Count to 10" }], model: "gpt-3.5-turbo", // optional metadata: { prompt_id: "counting", any_key: "any_value", }, store: true, // Enable/disable data collection. Defaults to true. }); ``` ## Should I Wait to Enable Logging? We recommend keeping request logging turned on from the beginning. If you change your prompt you can just set a new `prompt_id` metadata tag so you can select just the latest version when you're ready to create a dataset. # Quick Start Get started with OpenPipe in a few quick steps. ## Step 1: Create your OpenPipe Account If you don't already have one, create an account with OpenPipe at [https://app.openpipe.ai/](https://app.openpipe.ai/). You can sign up with GitHub, so you don't need to remember an extra password. ## Step 2: Find your Project API key In order to capture your calls and fine-tune a model on them, we need an API key to authenticate you and determine which project to store your logs under. When you created your account, a project was automatically configured for you as well. Find its API key at [https://app.openpipe.ai/settings](https://app.openpipe.ai/settings). ## Step 3: Record Training Data You're done with the hard part! Now let's start recording training data, either by integrating the OpenPipe SDK or using the OpenPipe Proxy. # OpenPipe Documentation Software engineers and data scientists use OpenPipe's intuitive fine-tuning and monitoring services to decrease the cost and latency of their LLM operations. You can use OpenPipe to collect and analyze LLM logs, create fine-tuned models, and compare output from multiple models given the same input. ![](https://mintlify.s3-us-west-1.amazonaws.com/openpipe/images/intro/dataset-general.png) Quickly integrate the OpenPipe SDK into your application and start collecting data. View the platform features OpenPipe provides and learn how to use them. Glance over the public demo we've set up to get an idea for how OpenPipe works. # Overview OpenPipe is a streamlined platform designed to help product-focused teams train specialized LLM models as replacements for slow and expensive prompts. 
## What We Provide Here are a few of the features we offer: * [**Unified SDK**](/getting-started/openpipe-sdk): Collect and utilize interaction data to fine-tune a custom model and continually refine and enhance model performance. Switching requests from your previous LLM provider to your new model is as simple as changing the model name. All our models implement the OpenAI inference format, so you won't have to change how you parse their responses. * [**Data Capture**](/features/request-logs): OpenPipe captures every request and response and stores it for your future use. * [**Request Logs**](/features/request-logs): We help you automatically log your past requests and tag them for easy filtering. * [**Upload Data**](/features/datasets/uploading-data): OpenPipe also allows you to import fine-tuning data from OpenAI-compatible JSONL files. * [**Export Data**](/features/datasets/exporting-data): Once your request logs are recorded, you can export them at any time. * [**Fine-Tuning**](/features/fine-tuning/overview): With all your LLM requests and responses in one place, it's easy to select the data you want to fine-tune on and kick off a job. * [**Pruning Rules**](/features/pruning-rules): By removing large chunks of unchanging text and fine-tuning a model on the compacted data, we can reduce the size of incoming requests and save you money on inference. * [**Model Hosting**](/features/chat-completions): After we've trained your model, OpenPipe will automatically begin hosting it. * [**Caching**](/features/caching): Improve performance and reduce costs by caching previously generated responses. * [**Evaluations**](/features/evaluations/overview): Compare your models against one another and OpenAI base models. Set up custom instructions and get quick insights into your models' performance. Welcome to the OpenPipe community! # Pricing Overview ## Training We charge for training based on the size of the model and the number of tokens in the dataset. | Model Category | Cost per 1M tokens | | ------------------ | ------------------ | | **8B and smaller** | \$3.00 | | **32B models** | \$8.00 | | **70B+ models** | \$16.00 | ## Hosted Inference Choose between two billing models for running models on our infrastructure (a worked example combining these rates appears at the end of this page): ### 1. Per-Token Pricing Available for our most popular, high-volume models. You only pay for the tokens you process, with no minimum commitment and automatic infrastructure scaling. | Model | Input (per 1M tokens) | Output (per 1M tokens) | | -------------------------- | --------------------- | ---------------------- | | **Llama 3.1 8B Instruct** | \$0.30 | \$0.45 | | **Llama 3.1 70B Instruct** | \$1.80 | \$2.00 | ### 2. Hourly Compute Units Designed for experimental and lower-volume models. A Compute Unit (CU) can handle up to 24 simultaneous requests per second. Billing is precise down to the second, with automatic scaling when traffic exceeds capacity. Compute units remain active for 60 seconds after traffic spikes. | Model | Rate per CU Hour | | ---------------------- | ---------------- | | **Llama 3.1 8B** | \$1.50 | | **Mistral Nemo 12B** | \$1.50 | | **Qwen 2.5 32B Coder** | \$6.00 | | **Qwen 2.5 72B** | \$12.00 | | **Llama 3.1 70B** | \$12.00 | ## Third-Party Models (OpenAI, Gemini, etc.) For third-party models fine-tuned through OpenPipe, such as OpenAI's GPT series or Google's Gemini, we provide direct API integration without any additional markup. You will be billed directly by the respective provider (OpenAI, Google, etc.) at their standard rates. 
We simply pass through the API calls and responses. ## Enterprise Plans For organizations requiring custom solutions, we offer enterprise plans that include: * Volume discounts * On-premises deployment options * Dedicated support * Custom SLAs * Advanced security features Contact our team at [hello@openpipe.ai](mailto:hello@openpipe.ai) to discuss enterprise pricing and requirements.
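To make the tables above concrete, here is a small back-of-the-envelope sketch combining the published rates. The dataset size, token volumes, and compute hours below are made-up assumptions for illustration, not quotes or real usage figures.

```python
# Rough cost illustration using the published rates above; all volumes are hypothetical.

# Training: 8B-and-smaller models are billed at $3.00 per 1M dataset tokens.
dataset_tokens = 2_000_000  # assumed dataset size
training_cost = (dataset_tokens / 1_000_000) * 3.00  # $6.00

# Per-token inference on Llama 3.1 8B Instruct: $0.30 input / $0.45 output per 1M tokens.
input_tokens, output_tokens = 5_000_000, 1_000_000  # assumed monthly volume
inference_cost = (input_tokens / 1_000_000) * 0.30 + (output_tokens / 1_000_000) * 0.45  # $1.95

# Hourly compute units: Llama 3.1 8B at $1.50 per CU hour, billed down to the second.
cu_hours = 10  # assumed usage
cu_cost = cu_hours * 1.50  # $15.00

print(f"training ${training_cost:.2f} | per-token ${inference_cost:.2f} | CU ${cu_cost:.2f}")
```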