In some cases, completions produced by GPT-4 or other state-of-the-art (SOTA) models aren't good enough to be used in production. To push quality beyond what a single SOTA model can deliver, we've developed a Mixture of Agents (MoA) technique that improves completion quality at the cost of higher price and latency.

To use MoA models, set the model parameter to be one of the following:

  • openpipe:moa-gpt-4o-v1
  • openpipe:moa-gpt-4-turbo-v1
  • openpipe:moa-gpt-4-v1

To get the highest quality completions, use the MoA model that corresponds to the best-performing SOTA model. For instance, if your original model was gpt-4-turbo-2024-04-09, try switching to openpipe:moa-gpt-4-turbo-v1.
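If you switch several prompts at once, it can be handy to derive the MoA model ID from the base model programmatically. The sketch below maps the base model families to the three MoA models listed above; `to_moa_model` is an illustrative helper, not part of the OpenPipe SDK:

```python
# Map each base model family to its MoA equivalent (from the list above).
MOA_MODELS = {
    "gpt-4o": "openpipe:moa-gpt-4o-v1",
    "gpt-4-turbo": "openpipe:moa-gpt-4-turbo-v1",
    "gpt-4": "openpipe:moa-gpt-4-v1",
}

def to_moa_model(base_model: str) -> str:
    """Return the MoA model ID for a base model, e.g.
    "gpt-4-turbo-2024-04-09" -> "openpipe:moa-gpt-4-turbo-v1"."""
    # Try the longest prefix first so "gpt-4-turbo-…" doesn't match "gpt-4".
    for prefix in sorted(MOA_MODELS, key=len, reverse=True):
        if base_model.startswith(prefix):
            return MOA_MODELS[prefix]
    raise ValueError(f"No MoA equivalent for {base_model!r}")
```

For example, `to_moa_model("gpt-4-turbo-2024-04-09")` returns `"openpipe:moa-gpt-4-turbo-v1"`.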

Make sure to set your OpenAI API key on the Project Settings page to enable MoA completions.

from openpipe import OpenAI

# Find the config values in "Installing the SDK"
client = OpenAI()

completion = client.chat.completions.create(
    # model="gpt-4-turbo-2024-04-09",  # original model
    model="openpipe:moa-gpt-4-turbo-v1",
    messages=[{"role": "system", "content": "count to 10"}],
    metadata={"prompt_id": "counting", "any_key": "any_value"},
)

print(completion.choices[0].message.content)

To learn more, visit the Mixture of Agents page.