# Pricing Overview

## Training
We charge for training based on the size of the model and the number of tokens in your training dataset.
| Model Category | Cost per 1M tokens |
|---|---|
| 8B and smaller | $3.00 |
| 32B models | $8.00 |
| 70B+ models | $16.00 |
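The rate table above maps directly to a simple calculation: total training cost is the per-tier rate times the number of training tokens, in millions. A minimal sketch (the tier keys and the `training_cost` function name are our own; the rates come from the table):

```python
# USD per 1M training tokens, from the rate table above.
TRAINING_RATES = {
    "8b_and_smaller": 3.00,
    "32b": 8.00,
    "70b_plus": 16.00,
}

def training_cost(tier: str, tokens: int) -> float:
    """Cost in USD to train on `tokens` tokens for a model in `tier`."""
    return TRAINING_RATES[tier] * tokens / 1_000_000

# Example: fine-tuning a 32B model on 50M tokens.
print(training_cost("32b", 50_000_000))  # → 400.0
```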
## Hosted Inference
Choose between two billing models for running models on our infrastructure:
### 1. Per-Token Pricing
Available for our most popular, high-volume models. You only pay for the tokens you process, with no minimum commitment and automatic infrastructure scaling.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Llama 3.1 8B Instruct | $0.30 | $0.45 |
| Llama 3.1 70B Instruct | $1.80 | $2.00 |
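Per-token billing charges input and output tokens at separate rates. As an illustration, here is a sketch for Llama 3.1 8B Instruct using the rates in the table above (the `inference_cost` function name is our own):

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_rate: float = 0.30,   # $/1M input tokens (Llama 3.1 8B Instruct)
                   output_rate: float = 0.45   # $/1M output tokens
                   ) -> float:
    """Cost in USD for a batch of requests under per-token billing."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 2M input tokens and 1M output tokens.
print(round(inference_cost(2_000_000, 1_000_000), 2))  # → 1.05
```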
### 2. Hourly Compute Units
Designed for experimental and lower-volume models. A Compute Unit (CU) can handle up to 24 simultaneous requests. Billing is metered to the second, and capacity scales automatically when traffic exceeds what your current units can handle; compute units remain active for 60 seconds after a traffic spike subsides.
| Model | Rate per CU Hour |
|---|---|
| Llama 3.1 8B | $1.50 |
| Mistral Nemo 12B | $1.50 |
| Qwen 2.5 32B Coder | $6.00 |
| Qwen 2.5 72B | $12.00 |
| Llama 3.1 70B | $12.00 |
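Under hourly billing, cost depends on how many compute units your peak concurrency requires and how long they stay active. A rough estimator under the assumptions stated above (one CU per 24 simultaneous requests, per-second metering; the model keys and `cu_cost` function name are our own):

```python
import math

# USD per compute unit per hour, from the rate table above.
CU_HOURLY_RATES = {
    "llama-3.1-8b": 1.50,
    "mistral-nemo-12b": 1.50,
    "qwen-2.5-32b-coder": 6.00,
    "qwen-2.5-72b": 12.00,
    "llama-3.1-70b": 12.00,
}

def cu_cost(model: str, peak_concurrent_requests: int, active_seconds: int) -> float:
    """Estimated USD cost: enough CUs to cover peak concurrency,
    billed per second of active time."""
    units = math.ceil(peak_concurrent_requests / 24)
    return CU_HOURLY_RATES[model] * units * active_seconds / 3600

# Example: 30 concurrent requests (2 CUs) on Qwen 2.5 72B for 30 minutes.
print(cu_cost("qwen-2.5-72b", 30, 1800))  # → 12.0
```

Note this is only an estimate: it ignores the 60-second cooldown after spikes, which adds a small amount of billed time per scale-down event.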
## Third-Party Models (OpenAI, Gemini, etc.)

For third-party models fine-tuned through OpenPipe, such as OpenAI’s GPT series or Google’s Gemini, we provide direct API integration without any additional markup. You are billed directly by the respective provider (OpenAI, Google, etc.) at their standard rates; we simply pass through the API calls and responses.
## Enterprise Plans
For organizations requiring custom solutions, we offer enterprise plans that include:
- Volume discounts
- On-premises deployment options
- Dedicated support
- Custom SLAs
- Advanced security features
Contact our team at hello@openpipe.ai to discuss enterprise pricing and requirements.