Pricing
Hourly Pricing
Our pricing for time-based billing
Hourly Rates
Choose the right model for your needs
Model | Rate per CU Hour |
---|---|
Llama 3 8B | $1.50 |
Llama 3 8B 32K | $1.50 |
Mistral 7B Optimized | $1.50 |
Mixtral 8x7B Instruct | $6.00 |
Llama 3 70B | $12.00 |
A Compute Unit (CU) can handle up to 24 simultaneous requests per second.
How does our time-based billing work?
We’ve designed our billing system to be transparent and cost-effective for you:
- Precise Usage Billing: We charge based on your usage down to the second. This ensures you only pay for what you actually use.
- Automatic Scaling: If your traffic exceeds the capacity of a single compute unit, we automatically scale up to meet demand. For example, 70 simultaneous requests in a given second would use 3 compute units for that second: ceil(70 ÷ 24) = 3.
- Cool-down Period: To minimize cold start problems, we keep compute units active for 60 seconds after a traffic spike. This ensures smooth performance during fluctuating demand.
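
The billing rules above can be sketched as a rough cost estimate. This is only an illustration under stated assumptions, not the actual billing implementation. The helper names `compute_units_needed` and `estimate_cost` are hypothetical. The sketch assumes a second with zero requests needs zero compute units, and it leaves out the 60-second cool-down because the text does not say how that time is billed.

```python
import math

# Rates per CU hour in USD, copied from the Hourly Rates table.
RATES_PER_CU_HOUR = {
    "Llama 3 8B": 1.50,
    "Llama 3 8B 32K": 1.50,
    "Mistral 7B Optimized": 1.50,
    "Mixtral 8x7B Instruct": 6.00,
    "Llama 3 70B": 12.00,
}

# Capacity of one compute unit, as stated in the text.
REQUESTS_PER_CU = 24


def compute_units_needed(simultaneous_requests: int) -> int:
    """Return ceil(requests / 24), the scaling rule from the example."""
    if simultaneous_requests < 0:
        raise ValueError("simultaneous_requests must be non-negative")
    # Assumption: zero requests means zero compute units.
    return math.ceil(simultaneous_requests / REQUESTS_PER_CU)


def estimate_cost(model: str, load_per_second: list[int]) -> float:
    """Estimate the cost in USD for a trace of per-second request counts.

    Assumption: each second is billed independently at
    (rate per CU hour / 3600) per active compute unit.
    The cool-down period is not modeled here.
    """
    rate_per_cu_second = RATES_PER_CU_HOUR[model] / 3600
    cu_seconds = sum(compute_units_needed(n) for n in load_per_second)
    return cu_seconds * rate_per_cu_second
```

Under these assumptions, `estimate_cost("Llama 3 70B", [70] * 3600)` models one hour at a steady 70 simultaneous requests: 3 compute units for 3,600 seconds, or about 3 × $12.00 = $36.00.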