Hourly Rates

Choose the right model for your needs

Model                    Rate per CU Hour
Llama 3 8B               $1.50
Llama 3 8B 32K           $1.50
Mistral 7B Optimized     $1.50
Mixtral 8x7B Instruct    $6.00
Llama 3 70B              $12.00

A Compute Unit (CU) can handle up to 24 simultaneous requests per second.

How does our time-based billing work?

We’ve designed our billing system to be transparent and cost-effective for you:

  • Precise Usage Billing: We charge based on your usage down to the second. This ensures you only pay for what you actually use.
  • Automatic Scaling: If your traffic exceeds the capacity of a single compute unit, we automatically scale up to meet demand. For example, 70 simultaneous requests would use 3 compute units for every second at that load: ceil(70 ÷ 24) = 3.
  • Cool-down Period: To minimize cold start problems, we keep compute units active for 60 seconds after a traffic spike. This ensures smooth performance during fluctuating demand.
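
To make the three rules above concrete, here is a rough Python sketch of how a bill could be estimated from a per-second trace of simultaneous requests. It is not an official calculator. The function names are hypothetical, and the cool-down handling is an assumption: each second is billed at the highest unit count needed during that second or the 60 seconds before it. Actual billing may treat cool-down differently.

```python
import math

# Rates per CU hour in USD, copied from the table above.
RATES_PER_CU_HOUR = {
    "Llama 3 8B": 1.50,
    "Llama 3 8B 32K": 1.50,
    "Mistral 7B Optimized": 1.50,
    "Mixtral 8x7B Instruct": 6.00,
    "Llama 3 70B": 12.00,
}

REQUESTS_PER_CU = 24     # capacity of one compute unit
COOL_DOWN_SECONDS = 60   # units stay active this long after a spike


def compute_units_needed(simultaneous_requests: int) -> int:
    """Units required for a given load: ceil(requests / 24)."""
    return math.ceil(simultaneous_requests / REQUESTS_PER_CU)


def billed_unit_seconds(load_per_second: list[int]) -> int:
    """Sum of billed compute units over every second of the trace.

    Assumption (not confirmed by the pricing page): a second is billed
    at the peak unit count needed in that second or the previous 60.
    """
    needed = [compute_units_needed(r) for r in load_per_second]
    total = 0
    for t in range(len(needed)):
        window_start = max(0, t - COOL_DOWN_SECONDS)
        total += max(needed[window_start:t + 1])
    return total


def estimate_cost(model: str, load_per_second: list[int]) -> float:
    """Estimated cost in USD, billed per second at the model's hourly rate."""
    rate_per_hour = RATES_PER_CU_HOUR[model]
    return billed_unit_seconds(load_per_second) * rate_per_hour / 3600


if __name__ == "__main__":
    print(compute_units_needed(70))  # 3, matching the example above
    # A 10-second spike of 70 requests, then 100 seconds at 10 requests.
    trace = [70] * 10 + [10] * 100
    print(billed_unit_seconds(trace))
    print(round(estimate_cost("Llama 3 70B", trace), 4))
```

Under these assumptions, the 10-second spike keeps 3 units billed for a further 60 seconds before the count drops back to 1, which is the effect the cool-down period describes.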