When caching is enabled, our service stores the response generated for each unique request. If an identical request arrives later, the cached response is returned immediately instead of being regenerated. This eliminates redundant computation, reducing both response times and API usage costs.

Caching is currently in a free beta preview.

Enabling Caching

To enable caching for your requests through our SDK, set the cache property of the openpipe object to true, as in the sketch below.
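A minimal sketch of the SDK approach, assuming the OpenPipe Python SDK's OpenAI-compatible client; the exact initialization arguments may vary by SDK version:

# Sketch: enable caching by setting cache to true in the openpipe
# request options. Assumes the OpenPipe Python SDK's OpenAI-compatible
# client; initialization details may differ in your SDK version.
from openpipe import OpenAI

client = OpenAI(
    openpipe={"api_key": "YOUR_OPENPIPE_API_KEY"},  # placeholder key
)

completion = client.chat.completions.create(
    model="openpipe:your-fine-tuned-model-id",
    messages=[{"role": "system", "content": "count to 5"}],
    openpipe={"cache": True},  # the cache property of the openpipe object
)

print(completion.choices[0].message.content)

If you are making requests through our proxy instead, add the op-cache header, set to true, to your requests: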

curl --request POST \
  --url https://app.openpipe.ai/api/v1/chat/completions \
  --header 'Authorization: Bearer YOUR_OPENPIPE_API_KEY' \
  --header 'Content-Type: application/json' \
  --header 'op-cache: true' \
  --data '{
  "model": "openpipe:your-fine-tuned-model-id",
  "messages": [
    {
      "role": "system",
      "content": "count to 5"
    }
  ]
}'
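
To confirm caching is working, you can send the same request twice and compare latencies: because the second request is identical, it should be served from the cache and return noticeably faster. A minimal sketch using Python's requests library (the timings printed are illustrative only):

# Sketch: send the identical request twice and compare latencies.
# On a cache hit, the second call should return much faster.
import time
import requests

URL = "https://app.openpipe.ai/api/v1/chat/completions"
HEADERS = {
    "Authorization": "Bearer YOUR_OPENPIPE_API_KEY",  # placeholder key
    "Content-Type": "application/json",
    "op-cache": "true",
}
PAYLOAD = {
    "model": "openpipe:your-fine-tuned-model-id",
    "messages": [{"role": "system", "content": "count to 5"}],
}

for attempt in (1, 2):
    start = time.monotonic()
    response = requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=60)
    elapsed = time.monotonic() - start
    print(f"attempt {attempt}: {elapsed:.2f}s, status {response.status_code}")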