API rate limits

Understand Scalekit API rate limits, tell them apart from upstream provider limits, handle 429 responses, and request a higher limit.

Scalekit applies a rate limit to the API requests from each environment. When a workload exceeds that limit, the API responds with HTTP 429 Too Many Requests. This page explains how to recognize a rate-limit response, tell a Scalekit limit apart from an upstream provider’s limit, handle the response, and request a higher limit.

How rate limits work

Scalekit enforces a per-environment request rate, measured in requests per minute. Scalekit tunes the limit per account, so a high-throughput workload may need a limit above the default. One example is routing MCP tool calls through Scalekit on top of authentication traffic.

When you exceed the limit, Scalekit returns HTTP 429 Too Many Requests. Wait before you retry, and increase the wait exponentially on each attempt instead of retrying immediately.
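As a rough illustration of exponential backoff, the sketch below computes how long to wait before each retry. The function name `backoff_delay`, the 1-second base, and the 60-second cap are assumptions chosen for the example, not values Scalekit documents. The random jitter is a common addition that keeps many clients from retrying at the same instant.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Return the wait in seconds before retry number `attempt` (0-based).

    The ceiling doubles on each attempt (base * 2**attempt) and is capped
    at `cap`. Full jitter then picks a random point below that ceiling.
    `base` and `cap` are illustrative defaults, not Scalekit recommendations.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

With these defaults the ceiling grows 1, 2, 4, 8, 16 seconds over the first five retries and never exceeds 60 seconds.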

Tell Scalekit limits apart from provider limits

When you call tools through Scalekit, a 429 can come from either Scalekit or the upstream provider that Scalekit calls on your behalf, such as a CRM or data API. The `error_code` field on the error identifies the source:

| `error_code` | Source | What to do |
| --- | --- | --- |
| `RATE_LIMITED` | Scalekit’s own rate limit | Reduce the overall request frequency and back off before retrying. |
| `TOOL_ERROR` | The upstream provider rate-limited the tool call | Apply the provider’s recommended backoff. Check the tool call logs for the provider’s message. |

Review the detailed error in your dashboard’s tool call logs to confirm which provider and which tool triggered the limit.
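For the `RATE_LIMITED` case, one way to reduce overall request frequency is to throttle on the client before a request is sent. The sketch below is a minimal sliding-window limiter. The class name `MinuteThrottle` and its parameters are hypothetical, it is not part of any Scalekit SDK, and it is not thread-safe as written.

```python
import time
from collections import deque


class MinuteThrottle:
    """Client-side limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest call leaves it.
            self._sleep(self.window - (now - self._sent[0]))
```

Calling `acquire()` before each SDK call keeps the client under the chosen budget. Setting `limit` somewhat below the environment's actual limit leaves room for traffic from other processes that share the same environment.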

Handle a 429 response

Every Scalekit SDK raises a dedicated exception when the API returns 429. Catch it, read the error code to determine the source, and back off before retrying.

```typescript
import { ScalekitTooManyRequestsException } from '@scalekit-sdk/node';

try {
  // Your Scalekit SDK call, for example executing a tool
  await scalekit.tools.executeTool(/* ... */);
} catch (error) {
  if (error instanceof ScalekitTooManyRequestsException) {
    // errorCode identifies the source of the 429 so you can back off correctly
    if (error.errorCode === 'TOOL_ERROR') {
      // Upstream provider rate-limited the call: apply provider-specific backoff
      console.error('Provider rate limit:', error.message);
    } else {
      // Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
      console.error('Scalekit rate limit:', error.message);
    }
  }
}
```

```python
from scalekit.common.exceptions import ScalekitTooManyRequestsException

try:
    result = scalekit_client.tools.execute_tool(...)
except ScalekitTooManyRequestsException as e:
    # error_code identifies the source of the 429 so you can back off correctly
    if e.error_code == "TOOL_ERROR":
        # Upstream provider rate-limited the call: apply provider-specific backoff
        print("Provider rate limit:", e.message)
    else:
        # Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
        print("Scalekit rate limit:", e.message)
```

```go
// Your Scalekit SDK call, for example executing a tool
_, err := scalekitClient.Tools.ExecuteTool(ctx /* ... */)
if err != nil {
    // Inspect the error code to find the source of the 429.
    // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
    // own limit. Back off with exponential backoff before retrying.
    log.Printf("rate limited: %v", err)
}
```

```java
try {
    // Your Scalekit SDK call, for example executing a tool
    scalekitClient.tools().executeTool(/* ... */);
} catch (ScalekitException error) {
    // Inspect the error code to find the source of the 429.
    // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
    // own limit. Back off with exponential backoff before retrying.
    System.err.println("Rate limited: " + error.getMessage());
}
```
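The examples above log the error and stop. The sketch below shows one possible way to turn that branch into a retry loop that picks a starting delay from the error code. `TooManyRequests` is a hypothetical stand-in for the SDK's 429 exception so that the block runs on its own, and `call_with_retry`, the attempt count, and both starting delays are assumptions made for illustration. In real code, the provider delay should come from the provider's own guidance.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the SDK's 429 exception; carries error_code."""

    def __init__(self, error_code, message=""):
        super().__init__(message)
        self.error_code = error_code


def call_with_retry(call, max_attempts=5, base=1.0, provider_base=5.0,
                    cap=60.0, sleep=time.sleep, rng=random.random):
    """Run `call`, retrying on 429 with exponential backoff and full jitter.

    TOOL_ERROR (provider limit) starts from `provider_base`; any other code
    is treated as Scalekit's own limit and starts from `base`. Both starting
    values are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TooManyRequests as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            start = provider_base if exc.error_code == "TOOL_ERROR" else base
            sleep(rng() * min(cap, start * 2 ** attempt))
```

To use it with the Python SDK, you would catch `ScalekitTooManyRequestsException` in place of the stand-in class and pass the tool call as a zero-argument callable, for example a `lambda`.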

Request a higher limit

If your workload needs a higher limit, contact Scalekit support with your account details and your expected peak requests per minute. Scalekit reviews the request and adjusts the limit for your account. Make the request before you route additional traffic, such as MCP tool calls, through Scalekit.

When you estimate the limit you need, include headroom above your current peak so that normal spikes do not trigger 429 responses.
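The headroom estimate is simple arithmetic, sketched below. The function name `limit_to_request` and the 30% default are assumptions for illustration. Scalekit does not prescribe a headroom figure on this page.

```python
import math


def limit_to_request(peak_rpm, headroom_percent=30):
    """Return a requests-per-minute figure with headroom above the observed peak.

    `headroom_percent` is the margin above peak (30 means 30% higher).
    The default is an illustrative assumption, not a Scalekit recommendation.
    """
    if peak_rpm < 0 or headroom_percent < 0:
        raise ValueError("peak_rpm and headroom_percent must be non-negative")
    return math.ceil(peak_rpm * (100 + headroom_percent) / 100)
```

For example, an observed peak of 1,200 requests per minute with 25% headroom gives `limit_to_request(1200, 25)`, which returns 1500.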