API rate limits
Understand Scalekit API rate limits, tell them apart from upstream provider limits, handle 429 responses, and request a higher limit.
Scalekit applies a rate limit to the API requests from each environment. When a workload exceeds that limit, the API responds with HTTP 429 Too Many Requests. This page explains how to recognize a rate-limit response, tell a Scalekit limit apart from an upstream provider’s limit, handle the response, and request a higher limit.
How rate limits work
Scalekit enforces a per-environment request rate, measured in requests per minute. Scalekit tunes the limit per account, so a high-throughput workload may need a higher limit than the default. Routing MCP tool calls through Scalekit on top of authentication traffic is one example of a workload that can need more headroom.
When you exceed the limit, Scalekit returns HTTP 429 Too Many Requests. Do not retry immediately: wait before each retry and increase the wait exponentially with every failed attempt.
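As a minimal sketch of the retry pattern described above, the following Python helper retries a call with exponential backoff and full jitter. `RateLimitedError` is a hypothetical stand-in for the 429 exception your SDK raises, and the `base`, `cap`, and `max_retries` defaults are illustrative assumptions, not values published by Scalekit.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the SDK exception raised on HTTP 429."""


def backoff_ceilings(max_retries=5, base=1.0, cap=60.0):
    """Return the maximum wait (seconds) before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(call, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Run call(); on RateLimitedError, wait a jittered delay and try again."""
    for ceiling in backoff_ceilings(max_retries, base, cap):
        try:
            return call()
        except RateLimitedError:
            # Full jitter: a random wait between 0 and the ceiling keeps
            # many clients from retrying at the same instant.
            sleep(random.uniform(0, ceiling))
    # Final attempt after the last wait; any exception now propagates.
    return call()
```

In real code, replace `RateLimitedError` with the exception your SDK raises and pass the SDK call as `call`, for example wrapped in a `lambda`.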
Tell Scalekit limits apart from provider limits
When you call tools through Scalekit, a 429 can come from either Scalekit or the upstream provider that Scalekit calls on your behalf, such as a CRM or data API. The error_code field on the error identifies the source:
| error_code | Source | What to do |
|---|---|---|
| RATE_LIMITED | Scalekit’s own rate limit | Reduce the overall request frequency and back off before retrying. |
| TOOL_ERROR | The upstream provider rate-limited the tool call | Apply the provider’s recommended backoff. Check the tool call logs for the provider’s message. |
Review the detailed error in your dashboard’s tool call logs to confirm which provider and which tool triggered the limit.
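One way to act on the two error codes is to pick the retry delay from the source of the limit. The sketch below is illustrative only: `next_delay` and its `provider_delay` parameter are hypothetical names, and `provider_delay` stands for a wait you have taken from the provider's own documentation or from the message in the tool call logs.

```python
def limit_source(error_code):
    """Map the error_code on a 429 to whoever imposed the limit."""
    if error_code == "RATE_LIMITED":
        return "scalekit"
    if error_code == "TOOL_ERROR":
        return "provider"
    return "unknown"


def next_delay(error_code, attempt, provider_delay=None, base=1.0, cap=60.0):
    """Return how many seconds to wait before retry number `attempt` (0-based)."""
    if limit_source(error_code) == "provider" and provider_delay is not None:
        # The provider's recommended wait takes priority when it is known.
        return provider_delay
    # Otherwise fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

For example, `next_delay("TOOL_ERROR", 0, provider_delay=30.0)` returns the provider's 30 seconds, while `next_delay("RATE_LIMITED", 3)` returns 8.0 seconds under the assumed defaults.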
Handle a 429 response
Every Scalekit SDK raises a dedicated exception when the API returns 429. Catch it, read the error code to determine the source, and back off before retrying.
Node.js

    import { ScalekitTooManyRequestsException } from '@scalekit-sdk/node';

    try {
      // Your Scalekit SDK call, for example executing a tool
      await scalekit.tools.executeTool(/* ... */);
    } catch (error) {
      if (error instanceof ScalekitTooManyRequestsException) {
        // errorCode identifies the source of the 429 so you can back off correctly
        if (error.errorCode === 'TOOL_ERROR') {
          // Upstream provider rate-limited the call: apply provider-specific backoff
          console.error('Provider rate limit:', error.message);
        } else {
          // Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
          console.error('Scalekit rate limit:', error.message);
        }
      }
    }

Python

    from scalekit.common.exceptions import ScalekitTooManyRequestsException

    try:
        result = scalekit_client.tools.execute_tool(...)
    except ScalekitTooManyRequestsException as e:
        # error_code identifies the source of the 429 so you can back off correctly
        if e.error_code == "TOOL_ERROR":
            # Upstream provider rate-limited the call: apply provider-specific backoff
            print("Provider rate limit:", e.message)
        else:
            # Scalekit's own rate limit (RATE_LIMITED): reduce overall request frequency
            print("Scalekit rate limit:", e.message)

Go

    // Your Scalekit SDK call, for example executing a tool
    _, err := scalekitClient.Tools.ExecuteTool(ctx /* ... */)
    if err != nil {
        // Inspect the error code to find the source of the 429.
        // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
        // own limit. Back off with exponential backoff before retrying.
        log.Printf("rate limited: %v", err)
    }

Java

    try {
        // Your Scalekit SDK call, for example executing a tool
        scalekitClient.tools().executeTool(/* ... */);
    } catch (ScalekitException error) {
        // Inspect the error code to find the source of the 429.
        // "TOOL_ERROR" is the upstream provider's limit; "RATE_LIMITED" is Scalekit's
        // own limit. Back off with exponential backoff before retrying.
        System.err.println("Rate limited: " + error.getMessage());
    }

Request a higher limit
If your workload needs a higher limit, contact Scalekit support with your account details and your expected peak requests per minute. Plan ahead before you route additional traffic, such as MCP tool calls, through Scalekit. Scalekit reviews the request and adjusts the limit for your account.
When you estimate the limit you need, include headroom above your current peak so that normal spikes do not trigger 429 responses.
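The headroom estimate is simple arithmetic, shown here as a small Python sketch. The function name `requested_limit` and the 30 percent default are assumptions chosen for illustration; Scalekit does not prescribe a headroom figure, so pick one that matches how spiky your traffic is.

```python
def requested_limit(peak_rpm, headroom_percent=30):
    """Return the observed peak (requests per minute) plus headroom, rounded up.

    Uses integer arithmetic so the result is exact for whole-number inputs.
    """
    if peak_rpm < 0 or headroom_percent < 0:
        raise ValueError("peak_rpm and headroom_percent must be non-negative")
    # Ceiling division: (a + 99) // 100 rounds a / 100 up for non-negative a.
    return (peak_rpm * (100 + headroom_percent) + 99) // 100
```

With an observed peak of 1,000 requests per minute and the assumed 30 percent headroom, `requested_limit(1000)` returns 1300, which is the figure you would quote to support.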