API Rate Limits

Every API key gets 1,000 requests per minute and 10,000 per hour. Headers expose what's left.

Every Macha API key gets 1,000 requests per minute and 10,000 requests per hour. Limits are enforced per key, not per organization: if you have three keys in the same org, you get 3,000 requests per minute combined.

The two windows

Window	Limit	Resets
Minute	1,000 requests	Sliding, every 60 seconds
Hour	10,000 requests	Sliding, every 3,600 seconds

Both windows are checked on every request. If either is full, the request is rejected with 429 rate_limited. The headers (below) always reflect the more constraining window, that is, whichever one you'll hit first.
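To make the two-window check concrete, here is a minimal client-side model of it. This is a sketch, not Macha's server implementation: the class name `DualWindowLimiter`, its methods, and the injectable `clock` are all hypothetical, and it assumes each window simply counts the requests accepted within its trailing period.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Client-side model of the minute and hour sliding windows (illustrative)."""

    def __init__(self, per_minute=1000, per_hour=10000, clock=time.monotonic):
        # Each entry is (window length in seconds, max requests in that window).
        self.windows = [(60, per_minute), (3600, per_hour)]
        self.clock = clock
        self.timestamps = deque()  # times of accepted requests, oldest first

    def _count_since(self, cutoff):
        # Number of accepted requests newer than the cutoff.
        return sum(1 for t in self.timestamps if t > cutoff)

    def try_acquire(self):
        """Record the request and return True only if both windows have room."""
        now = self.clock()
        longest = max(length for length, _ in self.windows)
        # Entries older than the longest window can never matter again.
        while self.timestamps and self.timestamps[0] <= now - longest:
            self.timestamps.popleft()
        for length, limit in self.windows:
            if self._count_since(now - length) >= limit:
                return False  # this window is full; the API would answer 429
        self.timestamps.append(now)
        return True
```

Calling `try_acquire()` before each request lets a client hold back locally instead of discovering the limit through a 429. The response headers remain the authoritative source of what is left.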

Response headers

Every response (success or error) includes rate limit headers so you don't have to guess what's left:

Header	Meaning
`X-RateLimit-Limit`	Requests allowed in the current window (1000 or 10000 depending on which is tighter).
`X-RateLimit-Remaining`	Requests left in the current window before you hit 429.
`X-RateLimit-Reset`	Unix epoch seconds when the more-constraining window resets.
`Retry-After`	Only on `429` responses. Seconds to wait before retrying.

Example response headers on a normal request:

HTTP/1.1 200 OK
X-Request-ID: req_a0b88aa084bac0f7
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1782296880
Content-Type: application/json; charset=utf-8
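As a sketch of how a client might read these headers, the helper below turns them into a small record. The names `RateLimitState` and `parse_rate_limit` are hypothetical and not part of any Macha SDK. The function assumes the header values are plain integers, as in the example above, and returns None when any of the three is missing or malformed.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int          # X-RateLimit-Limit
    remaining: int      # X-RateLimit-Remaining
    reset_epoch: int    # X-RateLimit-Reset, Unix epoch seconds
    retry_after: Optional[float] = None  # Retry-After, present only on 429


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Extract rate-limit state from response headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        state = RateLimitState(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_epoch=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
    if "retry-after" in lowered:
        try:
            state.retry_after = float(lowered["retry-after"])
        except ValueError:
            pass  # leave retry_after unset if the value is not numeric
    return state
```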

The 429 response

When you exceed the limit, Macha returns 429 rate_limited with both the standard error envelope and a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1782296892

{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry in 12 seconds.",
    "request_id": "req_..."
  }
}

Honor Retry-After. It's set to the number of seconds until the more-constraining window has space again. Ignoring it and slamming retries will just keep hitting 429.
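One way to turn a 429 into a wait time is sketched below. The helper names `seconds_to_wait` and `error_code` are hypothetical. The fallback order (Retry-After first, then X-RateLimit-Reset, then a default) is an assumption for robustness, not documented Macha behavior.

```python
import json
import time


def seconds_to_wait(status, headers, now=None, default=1.0):
    """Return how long to sleep before retrying; 0.0 if the request was not limited."""
    if status != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # fall through to the reset timestamp
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            now = time.time() if now is None else now
            return max(0.0, float(reset) - now)
        except ValueError:
            pass
    return default


def error_code(body_text):
    """Pull error.code out of the error envelope, or None if it is not there."""
    try:
        return json.loads(body_text)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return None
```

With the example response above, `seconds_to_wait(429, {"Retry-After": "12"})` gives 12.0 and `error_code(...)` on the body gives "rate_limited".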

Building a rate-aware client

The simplest pattern: every request, glance at X-RateLimit-Remaining. If it's low, slow down. If you get a 429, sleep Retry-After seconds and try again.

async function callWithBackoff(url, init) {
  while (true) {
    const res = await fetch(url, init);

    // Pre-emptive slowdown: if we're close to the cliff, pause briefly.
    // parseInt yields NaN when the header is missing, so the check is skipped.
    const remaining = parseInt(res.headers.get('X-RateLimit-Remaining'), 10);
    if (remaining < 50) {
      const reset = parseInt(res.headers.get('X-RateLimit-Reset'), 10);
      const secsUntilReset = Math.max(0, reset - Math.floor(Date.now() / 1000));
      // Sleep 50 ms per second left until reset, capped at 1 s.
      await new Promise(r => setTimeout(r, Math.min(secsUntilReset * 50, 1000)));
    }

    if (res.status !== 429) return res;

    const retryAfter = parseFloat(res.headers.get('Retry-After')) || 1;
    await new Promise(r => setTimeout(r, retryAfter * 1000));
  }
}

Python equivalent

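import time

import requests

def call_with_backoff(url, method='GET', **kwargs):
    """Send a request, pausing near the limit and retrying on 429."""
    while True:
        # requests.request takes the HTTP method as its first argument.
        r = requests.request(method, url, **kwargs)
        # Pre-emptive slowdown: pause briefly when few requests remain.
        remaining = int(r.headers.get('X-RateLimit-Remaining', 1000))
        if remaining < 50:
            reset = int(r.headers.get('X-RateLimit-Reset', time.time()))
            # Sleep 50 ms per second left until reset, capped at 1 s.
            time.sleep(min(max(0, reset - time.time()) * 0.05, 1.0))
        if r.status_code != 429:
            return r
        # Rate limited: wait as long as the server asks, then retry.
        retry_after = float(r.headers.get('Retry-After', 1))
        time.sleep(retry_after)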

Concurrent workers

If you have multiple processes hitting the API with the same key, share rate-limit state. Otherwise each worker sees its own headers and they'll independently slam the limit.

Reasonable approaches:

One process funneling requests, others enqueue
Redis-backed token bucket using X-RateLimit-Reset as the refill clock
Per-key sharding so each worker owns a key and a slice of the work

The third option also raises your total throughput if you need more than 1,000/min: each key has its own limit, so you can just create more keys.
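For workers that are threads in one process, shared state can be as simple as a lock-protected token bucket. The sketch below is an in-process stand-in for the Redis-backed variant. `SharedTokenBucket` and its methods are hypothetical names, and it refills at a fixed rate rather than using X-RateLimit-Reset as the refill clock, which is a deliberate simplification.

```python
import threading
import time


class SharedTokenBucket:
    """Thread-safe token bucket shared by all workers that use one API key."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self, now):
        # Add tokens for the time elapsed since the last update, up to capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_take(self):
        """Consume one token if available; return whether the request may go."""
        with self.lock:
            self._refill(self.clock())
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def wait_time(self):
        """Seconds until at least one token will be available."""
        with self.lock:
            self._refill(self.clock())
            return max(0.0, (1 - self.tokens) / self.rate)
```

A bucket created as `SharedTokenBucket(rate_per_sec=1000 / 60, capacity=50)` would pace all threads to roughly the per-minute limit while allowing short bursts. The capacity of 50 is an arbitrary illustrative choice. Separate processes would need the same logic backed by a shared store.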

What counts

Every authenticated request counts, including:

Successful requests (2xx)
Validation errors (422)
Authorization errors (403)
404s

What doesn't count:

Unauthenticated requests (401). These are rejected before the rate-limit middleware sees them.
OPTIONS preflight requests. We don't support CORS today, but if we did, preflights would not count.

Need more than 1,000/min?

Two options:

Split across keys. Each key has its own limit. Make 5 keys with the same scopes, hand each to a different worker, and the effective limit becomes 5,000/min.
Talk to us. Email [email protected] with your use case and current volume. Limits can be raised per-org for genuine high-volume integrations.
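When sizing a key split, remember that the hourly window applies too: 10,000 per hour works out to about 166 requests per minute sustained, so the hourly limit is often the one that decides how many keys you need. The arithmetic can be sketched as follows. The function name `keys_needed` is hypothetical, and it assumes work is spread evenly across keys.

```python
import math

PER_MINUTE = 1000   # per-key minute limit
PER_HOUR = 10000    # per-key hour limit


def keys_needed(peak_per_minute, total_per_hour):
    """Smallest number of keys that covers both the burst and the hourly volume."""
    by_minute = math.ceil(peak_per_minute / PER_MINUTE)
    by_hour = math.ceil(total_per_hour / PER_HOUR)
    return max(1, by_minute, by_hour)
```

For example, a steady 500 requests per minute stays under the minute limit but adds up to 30,000 per hour, so `keys_needed(500, 30000)` returns 3.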

The limit is generous on purpose

1,000/min is enough for almost any sane integration. If you're hitting it, the more common fix is to question why: usually it's a bug (e.g. polling instead of using triggers) rather than a legitimate need. Macha's triggers are how you build event-driven integrations without polling.

Failure mode: Macha can't check the limit

If the rate-limit store itself is unavailable (rare but possible), Macha fails open: the request goes through without limit enforcement, and an error is logged on our side. We chose fail-open over fail-closed because losing rate limiting briefly is preferable to taking the entire API down.

You won't see this from the client side; the request just succeeds normally. We monitor for sustained failures internally.
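The fail-open pattern itself is easy to illustrate. The sketch below is a generic example of the technique, not Macha's actual middleware. `StoreUnavailable`, `allow_request`, and the `check_limit` callback are all hypothetical names.

```python
class StoreUnavailable(Exception):
    """Raised by the (hypothetical) limit check when its backing store is down."""


def allow_request(check_limit, key, log=print):
    """Return whether the request may proceed, failing open on store errors.

    check_limit(key) returns True if the key is under its limit and
    raises StoreUnavailable when the rate-limit store cannot be reached.
    """
    try:
        return check_limit(key)
    except StoreUnavailable as exc:
        # Fail open: let the request through and record the problem.
        log(f"rate-limit store unavailable, failing open: {exc}")
        return True
```

A fail-closed variant would return False in the except branch, which would reject every request for as long as the store is down.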