API Rate Limits

Every API key gets 1,000 requests per minute and 10,000 per hour. Headers expose what's left.

Every Macha API key gets 1,000 requests per minute and 10,000 requests per hour. Limits are enforced per key, not per organization: if you have three keys in the same org, you get 3,000 requests per minute combined.

The two windows

| Window | Limit | Window type |
|---|---|---|
| Minute | 1,000 requests | Sliding (trailing 60 seconds) |
| Hour | 10,000 requests | Sliding (trailing 3,600 seconds) |

Both windows are checked on every request. If either is full, the request is rejected with 429 rate_limited. The headers (below) always reflect the more constraining window, whichever one you'll hit first.
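
To make the two-window rule concrete, here is a minimal in-memory sketch of a per-key sliding-window limiter. It illustrates the documented behaviour and is not Macha's implementation: `SlidingWindowLimiter` and `allow` are hypothetical names, and the sketch assumes rejected requests are not recorded against the window (this page doesn't say either way).

```python
import time
from collections import defaultdict, deque

# (window length in seconds, max requests) for each window, per API key.
DEFAULT_WINDOWS = ((60, 1_000), (3_600, 10_000))


class SlidingWindowLimiter:
    """Hypothetical illustration of per-key, two-window sliding limits."""

    def __init__(self, windows=DEFAULT_WINDOWS):
        self.windows = windows
        self.longest = max(length for length, _ in windows)
        # One timestamp log per API key: keys never share a budget.
        self.logs = defaultdict(deque)

    def allow(self, key, now=None):
        """Record the request and return True, or return False (a 429) if any window is full."""
        now = time.time() if now is None else now
        log = self.logs[key]
        # Forget requests older than the longest window; they no longer count.
        while log and log[0] <= now - self.longest:
            log.popleft()
        for length, limit in self.windows:
            in_window = sum(1 for t in log if t > now - length)
            if in_window >= limit:
                return False  # assumption: rejected requests are not recorded
        log.append(now)
        return True
```

Because each key gets its own log, three keys get three independent 1,000/min budgets, which is where the "3,000 requests per minute combined" figure comes from.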

Response headers

Every response (success or error) includes rate limit headers so you don't have to guess what's left:

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Requests allowed in the current window (1,000 or 10,000, depending on which is tighter). |
| X-RateLimit-Remaining | Requests left in the current window before you hit 429. |
| X-RateLimit-Reset | Unix epoch seconds when the more constraining window resets. |
| Retry-After | Only on 429 responses. Seconds to wait before retrying. |

Example response headers on a normal request:

HTTP/1.1 200 OK
X-Request-ID: req_a0b88aa084bac0f7
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1782296880
Content-Type: application/json; charset=utf-8
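
A small helper for reading these headers, assuming they arrive as a mapping of strings (a `requests` response's `headers` object also works, since it exposes `.items()`). `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, not part of any Macha SDK.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int          # X-RateLimit-Limit: size of the tighter window
    remaining: int      # X-RateLimit-Remaining: requests left before a 429
    reset_at: int       # X-RateLimit-Reset: Unix epoch seconds
    retry_after: Optional[float] = None  # Retry-After: only present on 429s

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive; normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry = h.get("retry-after")
    return RateLimitStatus(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        retry_after=float(retry) if retry is not None else None,
    )
```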

The 429 response

When you exceed the limit, Macha returns 429 rate_limited with both the standard error envelope and a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1782296892

{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry in 12 seconds.",
    "request_id": "req_..."
  }
}

Honor Retry-After. It's set to the number of seconds until the more-constraining window has space again. Ignoring it and slamming retries will just keep hitting 429.
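
A sketch of turning a 429 into a wait time. It prefers Retry-After, as recommended above; the fallback to X-RateLimit-Reset and then to one second is an assumption of this sketch, not documented behaviour. `is_rate_limited` and `retry_delay` are hypothetical helper names.

```python
import json
import time
from typing import Mapping, Optional


def is_rate_limited(status: int, body: str) -> bool:
    """True for a 429, checking the error envelope's code when it parses."""
    if status != 429:
        return False
    try:
        return json.loads(body)["error"]["code"] == "rate_limited"
    except (ValueError, KeyError, TypeError):
        return True  # still a 429, even if the body is missing or malformed


def retry_delay(headers: Mapping[str, str], now: Optional[float] = None,
                default: float = 1.0) -> float:
    """Seconds to wait after a 429 before retrying."""
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:  # documented: sent with every 429
        return float(h["retry-after"])
    # Fallbacks below are assumptions of this sketch, not documented behaviour.
    if "x-ratelimit-reset" in h:
        now = time.time() if now is None else now
        return max(0.0, int(h["x-ratelimit-reset"]) - now)
    return default
```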

Building a rate-aware client

The simplest pattern: on every response, check X-RateLimit-Remaining. If it's low, slow down. If you get a 429, sleep for Retry-After seconds and try again.

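const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithBackoff(url, init, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);

    if (res.status === 429) {
      // Out of retries: hand the 429 back to the caller.
      if (attempt >= maxRetries) return res;
      // Honor Retry-After (seconds); fall back to 1 s if it's missing.
      const retryAfter = parseFloat(res.headers.get('Retry-After')) || 1;
      await sleep(retryAfter * 1000);
      continue;
    }

    // Pre-emptive slowdown: if we're close to the cliff, pause before
    // returning so the caller's next request is spaced out.
    const remaining = parseInt(res.headers.get('X-RateLimit-Remaining'), 10);
    const reset = parseInt(res.headers.get('X-RateLimit-Reset'), 10);
    if (remaining < 50 && Number.isFinite(reset)) {
      const secsUntilReset = Math.max(0, reset - Math.floor(Date.now() / 1000));
      // 50 ms per second left in the window, capped at 1 s.
      await sleep(Math.min(secsUntilReset * 50, 1000));
    }
    return res;
  }
}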

Python equivalent

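import time

import requests


def call_with_backoff(method, url, max_retries=5, **kwargs):
    for attempt in range(max_retries + 1):
        r = requests.request(method, url, **kwargs)

        if r.status_code == 429:
            if attempt == max_retries:
                return r  # out of retries: hand the 429 back to the caller
            # Honor Retry-After (seconds); fall back to 1 s if it's missing.
            time.sleep(float(r.headers.get('Retry-After', 1)))
            continue

        # Pre-emptive slowdown near the cliff: 50 ms per second left
        # in the window, capped at 1 s.
        remaining = int(r.headers.get('X-RateLimit-Remaining', 1000))
        if remaining < 50:
            reset = int(r.headers.get('X-RateLimit-Reset', time.time()))
            time.sleep(min(max(0, reset - time.time()) * 0.05, 1.0))
        return r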
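
The fixed pause in these examples (50 ms per second left in the window, capped at 1 s) is one heuristic. A common alternative, shown here as a hypothetical `pacing_delay` helper, spreads the remaining budget evenly over the time left until X-RateLimit-Reset. Treat it as a heuristic too: with sliding windows, capacity frees up gradually rather than all at once at the reset time.

```python
import time
from typing import Optional


def pacing_delay(remaining: int, reset_at: int, now: Optional[float] = None) -> float:
    """Seconds to wait before the next request so the remaining budget
    is spread evenly over the time left in the window (hypothetical helper)."""
    now = time.time() if now is None else now
    secs_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return secs_left  # nothing left: wait for the window to free up
    return secs_left / remaining
```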

Concurrent workers

If you have multiple processes hitting the API with the same key, share rate-limit state between them. Otherwise each worker only learns the remaining budget from its own responses, and together they'll overrun the limit.

Reasonable approaches:

  • One process funneling requests, others enqueue
  • Redis-backed token bucket using X-RateLimit-Reset as the refill clock
  • Per-key sharding so each worker owns a key and a slice of the work
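
For the shared-state approach, here is a sketch of an in-process token bucket guarded by a lock, so several threads draw from one budget. It stands in for the Redis-backed version (separate processes would keep `tokens` and `updated` in Redis instead of on the object). Two differences to note: it refills at a fixed rate rather than using X-RateLimit-Reset as the refill clock, and its `sync` method, a hypothetical addition, clamps the local budget to the server's X-RateLimit-Remaining. `TokenBucket` and its methods are illustrative names.

```python
import threading
import time


class TokenBucket:
    """Thread-safe token bucket shared by all workers using one API key.

    In-process stand-in for a Redis-backed bucket (hypothetical sketch).
    """

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.refill_per_sec)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        # Block until a token is available, polling at the refill interval.
        while not self.try_acquire():
            time.sleep(1 / self.refill_per_sec)

    def sync(self, remaining: int) -> None:
        # Never believe we have more budget than the server reports.
        with self.lock:
            self.tokens = min(self.tokens, float(remaining))
```

For one key's minute window you might create `TokenBucket(capacity=1000, refill_per_sec=1000 / 60)`, call `acquire()` before each request, and call `sync()` with the X-RateLimit-Remaining value after each response.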

The third option also raises your ceiling if you need more than 1,000/min: each key has its own limit, so more keys means more aggregate capacity.
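
A minimal sketch of per-key sharding, assuming work items have stable string IDs. `shard_by_key` is a hypothetical helper; it hashes with SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings and would reshuffle the shards between runs.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_by_key(items: Iterable[str], api_keys: List[str]) -> Dict[str, List[str]]:
    """Assign each work item to exactly one API key (hypothetical helper).

    Hashing the item ID keeps an item on the same key across runs, so the
    worker that owns a key always owns the same slice of the work.
    """
    shards: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        digest = hashlib.sha256(item.encode()).digest()
        idx = int.from_bytes(digest[:8], "big") % len(api_keys)
        shards[api_keys[idx]].append(item)
    return dict(shards)
```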

What counts

Every authenticated request counts, including:

  • Successful requests (2xx)
  • Validation errors (422)
  • Authorization errors (403)
  • Not-found errors (404)

What doesn't count:

  • Unauthenticated requests (401): these are rejected before the rate-limit middleware sees them
  • OPTIONS preflight requests (we don't support CORS today, but preflights would be exempt if we did)
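
If you track your budget on the client, the rules above reduce to a small predicate. This is a hypothetical helper that encodes only what this page states; any status not listed (for example a 5xx) is treated as counting, per "every authenticated request counts".

```python
def counts_toward_limit(method: str, status: int) -> bool:
    """Hypothetical client-side mirror of the counting rules above."""
    if status == 401:
        return False  # unauthenticated: rejected before rate limiting
    if method.upper() == "OPTIONS":
        return False  # preflight requests are exempt
    return True  # every authenticated request counts, errors included
```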

Need more than 1,000/min?

Two options:

  1. Split across keys. Each key has its own limit. Make 5 keys with the same scopes and hand each to a different worker; your effective limit becomes 5,000/min.
  2. Talk to us. Email [email protected] with your use case and current volume. Limits can be raised per-org for genuine high-volume integrations.
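
The arithmetic behind splitting across keys, as a quick sketch (`key_capacity` is a hypothetical helper). It also shows why the hourly window matters: 10,000 requests/hour averages to about 167/minute per key, so a key running flat out at 1,000/min exhausts its hourly budget in 10 minutes.

```python
PER_KEY_PER_MINUTE = 1_000
PER_KEY_PER_HOUR = 10_000


def key_capacity(n_keys: int) -> dict:
    """Aggregate limits for n keys, each with its own independent budget."""
    return {
        "burst_per_minute": n_keys * PER_KEY_PER_MINUTE,
        "per_hour": n_keys * PER_KEY_PER_HOUR,
        # Averaged over a full hour, the hourly cap is the binding one.
        "sustained_per_minute": n_keys * PER_KEY_PER_HOUR / 60,
    }
```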

The limit is generous on purpose

1,000/min is enough for almost any sane integration. If you're hitting it, first ask why: usually the cause is a bug (e.g. polling instead of using triggers) rather than legitimate need. Macha's triggers are how you build event-driven integrations without polling.
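
To see how quickly polling eats the budget, here is a back-of-the-envelope check against both windows (the function names and numbers are illustrative). Polling 20 resources once a second needs 1,200 requests/min; even 5 resources once a second (300/min) fits the minute window but adds up to 18,000/hour, over the hourly limit.

```python
from typing import Tuple


def polling_load(resources: int, interval_seconds: float) -> Tuple[float, float]:
    """(requests per minute, requests per hour) when polling each resource once per interval."""
    per_minute = resources * 60 / interval_seconds
    return per_minute, per_minute * 60


def polling_fits(resources: int, interval_seconds: float,
                 per_minute_limit: int = 1_000, per_hour_limit: int = 10_000) -> bool:
    """True if the polling load stays within both windows for a single key."""
    per_minute, per_hour = polling_load(resources, interval_seconds)
    return per_minute <= per_minute_limit and per_hour <= per_hour_limit
```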

Failure mode: Macha can't check the limit

If the rate-limit store itself is unavailable (rare but possible), Macha fails open: the request goes through without limit enforcement, and an error is logged on our side. We chose fail-open over fail-closed because briefly losing rate limiting is preferable to taking the entire API down.

You won't see this from the client side; the request just succeeds normally. We monitor for sustained failures internally.
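
The fail-open choice looks roughly like this on the server side. It is a hypothetical sketch, not Macha's code: `check` stands in for whatever call queries the rate-limit store, assumed to return False when the key is over its limit and to raise when the store is unreachable. A fail-closed design would return False from the `except` branch instead.

```python
import logging
from typing import Callable

log = logging.getLogger("ratelimit")


def check_limit_fail_open(check: Callable[[str], bool], api_key: str) -> bool:
    """Return True if the request may proceed (hypothetical sketch).

    `check` is an assumed call into the rate-limit store: False means the
    key is over its limit; an exception means the store is unavailable.
    """
    try:
        return check(api_key)
    except Exception:
        # Fail open: keeping the API up beats strict enforcement.
        log.exception("rate-limit store unavailable; allowing request")
        return True
```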

© 2026 AGZ Technologies Private Limited