MachaBot Crawler

Last updated: May 2026

MachaBot is the web crawler operated by Macha (AGZ Technologies Private Limited). It fetches public web pages so Macha customers can use them as knowledge sources for their AI agents — for example, to answer support questions from a published help centre, product catalogue, or marketing site.

How to identify MachaBot

MachaBot sends the following User-Agent header on every request:

Mozilla/5.0 (compatible; MachaBot/1.0; +https://getmacha.com/policies/crawler)

If you see this string in your access logs, the traffic originated from Macha.
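As a rough illustration for site operators, the check below looks for the MachaBot product token in a User-Agent value pulled from an access log. The function name `is_machabot` and the regular expression are assumptions made for this sketch and are not something Macha publishes; only the User-Agent string itself comes from this page.

```python
import re

# Matches the "MachaBot/<version>" product token from the documented header.
# The version pattern is an assumption so that future versions still match.
MACHABOT_TOKEN = re.compile(r"\bMachaBot/\d+(?:\.\d+)*")


def is_machabot(user_agent: str) -> bool:
    """Return True if the User-Agent value carries the MachaBot token."""
    return MACHABOT_TOKEN.search(user_agent) is not None


# Example with the exact header value documented above.
documented_header = (
    "Mozilla/5.0 (compatible; MachaBot/1.0; "
    "+https://getmacha.com/policies/crawler)"
)
print(is_machabot(documented_header))
```

Matching on the token rather than the full string keeps the check working if the version number changes. Note that any client can set an arbitrary User-Agent header, so a match identifies traffic that claims to be MachaBot.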

What MachaBot crawls

  • Only websites that a Macha customer has explicitly added to their account as a knowledge source.
  • Only pages that are publicly reachable without authentication.
  • Only pages on the same domain as the URL the customer added.
  • Up to a fixed number of pages per source; the limit is set by the customer's Macha plan.

MachaBot does not attempt to log in, submit forms, bypass paywalls, or fetch resources behind a login.
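The scope rules above can be sketched roughly as follows. This is an illustration only, not MachaBot's implementation: it assumes "same domain" means an exact hostname match (the real rule may treat subdomains differently), and the names `same_domain`, `select_in_scope`, and `page_limit` are hypothetical.

```python
from urllib.parse import urlsplit


def same_domain(source_url: str, candidate_url: str) -> bool:
    """Assumption: 'same domain' is read as an exact hostname match."""
    source_host = urlsplit(source_url).hostname
    candidate_host = urlsplit(candidate_url).hostname
    return source_host is not None and source_host == candidate_host


def select_in_scope(source_url: str, links: list[str], page_limit: int) -> list[str]:
    """Keep links on the source's host, in order, up to page_limit pages."""
    selected: list[str] = []
    for link in links:
        if len(selected) >= page_limit:
            break  # hypothetical per-source limit reached
        if same_domain(source_url, link) and link not in selected:
            selected.append(link)
    return selected


discovered = [
    "https://help.example.com/articles/1",
    "https://cdn.example.net/logo.png",
    "https://help.example.com/articles/2",
    "https://help.example.com/articles/3",
]
print(select_in_scope("https://help.example.com/", discovered, page_limit=2))
```

With a limit of two, the sketch keeps the first two links on `help.example.com` and ignores both the off-domain link and anything past the limit.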

Crawl behaviour

  • Requests are made sequentially with a 15-second timeout per page.
  • Non-HTML resources (images, PDFs, video, archives, fonts, scripts, stylesheets) are skipped during discovery.
  • Downloads are aborted as soon as a page exceeds 2 MB, to limit the load placed on your origin.
  • MachaBot does not execute JavaScript — it reads server-rendered HTML only.
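To make the limits above concrete, here is a minimal sketch of a fetcher that behaves similarly. It is not MachaBot's code. It assumes that non-HTML resources are recognised by file extension, that "2 MB" means 2 × 1024 × 1024 bytes, and that the timeout applies to each blocking network operation (which is how `urllib` interprets it). The names `looks_like_html`, `read_capped`, and `fetch_page` are hypothetical, and the extension list is illustrative.

```python
import io
import posixpath
import urllib.request
from urllib.parse import urlsplit

TIMEOUT_SECONDS = 15               # per-page timeout stated above
MAX_PAGE_BYTES = 2 * 1024 * 1024   # assumption: "2 MB" read as 2 MiB

# Illustrative extension list; the crawler's actual list is not published here.
SKIPPED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".pdf", ".mp4",
    ".zip", ".woff", ".woff2", ".js", ".css",
}


def looks_like_html(url: str) -> bool:
    """Skip URLs whose path ends in a known non-HTML extension."""
    suffix = posixpath.splitext(urlsplit(url).path.lower())[1]
    return suffix not in SKIPPED_EXTENSIONS


def read_capped(stream, max_bytes: int = MAX_PAGE_BYTES, chunk_size: int = 64 * 1024):
    """Read a body in chunks; return None once it grows past max_bytes."""
    body = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(body)
        body.extend(chunk)
        if len(body) > max_bytes:
            return None  # stop reading instead of downloading the rest


def fetch_page(url: str):
    """Fetch one page; no JavaScript is run, only the returned HTML is read."""
    if not looks_like_html(url):
        return None
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
        return read_capped(response)


if __name__ == "__main__":
    # Offline demonstration of the size cap using an in-memory stream.
    oversized = io.BytesIO(b"x" * 300)
    print(read_capped(oversized, max_bytes=100, chunk_size=32) is None)
```

Reading in chunks is what allows an oversized download to be abandoned partway through, rather than after the whole body has been transferred.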

How to allow MachaBot

No action is required. MachaBot follows your existing robots.txt rules for User-agent: *. To allow it explicitly, add:

User-agent: MachaBot
Allow: /

How to block MachaBot

Add the following to your robots.txt to disallow all crawling by MachaBot:

User-agent: MachaBot
Disallow: /

MachaBot will stop crawling your site on its next scheduled run after the rule is published.
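Before publishing a robots.txt change, it can help to check how the rules read for the MachaBot user agent. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; MachaBot's own parser may differ in edge cases, and the helper `machabot_allowed` and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def machabot_allowed(robots_txt: str, url: str) -> bool:
    """Evaluate robots.txt text for the MachaBot user agent (stand-in parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("MachaBot", url)


# Only a generic group: MachaBot falls back to the "User-agent: *" rules.
GENERIC_RULES = """\
User-agent: *
Disallow: /private/
"""

# A MachaBot-specific group that disallows everything.
BLOCK_RULES = """\
User-agent: *
Disallow: /private/

User-agent: MachaBot
Disallow: /
"""

print(machabot_allowed(GENERIC_RULES, "https://example.com/help/"))
print(machabot_allowed(GENERIC_RULES, "https://example.com/private/notes"))
print(machabot_allowed(BLOCK_RULES, "https://example.com/help/"))
```

Under the first rule set, `/help/` is allowed and `/private/notes` is not. Under the second, the MachaBot group applies and every path is disallowed.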

Contact

If MachaBot is causing problems for your site, or if you have any questions about its behaviour, please email [email protected] and we will respond promptly.