Batch AI Extraction with Studies
Run an AI extraction across thousands of records — Zendesk tickets, support history, and more. Define the columns you want, get a structured results grid, export to CSV, or push back as a knowledge source.
What Studies are
Studies let you run an AI extraction across a list of records and get structured results back. Instead of opening one ticket at a time and asking an agent for an answer, you point a Study at a set of records, define the columns you want filled in, and Macha extracts those columns from every record in parallel.
The end result is a spreadsheet-shaped grid of insights: one row per record, one column per field you defined. You can browse it, filter it, export it to CSV, or push it back into Macha as a knowledge source for your agents to search.
Studies are the right tool when you have a question that needs an answer across many records — auditing the last 5,000 tickets for refund mentions, classifying support volume by root cause, finding every ticket where the customer raised a billing concern. Doing that by hand is hours of work; doing it through chat would mean opening a conversation per record. A Study runs the same extraction over every record at once.
How a Study is shaped
Every Study has four parts:
- An input source — where the records come from. Today that's Zendesk tickets matched by a search query and an optional date range. More record sources are coming.
- Input fields — the parts of each record you want the AI to see. Picking only what's necessary keeps the prompt small and costs predictable.
- A schema — the columns you want filled in. Each column has a type (boolean, single select, multi select, number, short text, long text), a label, and optional guidance for the AI on how to fill it.
- A model — which LLM does the extraction. The model you pick sets the credit cost per record.
The model reads the input fields you selected, follows your instructions and per-column guidance, and returns a value for each column. One record in, one row out.
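To make the four parts concrete, here is a minimal sketch of how a Study could be modelled as plain data. Every class, field, and value name here is hypothetical and chosen for illustration; it is not Macha's actual API or storage format.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifiers for the six column types listed in the text.
COLUMN_KINDS = {
    "boolean", "single_select", "multi_select",
    "number", "short_text", "long_text",
}


@dataclass
class Column:
    label: str
    kind: str
    guidance: str = ""
    options: list[str] = field(default_factory=list)  # select kinds only

    def __post_init__(self):
        if self.kind not in COLUMN_KINDS:
            raise ValueError(f"unknown column kind: {self.kind}")
        if self.kind in {"single_select", "multi_select"} and not self.options:
            raise ValueError(f"column {self.label!r} needs at least one option")


@dataclass
class Study:
    name: str
    input_source: str          # part 1: where records come from
    query: str
    input_fields: list[str]    # part 2: what the model sees per record
    schema: list[Column]       # part 3: the columns to fill in
    model: str                 # part 4: which model runs the extraction
    date_from: Optional[str] = None
    date_to: Optional[str] = None


study = Study(
    name="Refund audit",
    input_source="zendesk_tickets",
    query="tags:refund",
    input_fields=["subject", "description"],
    schema=[
        Column("Refund requested", "boolean"),
        Column("Root cause", "single_select",
               options=["billing", "shipping", "product", "other"]),
    ],
    model="mini",  # placeholder model name
)
```

Each record processed with this configuration would yield one row with two values: a boolean and one of the four root-cause options.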
Creating a Study
Studies live in the sidebar under their own section. To create one:
- Open the Studies page and click New Study.
- Give it a name and an optional description.
- Pick the input source — for example, Zendesk Tickets.
- If the source needs a connector, pick the connected instance. If you have connected more than one instance (e.g. two Zendesk accounts), each shows up separately.
- Configure the input — add a search query and/or a date range to scope the records you want.
- Pick the input fields you want the AI to read.
- Define the schema — add a column for each piece of information you want extracted.
- Choose a model and save.
You don't have to run the Study right away. It's a saved configuration you can re-run any time — useful for monthly audits or recurring classification.
The input source: Zendesk Tickets
The first input source available is Zendesk Tickets. It pulls tickets that match a Zendesk search query, optionally bounded by a date range.
Configuring the query
- Search query — any Zendesk search expression (the same syntax you'd use in the Zendesk search bar). Examples: status:open priority:urgent, tags:refund, brand_id:1234 form:billing. Leave it empty to match all tickets in the date range.
- From / To — restrict to tickets created in a date range. Useful for "last quarter" or "since the new policy went live" audits.
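If you want to preview a scope in the Zendesk search bar before saving it, the query and date range can be folded into a single Zendesk search expression. The sketch below assumes the From / To fields behave like Zendesk's created>= and created<= operators; the text does not say how Macha combines them internally, and the helper name is hypothetical.

```python
from datetime import date
from typing import Optional


def build_ticket_query(search: str = "",
                       date_from: Optional[date] = None,
                       date_to: Optional[date] = None) -> str:
    """Compose one Zendesk search expression from a query and a date range."""
    parts = ["type:ticket"]  # restrict the search to tickets
    if search.strip():
        parts.append(search.strip())
    if date_from:
        parts.append(f"created>={date_from.isoformat()}")
    if date_to:
        parts.append(f"created<={date_to.isoformat()}")
    return " ".join(parts)


query = build_ticket_query("tags:refund", date(2025, 1, 1), date(2025, 3, 31))
```

Pasting the resulting string into Zendesk search gives a rough cross-check of the record count before you rely on the estimate.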
Tip
The estimate Macha shows before you run a Study is calculated from the exact same query that the run will use — so the number you see is what you'll process. If it's higher than you expected, tighten the query before running.
Ticket fields you can include
You choose which fields the AI sees for each ticket. The more you include, the more context the model has — but also the larger the prompt and the slower the run. Available fields:
- Subject — the ticket subject line
- Description — the customer's first message
- Status, Priority, Type, Tags — ticket metadata
- Created at, Updated at — timestamps
- Requester ID, Assignee ID, Group ID — identifiers
- Custom fields — every populated custom field with its human-readable label and resolved value. Flagged as expensive because it requires fetching the field schema.
- Full comment thread — the complete public + internal conversation. Flagged as expensive because it adds an extra API call per ticket and significantly grows the prompt.
For most extractions, Subject and Description are enough. Add the comment thread only when the answer genuinely needs the full conversation.
Defining the schema
The schema is the heart of a Study. Each entry is one column in the final results grid. You can have as many columns as you need.
Column types
- Boolean — yes/no. Best for "is this a refund request?" or "did the customer mention a competitor?"
- Single select — one value from a list you define. Best for categorical classification: billing / shipping / product / other.
- Multi select — zero or more values from a list. Best for tags: which issues are mentioned.
- Number — a numeric value. Best for counts, scores, or amounts the model can read from the record.
- Short text — a brief free-text answer. Best for things like "what is the customer's main concern in one sentence?"
- Long text — multi-line free text. Best for summaries, suggested replies, or extracted quotes.
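One way to think about the six types is as a contract on the value each cell may hold. The following sketch checks a value against a column type; the type identifiers and the function are assumptions made for illustration, not how Macha validates results.

```python
def validate_value(kind, value, options=()):
    """Return True if value is acceptable for a column of the given kind."""
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "single_select":
        return value in options
    if kind == "multi_select":
        # Zero or more values, all drawn from the defined list.
        return isinstance(value, list) and all(v in options for v in value)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind in ("short_text", "long_text"):
        return isinstance(value, str)
    raise ValueError(f"unknown column kind: {kind}")
```

The practical point: select and boolean columns have a closed set of valid answers, which is what makes them easy to filter and count later.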
Writing good guidance
Each column has an optional guidance field. This is where you teach the model what you actually want. The guidance is included in the extraction prompt for that column, so be specific:
- For a single-select column "Root cause", spell out what each option means and how to disambiguate edge cases.
- For a boolean "Refund requested", clarify the difference between asking about a refund and actually requesting one.
- For a short-text "Customer's main concern", say how long it should be ("one sentence") and what voice it should use ("describe the issue, not the customer's emotion").
Treat guidance like instructions you'd give a new analyst. The clearer the rules, the more consistent the output.
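As an illustration of the difference between vague and specific guidance, compare the two strings below for a boolean "Refund requested" column. The rendering function is a hypothetical stand-in for whatever prompt format Macha uses, which the text does not describe.

```python
VAGUE = "Did they want a refund?"

SPECIFIC = (
    "True only if the customer explicitly asks for their money back. "
    "False if they only ask about the refund policy or eligibility. "
    "Example: 'Please refund my order' is True; "
    "'What is your refund window?' is False."
)


def column_prompt(label, kind, guidance=""):
    """Render one column's instructions as plain text (illustrative format)."""
    lines = [f"Column: {label}", f"Type: {kind}"]
    if guidance:
        lines.append(f"Guidance: {guidance}")
    return "\n".join(lines)


prompt = column_prompt("Refund requested", "boolean", SPECIFIC)
```

The specific version states the rule, names the edge case, and gives one example of each outcome, which is the same structure you would use when briefing a new analyst.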
Tip
When the answers come back inconsistent, the fix is almost always in the guidance, not the model. Tighten the column definition, add examples in the guidance, and run a test on a small batch before re-running the full set.
Choosing a model
Studies use the same model lineup as agent chat. The model you pick drives the credit cost per record — which compounds across the whole run, so it matters a lot more here than in a one-off chat. A few rules of thumb:
- For simple classification, boolean extraction, or short field lookups, a mini model is plenty.
- For nuanced multi-column extractions, long-form summaries, or judgement calls (e.g. "did this agent handle the ticket well?"), step up to a more capable model.
- When in doubt, run a test on 10–20 records first with the model you're considering. Look at the output. If it's good enough, keep the model. If not, step up — the cost difference at small scale is tiny, and you'd rather know now than after burning credits on 5,000 records.
Estimate and the pre-run gate
Before any run starts, Macha shows you a Review screen with:
- The number of records the input is going to match
- The credit cost per record (based on the model)
- The total estimated credits for the run
- Your remaining credits
You can't accidentally run a 10,000-ticket study without seeing the cost first. If the estimate is uncomfortable, go back, narrow the query, or switch to a cheaper model.
If you're short on credits at this stage, you can buy a top-up pack directly from the Review screen — top-up credits never expire and they're consumed automatically after your monthly allowance runs out.
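The arithmetic behind the Review screen is simple enough to sketch. This example assumes whole-number credit costs per record; the function name and the returned keys are hypothetical.

```python
def estimate_run(record_count, credits_per_record, remaining_credits):
    """Estimate total cost and how far the current balance would go."""
    if credits_per_record <= 0:
        raise ValueError("credits_per_record must be positive")
    total = record_count * credits_per_record
    return {
        "total": total,
        # Credits you would need to add to finish the whole run.
        "shortfall": max(0, total - remaining_credits),
        # Records the current balance could pay for.
        "affordable_records": min(record_count,
                                  remaining_credits // credits_per_record),
    }


estimate = estimate_run(record_count=5000, credits_per_record=2,
                        remaining_credits=8000)
# total is 10000, shortfall is 2000, affordable_records is 4000
```

With these example numbers, the choice is between topping up 2,000 credits, narrowing the query to about 4,000 records, or picking a model that costs less per record.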
Test runs
Before committing to a full run, do a test. A test run takes a small sample (typically 10–50 records) and runs the same extraction on it. Use it to:
- Sanity-check that the schema produces the answers you actually want.
- Tune the per-column guidance.
- Make sure the model is capable enough for the task before scaling up.
Test runs are marked as previews in the UI so they don't clutter your real history. They still cost credits (you're paying for real model calls), but only for the sample size.
Running, progress, and cancelling
When you start a real run, Macha kicks off a background worker that streams records from the source and processes them with bounded concurrency: multiple records run in parallel, but never so many that the run would overwhelm the source API or your model quota.
The run page shows progress live: how many records processed, how many succeeded, how many errored, and how many credits have been spent so far.
You can cancel a run at any time. Cancellation is graceful — the worker finishes the records it has in flight, then stops. You only pay for what was actually processed; nothing is charged for records that never ran.
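Bounded concurrency with graceful cancellation is a common pattern, and a generic version can be sketched with asyncio. This is not Macha's worker; the function names, the concurrency limit, and the fake extraction step are all assumptions for illustration.

```python
import asyncio


async def run_records(record_ids, extract, max_concurrency=4, cancel_event=None):
    """Process records in parallel, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results = {}

    async def worker(record_id):
        async with semaphore:
            # A record that has not started when cancel is set never runs.
            if cancel_event is not None and cancel_event.is_set():
                return
            # In-flight records are allowed to finish.
            results[record_id] = await extract(record_id)

    await asyncio.gather(*(worker(r) for r in record_ids))
    return results


async def demo():
    cancel = asyncio.Event()
    state = {"in_flight": 0, "peak": 0}

    async def fake_extract(record_id):
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # stands in for a model call
        state["in_flight"] -= 1
        if record_id == 3:
            cancel.set()  # simulate the user pressing Cancel
        return {"id": record_id}

    results = await run_records(range(10), fake_extract,
                                max_concurrency=2, cancel_event=cancel)
    return results, state["peak"]
```

Running asyncio.run(demo()) processes the first few records and skips the rest once the cancel flag is set, which mirrors the billing rule: only records that actually ran appear in the results.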
How a run stops
A run can finish in one of several ways:
- Completed — every record in the input was processed.
- Cancelled — you stopped the run from the UI.
- Out of credits — your balance hit zero mid-run. The worker stops gracefully and reports the stop reason on the run.
- Cap reached — the run hit the platform's hard ceiling. Studies are capped at 20,000 records per run. To go bigger, split into multiple runs with date-range filters.
- Failed — a fatal error (e.g. the connector disconnected mid-run). Whatever was processed before the failure is preserved.
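The stop reasons can be read as the exit conditions of a single loop. The simulation below is a simplified, sequential sketch that assumes every record succeeds and costs the same; the enum and function names are hypothetical, and the Failed outcome is listed but not simulated.

```python
from enum import Enum


class StopReason(Enum):
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    OUT_OF_CREDITS = "out_of_credits"
    CAP_REACHED = "cap_reached"
    FAILED = "failed"  # fatal errors are not modelled in this sketch


RECORD_CAP = 20_000  # per-run ceiling stated in the text


def simulate_run(total_records, credits, credits_per_record, cancel_after=None):
    """Return (records_processed, stop_reason) for a simplified run."""
    processed = 0
    while processed < total_records:
        if cancel_after is not None and processed >= cancel_after:
            return processed, StopReason.CANCELLED
        if processed >= RECORD_CAP:
            return processed, StopReason.CAP_REACHED
        if credits < credits_per_record:
            return processed, StopReason.OUT_OF_CREDITS
        credits -= credits_per_record
        processed += 1
    return processed, StopReason.COMPLETED
```

For example, simulate_run(25_000, 100_000, 1) stops at the cap with 20,000 records processed, which is the case where splitting by date range helps.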
Reviewing results
When a run completes, its results page shows a table — one row per record, one column per schema field, plus a status column. Each row links back to the source record (e.g. the Zendesk ticket URL) so you can verify the extraction against the original.
Useful things you can do from the results page:
- Sort and filter — group rows by category, find every record where a boolean is true, find rows that errored, etc.
- Drill into a row — open the full record and see what the model saw and what it returned.
- Export to CSV — download the whole result set as a spreadsheet.
- Push to a Knowledge Source — see below.
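Once exported, the CSV can be summarised with a few lines of standard-library Python. The column names, the status values, and the sample rows below are invented for illustration; check your own export for the actual headers and encodings.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for an exported file.
SAMPLE_CSV = """ticket_id,status,Refund requested,Root cause
101,success,true,billing
102,success,false,shipping
103,error,,
104,success,true,billing
"""


def summarize(csv_text, category_col, status_col="status"):
    """Count successful rows per category and tally errored rows."""
    counts = Counter()
    errored = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[status_col] != "success":
            errored += 1
            continue
        counts[row[category_col]] += 1
    return counts, errored


counts, errored = summarize(SAMPLE_CSV, "Root cause")
```

For a real file, replace io.StringIO(csv_text) with an open file handle; the counting logic stays the same.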
Frozen snapshots
Every run keeps a copy of the input config, schema, and model it ran with — frozen at the moment the run started. Editing the parent Study later doesn't rewrite history. Past runs always reflect the configuration they ran under, so audit trails stay clean.
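The snapshot behaviour is equivalent to taking a deep copy of the configuration at start time. A minimal sketch, using plain dictionaries and hypothetical key names rather than Macha's real data model:

```python
import copy


def start_run(study_config):
    """Create a run that owns an independent copy of the Study config."""
    return {"snapshot": copy.deepcopy(study_config), "rows": []}


study_config = {
    "model": "mini",
    "schema": [{"label": "Root cause", "options": ["billing", "other"]}],
}
run = start_run(study_config)

# Editing the Study afterwards leaves the run's snapshot untouched.
study_config["schema"][0]["options"].append("shipping")
```

A shallow copy would not be enough here, because the nested schema list would still be shared between the Study and the run.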
Pushing results into a Knowledge Source
Studies don't just produce a static spreadsheet; they can feed your agents. From a completed run's results page, you can export the results as a new knowledge source. Each row becomes a document, composed from the columns you choose, and is indexed via the same embeddings pipeline that powers the rest of Macha's knowledge.
This is how Studies become operational instead of one-off analyses. Run a Study to extract structured insights from your last 5,000 tickets, push the results to a knowledge source, and now your support agent can search those insights when handling new tickets.
You choose which fields end up in each document and which one is used as the title. From there it behaves like any other knowledge source — assign it to an agent, set scope, and the agent's search_knowledge and get_document tools can pull from it.
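Conceptually, turning a row into a document means picking a title field and concatenating the chosen columns. The sketch below shows one plausible shape for that step; the function, the output keys, and the "label: value" layout are assumptions, since the text does not specify the document format.

```python
def row_to_document(row, title_field, body_fields):
    """Compose a searchable document from selected columns of one result row."""
    body_lines = [
        f"{name}: {row[name]}"
        for name in body_fields
        if row.get(name) not in (None, "")  # skip empty cells
    ]
    return {"title": str(row[title_field]), "body": "\n".join(body_lines)}


row = {
    "Main concern": "Customer was charged twice for one order.",
    "Root cause": "billing",
    "Suggested reply": "",
}
doc = row_to_document(row, "Main concern", ["Root cause", "Suggested reply"])
```

Leaving out empty or low-value columns keeps each document focused, which generally helps search return the right rows.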
Credits and pricing
Studies use the same credit system as the rest of Macha. The cost per record is the credit cost of the model you picked — so a Study on a mini model costs less per record than the same Study on a top-tier model.
- Credits are deducted per successful record, not up front.
- Errored records don't cost anything.
- Cancelled records (ones that never ran) don't cost anything.
- Monthly plan credits are used first; once those are spent, top-up credits kick in automatically. Top-up credits never expire.
- Enterprise plans bypass per-record credit checks.
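The deduction rules can be expressed as a small function: charge only successful records, draw from the monthly allowance first, then from top-up credits. This is a sketch of the rules as described, with hypothetical names and outcome labels; it does not cover the Enterprise bypass.

```python
def settle(monthly, top_up, outcomes, credits_per_record):
    """Return (monthly_left, top_up_left) after charging successful records."""
    for outcome in outcomes:
        if outcome != "success":
            continue  # errored and never-run records cost nothing
        if monthly + top_up < credits_per_record:
            raise ValueError("out of credits")
        from_monthly = min(monthly, credits_per_record)
        monthly -= from_monthly
        top_up -= credits_per_record - from_monthly
    return monthly, top_up


balances = settle(monthly=3, top_up=10,
                  outcomes=["success", "error", "success", "success"],
                  credits_per_record=2)
# balances is (0, 7): three successes cost 6 credits, 3 monthly then 3 top-up
```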
Plan availability
- Trial — Studies are not available on trial.
- Starter — not available; upgrade to Professional to use Studies.
- Professional — full access.
- Enterprise — full access, with credit checks bypassed.
Best practices
Start narrow, then widen
The first time you build a schema, run it against a small date range or a tight query first. Look at the results. Iterate on the schema and guidance. Once you're happy, re-run against the full set.
One question per column
A column is meant to answer one question. If a column tries to do two things at once ("category and severity"), split it into two columns. Single-purpose columns are easier for the model to answer consistently and easier for you to filter on later.
Use single-select over short-text when you can
Single-select forces the model to pick from your taxonomy, which makes the column easy to group, sort, and chart. Short-text leaves it free-form, which is useful for genuine free-text answers but bad for anything you want to count.
Don't pull comments unless you need them
The full comment thread roughly doubles or triples the prompt size and adds an API call per ticket. If the answer is in the subject and description, leave comments off.
Run smaller batches across longer windows
If you want to study "the whole year" but it's 60,000 tickets, run four quarterly Studies instead of one giant one. As long as no single quarter exceeds 20,000 tickets, each run stays under the cap, you can review intermediate results, and you can iterate on the schema between runs if something looks off.
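The quarterly split is easy to generate programmatically. The sketch below produces the four From / To pairs for a year and works out the minimum number of runs the cap allows; the function name is hypothetical and the 60,000 figure is the example from the text.

```python
from datetime import date, timedelta

RECORD_CAP = 20_000  # per-run ceiling stated in the text


def quarter_ranges(year):
    """Return four (from, to) date pairs covering the calendar year."""
    starts = [date(year, month, 1) for month in (1, 4, 7, 10)]
    # Each quarter ends the day before the next one starts.
    ends = [start - timedelta(days=1) for start in starts[1:]]
    ends.append(date(year, 12, 31))
    return list(zip(starts, ends))


ranges = quarter_ranges(2025)

# Ceiling division: the fewest runs that could cover 60,000 records.
min_runs = -(-60_000 // RECORD_CAP)
```

Here min_runs is 3, so four quarterly runs leave headroom, provided tickets are spread evenly enough that no quarter exceeds the cap. If one quarter is unusually busy, split that quarter further by month.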
Push high-value runs to a knowledge source
If a Study's results would help your agents (e.g. a classified backlog of past resolutions, or a clean list of known issues), don't just export to CSV. Push it to a knowledge source so your agents can search it during real conversations.
Tip
A Study is just a saved configuration — it costs nothing until you run it. Iterate on the schema freely; only test runs and real runs use credits.
© 2026 AGZ Technologies Private Limited