How to Pick the Right AI Model for Your Agent on Macha
Not every workflow needs the same AI model — and picking the right one per agent is the single biggest lever you have over both quality and cost. A simple triage agent and a complex multi-step resolver have very different needs, and Macha lets you choose a different model for each. Here's how to choose well.
Where you set the model
The model is a per-agent setting. Open an agent and pick its model from the top-right of the configuration screen. Because it's per agent, you can run your high-volume triage agent on a cheap, fast model and reserve a stronger model for the one agent that genuinely needs deep reasoning — instead of paying premium rates across the board.
The three things that vary between models
When you compare models, you're really trading off three things:
- Quality — how well it handles nuance, long instructions, and tricky reasoning.
- Speed — how fast it responds (matters for live chat and high volume).
- Cost — its credits-per-response rate (0.5 to 9, depending on the model).
The available models and their rates:
| Model | Credits / response | Good for |
|---|---|---|
| GPT-5.4 Mini (default) | 1 | Most support work — triage, drafts, summaries, field updates |
| GPT-5 | 3 | Complex, multi-step agents with long instructions |
| GPT-5.4 | 5 | The hardest reasoning where quality is paramount |
| Claude Sonnet 4.5 / 4 | 9 | Premium reasoning and writing quality |
| Llama 3.3 70B | 1 | A fast, capable open option |
| Llama 3.1 8B / Mixtral 8×7B | 0.5 | Simple, high-volume tasks where cheapest wins |
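To see how these rates add up, here is a rough sketch of the arithmetic. The rates come from the table above; the agent names and monthly volumes are made-up illustrations, and the sketch assumes credits scale linearly with the number of responses.

```python
# Credits per response, copied from the table above.
CREDITS_PER_RESPONSE = {
    "GPT-5.4 Mini": 1.0,
    "GPT-5": 3.0,
    "GPT-5.4": 5.0,
    "Claude Sonnet 4.5 / 4": 9.0,
    "Llama 3.3 70B": 1.0,
    "Llama 3.1 8B": 0.5,
    "Mixtral 8×7B": 0.5,
}


def monthly_credits(model: str, responses_per_month: int) -> float:
    """Estimate credits, assuming cost scales linearly with response count."""
    return CREDITS_PER_RESPONSE[model] * responses_per_month


# Hypothetical workload: (model, responses per month) for each agent.
mixed_plan = {
    "triage": ("GPT-5.4 Mini", 10_000),
    "resolver": ("GPT-5", 500),
}

# Each agent on the model chosen for it.
mixed_total = sum(monthly_credits(m, n) for m, n in mixed_plan.values())

# The same volumes with every agent on the most expensive model.
premium_total = sum(
    monthly_credits("Claude Sonnet 4.5 / 4", n) for _, n in mixed_plan.values()
)

print(f"Per-agent models:   {mixed_total:,.0f} credits")    # 11,500 credits
print(f"Premium everywhere: {premium_total:,.0f} credits")  # 94,500 credits
```

With these illustrative volumes, matching the model to each agent uses roughly an eighth of the credits that running everything on a premium model would.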
How to choose: match the model to the task
A simple rule covers most cases:
- Routine support work (classify, tag, summarize, draft a reply, update a field) → GPT-5.4 Mini (1 credit). It's fast, affordable, and handles these well. It's the default model, and for a high-volume operation it should stay your go-to choice.
- Complex agents — long, detailed instructions with many steps and rules to remember → GPT-5 (3 credits). Strong quality at a reasonable cost; it's the right step up when Mini starts missing nuance.
- The hardest reasoning or highest-stakes writing → a premium model (GPT-5.4 at 5 credits, or Claude Sonnet at 9). Use these sparingly, on the one or two agents that truly need them.
- Massive-volume, very simple tasks → the cheapest options (Llama 3.1 8B / Mixtral at 0.5 credits) can cut cost further, if quality holds.
A practical way to decide
Don't agonize over it up front — let the test run tell you:
- Start on GPT-5.4 Mini. It's the right answer most of the time.
- Test the agent against real tickets. If the quality is good, you're done — keep the cheap model.
- If it misses nuance — misreads intent, fumbles long instructions — step up to GPT-5 and test again.
- Only go premium if GPT-5 still isn't enough for that specific agent.
This "start cheap, step up only if needed" approach gets you the lowest cost that still does the job — per agent.
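The steps above amount to walking up a ladder of models and stopping at the first one that passes your test run. Here is a minimal sketch of that logic. The `passes_test_run` callback is hypothetical and stands in for your own judgment of a test run against real tickets; it is not a Macha API. The ladder uses GPT-5.4 as the premium step, which is an assumption, since Claude Sonnet is the other premium option.

```python
from typing import Callable, Optional, Sequence

# Cheapest first. The last rung is an assumed premium choice.
LADDER = ("GPT-5.4 Mini", "GPT-5", "GPT-5.4")


def cheapest_passing_model(
    passes_test_run: Callable[[str], bool],
    ladder: Sequence[str] = LADDER,
) -> Optional[str]:
    """Return the first (cheapest) model whose test run is good enough.

    Returns None if no model on the ladder passes.
    """
    for model in ladder:
        if passes_test_run(model):
            return model
    return None


# Hypothetical test-run outcomes for one agent: Mini misses nuance, GPT-5 is fine.
results = {"GPT-5.4 Mini": False, "GPT-5": True, "GPT-5.4": True}

print(cheapest_passing_model(lambda model: results[model]))  # GPT-5
```

Because the loop stops at the first pass, an agent that works on GPT-5.4 Mini is never tested on a pricier model.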
A note on speed
For anything customer-facing in real time — a website chatbot, live chat — response speed matters as much as quality. The faster models (Mini, Llama) feel snappier to a waiting visitor; a heavier model can be worth the wait for a back-of-house agent doing complex analysis where nobody's watching the clock.
A model cheat-sheet by agent type
If you want a quick starting point instead of deciding from scratch:
| Agent | Start with | Why |
|---|---|---|
| Ticket triage / tagging | GPT-5.4 Mini (1) | Simple classification, high volume |
| Summarizer | GPT-5.4 Mini (1) | Straightforward, runs a lot |
| WISMO / order lookup | GPT-5.4 Mini (1) | Fetch and format, not deep reasoning |
| Refund / policy agent | GPT-5 (3) | Judgment and rules to follow carefully |
| Complex multi-step resolver | GPT-5 (3), premium if needed | Long instructions, real nuance |
| Live website chatbot | GPT-5.4 Mini / Llama (fast) | Speed matters for a waiting visitor |
The pattern: start everything on Mini, and promote only the agents that prove they need more. A test run will tell you which ones those are.
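If you track your agents in a script or spreadsheet export, the cheat-sheet can be written as a simple lookup that falls back to Mini. This is only an illustrative sketch: the agent-type keys are invented labels, and the live chatbot entry picks GPT-5.4 Mini where the table also allows a Llama model.

```python
DEFAULT_MODEL = "GPT-5.4 Mini"

# Starting models from the cheat-sheet above, keyed by hypothetical labels.
STARTING_MODEL = {
    "triage": "GPT-5.4 Mini",
    "summarizer": "GPT-5.4 Mini",
    "order_lookup": "GPT-5.4 Mini",
    "refund_policy": "GPT-5",
    "multi_step_resolver": "GPT-5",
    "live_chatbot": "GPT-5.4 Mini",
}


def starting_model(agent_type: str) -> str:
    """Return the suggested starting model, defaulting to Mini for unknown types."""
    return STARTING_MODEL.get(agent_type, DEFAULT_MODEL)


print(starting_model("refund_policy"))  # GPT-5
print(starting_model("faq_bot"))        # GPT-5.4 Mini (not in the table, so the default)
```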
Frequently asked questions
Where do I set an agent's model? On the agent's configuration screen, top-right. It's a per-agent setting.
Which model should most agents use? GPT-5.4 Mini (1 credit) — it handles the bulk of support workflows well.
When should I use a stronger model? When an agent has long, complex instructions or needs deeper reasoning — step up to GPT-5, and only go premium if that's still not enough.
Can different agents use different models? Yes — that's the point. Cheap-and-fast for high-volume agents, stronger for the few that need it.
How does the model affect cost? It sets the credits-per-response rate (0.5 to 9). See Macha credits explained.
The bottom line
Pick the model per agent, match it to the task, and start cheap: GPT-5.4 Mini for everyday support, GPT-5 for complex agents, premium only where it's truly needed. Let the test run prove whether you can stay on the cheaper model — and you'll get the right quality at the lowest cost, agent by agent.
Try it: build an agent, pick a model, and test it on real tickets. 7-day free trial, no credit card required. Start free.