Macha

Choosing an AI Model for Support Agents: GPT-5 & GPT-5.4 Get Cheaper

Abbas, Customer Support & AI, Macha

Written by

Ankeet Guha, Co-founder & CTO, Macha

Reviewed by

Published June 25, 2026

Updated June 25, 2026

Choosing an AI model for support agents is mostly a trade-off between three things — capability, speed, and cost — and on June 17, 2026 we made that trade-off easier by cutting the price of two of the most capable models in Macha: GPT-5 dropped from 3 credits to 2, and GPT-5.4 dropped from 5 credits to 3, per response. Nothing about your plan changed, nothing about the models changed — the same agents you're already running now cost less every time they reply. If you were rationing GPT-5.4 to a handful of high-stakes agents because it was expensive, that calculus just shifted.

Choosing an AI Model for Support Agents: GPT-5 & GPT-5.4 Get Cheaper

This post is the practical, decision-oriented version: what actually changed, the new per-model math on a real 10,000 credits allowance, how the OpenAI models stack up against the Claude Sonnet and Groq options Macha also offers, and — the part most teams care about — how to choose which model each of your support agents should run on now that the gap between "cheap" and "capable" narrowed.

A note on what a credit is, because it matters here: in Macha a credit is spent per AI action — one complete assistant response from an agent — not per resolved ticket or per "deflection." A model's credit cost is just how much one of its responses costs. So when we say GPT-5 went from 3 to 2, every reply that agent writes got a third cheaper.

What changed on June 17

The whole lineup, with the change highlighted:

ModelCredits / responseChangeWhere it fits
GPT-5.4 Mini1unchanged (the default)Fast, high-volume. Most workflows.
GPT-52↓ from 3The balanced middle. Nuanced replies at volume.
GPT-5.43↓ from 5Highest OpenAI capability. Strict, complex agents.

GPT-5.4 Mini stayed put at 1 credit and remains the default for new agents — it was already the value pick, and we left it alone. The cuts land on the two models above it, which is where the price was actually doing the rationing.

An agent's Configuration in Macha: the Model dropdown sets which model the agent uses and shows its per-message credit cost (GPT-5, 3 credits/message).
An agent's Configuration in Macha: the Model dropdown sets which model the agent uses and shows its per-message credit cost (GPT-5, 3 credits/message).

For context on the trajectory: GPT-5.4 only launched in Macha on April 23 at 5 credits as OpenAI's most capable model (1M-token context window), with GPT-5 as the 3-credit default. Two months of running it in production told us the value was there at a lower price. GPT-5.4 Mini, meanwhile, had already been cut from 1.25 to 1 credit back in April when it became the default — so the only model that hasn't moved in months is the one most of you are already on.

The broader Macha model lineup spans roughly 0.5 to 9 credits per response depending on the model — including Anthropic's Claude Sonnet 4.5 and 4, and faster Groq-hosted options — so the three OpenAI models here sit in the affordable-to-mid band of what's available. The exact per-model cost is always shown next to the picker before you commit to a model.

What the cut actually buys you

Lower credit costs don't change your monthly allowance — they change how far it stretches. On the Professional plan's 10,000 credits per month, here's the honest before/after, assuming an agent does nothing but reply:

ModelResponses beforeResponses afterChange
GPT-5 (3 → 2 credits)~3,333~5,000+50%
GPT-5.4 (5 → 3 credits)~2,000~3,333+67%

So a GPT-5 agent now gets you half again as many responses on the same plan, and a GPT-5.4 agent gets you two-thirds more. (Real agents also spend credits on tool calls and sub-agent delegation, so treat these as ceilings for the reply portion, not a literal monthly quota.) The point isn't the exact number — it's that the more capable models are now meaningfully cheaper to run at volume, which is what unlocks using them where you previously couldn't justify it.

The Billing page Usage card in Macha, showing the monthly credit balance an agent's responses draw from.
The Billing page Usage card in Macha, showing the monthly credit balance an agent's responses draw from.

If you do run low, monthly credits aren't the only lever — top-up packs add credits that persist across billing cycles and are spent automatically after your monthly allowance. But for most teams the cut alone means the same plan now covers more ground. See the pricing page for current plans and credit allowances.

How to actually pick a model for a support agent

This is the question the price cut reopens, so let's answer it properly. The industry framing — borne out across model-selection guides from Microsoft's Foundry team and support-focused write-ups like Cobbai's LLM evaluation guide — is that you're trading off three things: capability, latency, and cost. There's no single best model; there's a best model for a given agent's job. Here's how we'd map Macha's three OpenAI options.

Default to GPT-5.4 Mini for the bulk of your volume

For most support work — answering "where's my order," explaining a policy, looking up an account, triaging and routing — GPT-5.4 Mini at 1 credit is the right call, and it's the default for a reason. It's fast, it's cheap enough to run on your busiest agents without thinking about it, and with a 400K-token context window it can hold long ticket histories and knowledge-base context without truncating. The consensus across support-automation research is that mini-class models are the best value for high-volume, latency-sensitive support — and that matches what we see. We wrote a deeper piece on exactly this: GPT-5.4 Mini for customer support.

Step up to GPT-5 when replies need more nuance

GPT-5 (now 2 credits) is the balanced middle: better at nuanced, judgment-heavy replies than Mini, while still cheap enough to run at scale. Good candidates are agents that draft customer-facing prose where tone and reasoning matter — apology-and-remedy responses, technical explanations, anything where a slightly flat or literal answer would read poorly. With the cut, the premium over Mini is now just one credit per reply, which is easy to justify on a customer-facing agent.

Reserve GPT-5.4 for strict, complex, high-stakes agents

GPT-5.4 (now 3 credits) is the highest-capability OpenAI model in the platform, and its real edge over GPT-5 is instruction-following — agents with strict "do this exactly once" rules, complex branching logic, or many tools where the model has to reliably pick the right one in the right order. If you've ever watched a cheaper model skip a step or call a tool twice, that's the failure mode GPT-5.4 is built to avoid. It's the model for billing/refund flows, multi-tool runbooks, and orchestration agents that delegate to sub-agents — the places where a wrong action is expensive. At 3 credits it's now the same cost the default GPT-5 was two months ago.

A common pattern that the lower prices make easier is tiering: a Mini-powered front-line agent handles the routine majority and delegates the gnarly cases to a GPT-5.4 specialist sub-agent. You pay the premium only on the small slice of conversations that need it. Because Macha lets each agent — and each sub-agent — set its own model, you can build that tier directly. The agents documentation covers per-agent model selection and delegation.

OpenAI isn't the only option: Claude Sonnet and Groq

If you've read any of the cross-provider model round-ups — the LLM API pricing comparisons that stack OpenAI against Claude, Gemini, and Llama — you'll know "choosing a model" is rarely an OpenAI-only decision. It isn't in Macha either. The three OpenAI models above are the most-used defaults, but the picker also includes Anthropic's Claude Sonnet (4.5 and 4) and Groq-hosted open models, and they earn their place for different jobs:

Provider / modelBest atWhy reach for it
OpenAI GPT-5.4 Mini (1 credit)Speed + broad capabilityThe default; best all-round value for high-volume support, strong tool ecosystem
OpenAI GPT-5 / GPT-5.4 (2 / 3 credits)Nuance + strict instruction-followingCustomer-facing prose and complex, multi-tool runbooks
Anthropic Claude Sonnet (4.5 / 4)Tone, empathy, long-document reasoningSensitive replies and agents that reason over long policies or 100-page manuals
Groq open modelsRaw latency + low costUltra-fast, cheap responses for simple, high-frequency lookups

The industry pattern, borne out in support-LLM guides, is that mini-class OpenAI models win on all-round value, Claude leads on instruction adherence and long-context reasoning, and the cheapest/fastest providers (Groq, and outside Macha, Gemini Flash or Llama) win when latency and per-message cost dominate. So if an agent's job is empathetic, sensitive handling — refund apologies, complaints, anything where tone carries the resolution — it's worth A/B testing Claude Sonnet against GPT-5 on a few real tickets before you commit. And if you have a high-frequency, dead-simple lookup agent, a Groq model can undercut even GPT-5.4 Mini on both speed and credits. The point of the price cut isn't "always pick OpenAI" — it's that you can match each agent to the provider that fits its job, and the costs are now close enough that capability, not budget, drives the choice.

A note on where the model is set

Worth flagging, because June 17 also tightened this: the model dropdown in the top-right of a conversation only changes the model for that one chat. It's there for quick experiments — try a reply on GPT-5.4, compare it to Mini, decide. If you want an agent to permanently run on a different model, that lives in the agent's configuration (and the org-wide default lives in Settings). The dropdown now shows a toast saying as much, so you don't accidentally think a one-off switch stuck. It's a small thing that prevents a real "why is my agent still on the old model" confusion.

Watch-outs and when not to reach for the expensive model

Cheaper top-tier models are not a reason to put every agent on GPT-5.4. A few honest caveats:

  • More capable ≠ always better for support. For fast, friendly, high-volume chat, a lighter model is often the better customer experience because it's quicker — capability you don't need just adds latency. Don't upgrade an agent that's already answering correctly.
  • The cheapest mistake is still a mistake. The credit cost is the small number; a wrong refund or a mis-routed escalation is the big one. Match the model to the cost of being wrong, not just the cost per response.
  • Vision and tooling vary by model. Image-reading on Zendesk attachments works on GPT-5.4 Mini and Claude Sonnet models; if an agent needs to read receipts or screenshots, check the model supports it before switching.
  • These are Macha credit costs, not OpenAI list prices. A credit is Macha's unit for one AI action across all providers; the underlying token economics differ by model. Don't reverse-engineer your OpenAI bill from credit counts.
  • Test before you roll out. Use the per-conversation dropdown to compare a real ticket on two models before you change an agent's default. It's free to look.

The healthiest setup for most teams is still Mini almost everywhere, GPT-5 on the customer-facing agents that write a lot of prose, GPT-5.4 reserved for the strict/complex few — and the June 17 cut just made the second and third tiers cheaper to adopt.

FAQ

What exactly changed on June 17, 2026? GPT-5 went from 3 credits per response to 2, and GPT-5.4 went from 5 credits to 3. GPT-5.4 Mini stayed at 1 credit and remains the default. No plan, allowance, or model behavior changed — only the per-response cost.

Do I have to do anything to get the lower price? No. Existing agents on GPT-5 or GPT-5.4 simply cost less per response from June 17 onward. There's nothing to migrate.

Is a credit the same as a resolved ticket? No. A credit is spent per AI action — one complete assistant response. A single ticket might take several responses (and tool calls), and Macha is automation/orchestration, so outcomes vary. Credits measure work done, not tickets closed.

Which model should most of my agents use? GPT-5.4 Mini (1 credit) for the high-volume majority, GPT-5 (2 credits) for nuanced customer-facing replies, and GPT-5.4 (3 credits) for strict, complex, or high-stakes agents. See the GPT-5.4 Mini deep dive for the default case.

Is OpenAI the only provider, or can I use Claude? Macha isn't OpenAI-only. The model picker also includes Anthropic's Claude Sonnet (4.5 and 4) — strong on tone, empathy, and long-document reasoning — and Groq-hosted open models for ultra-fast, low-cost responses. For sensitive, tone-sensitive replies it's worth testing Claude Sonnet against GPT-5 on real tickets; for dead-simple high-frequency lookups, a Groq model can be the cheapest and fastest option.

How do I change an agent's model permanently? In the agent's configuration card, not the conversation dropdown (which only affects the current chat). The org-wide default lives in Settings. The docs walk through it.

Try it

Macha runs your AI agents on top of the helpdesk you already use — Zendesk, Freshdesk, Gorgias, or Front — and lets you pick the right model for each agent's job. If you've been holding a smart agent back on cost, now's the time to revisit it. Start a 7-day free trial, no credit card required, or read more about running Macha on Zendesk.


Written by Abbas (Customer Support & AI, Macha) · Reviewed by Ankeet Guha (Co-founder & CTO) · Published 2026-06-24 · Last updated 2026-06-24.

Zendesk
5.0 on Zendesk Marketplace

Loved by support teams worldwide

See what support teams are saying about Macha AI.

The application seems excellent to me! We are still testing, and we need support for some details and they were extremely efficient too!

Daniela Costa

Daniela Costa

Head of Support, Seabra

Macha has been a great addition to our support toolkit. It generates clear, well-organized responses that fit naturally into our workflow. One feature we particularly appreciate is its ability to automatically reply in the same language as the ticket.

Marius F

Marius F

Support Head, Zentana

We've been using Macha for a little while now and it's been really great addition so far! It's powerful, convenient, and makes getting work done a lot easier for our agents.

Alexander Wedén

Alexander Wedén

Head of Support

Support team is very helpful and responsive. Really enjoy how lightweight this is within Zendesk itself vs other more intrusive tools.

Cathleen Wright

Cathleen Wright

Zendesk Admin, Cortex IO

So far it's pretty good! Our queries are a little nuanced, so we can't always use it, but it's got enough utility for us. It can even incorporate our bilingual country with greetings in a second language.

Jae Oliver

Jae Oliver

Head of Support, Wise

Really enjoying using Macha, it has made a noticeable difference to our support team in a short amount of time. I really like the ticket summary feature, saves us a lot of time.

Harry Jackson

Harry Jackson

Head of Support, Crumb

Macha AI is a great addition to my workspace! It's powerful, convenient, and it really makes productivity so much easier for our agents!

Dave G

Dave G

Head of Support, Cyber Power Systems

Very impressed! AI integration for Zendesk has certainly come a long way and Macha seems to set the standard for now. This will for sure save lot of time in our support team.

Pauli Juel

Pauli Juel

Head of CS, Dokument24

Macha has been working great for us so far! The auto-responses are accurate and our resolution time has dropped significantly.

Lana T

Lana T

Zendesk Admin, Swotzy

Macha AI is a great addition. The knowledge base feature means our agents always have the right answers at their fingertips.

Mischa Wolf

Mischa Wolf

Head of Support, Topi

We're enjoying this integration so far. It's made our support team more efficient and our customers get faster responses.

Paula G

Paula G

Head of Customer Support, Xly Studio

The team enjoys using it. It saves considerable time on common questions and the integration options are excellent.

Kilian Leister

Kilian Leister

Support Head, Didriksons

Ready to supercharge your team with AI?

Get started in minutes. Connect your tools, configure your agents, and let AI handle the rest.