Choosing the Right AI Model for Your Support Agents
Not all AI models are created equal. Some are fast and cheap, others are slow and smart. Some can read images, others can't. Choosing the right model for your support agent affects response quality, speed, cost, and which capabilities are available. Here's how to decide.
The models available
Macha supports models from three providers, each with different strengths:
OpenAI
- GPT-5.4 Mini (1.25 credits) — the latest mini model with a 400K context window. Strongest reasoning in the mini tier. Best for complex tickets that need careful judgement, multi-step logic, or long conversation histories. Supports image vision.
- GPT-5 Mini (1 credit) — fast, reliable, and cost-effective. Handles most everyday tasks well. The default choice for most agents. Supports image vision.
- GPT-4o Mini (0.5 credits) — previous generation at half the credit cost. Still solid for simple, high-volume tasks like tagging, routing, and quick lookups. Supports image vision.
Anthropic
- Claude Sonnet 4.5 (9 credits) — exceptional at complex reasoning, nuanced writing, and multi-step analysis. Best for detailed responses, tricky escalations, and tasks that need careful judgement. Supports image vision.
- Claude Sonnet 4 (9 credits) — strong reasoning with consistent, structured output. Good for data extraction, report generation, and workflows that need careful logic. Supports image vision.
Groq (open-source, fast inference)
- Llama 3.3 70B (1 credit) — great all-rounder at low cost. Strong multilingual support, especially for European languages. Does not support image vision.
- Llama 3.1 8B (0.5 credits) — ultra-fast, ultra-cheap. Best for high-volume, simple tasks — tagging, intent detection, routing. Does not support image vision.
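The list above can also be captured as data, which makes the trade-offs easy to query. This is a minimal Python sketch and not Macha's API: the `Model` class, the `CATALOGUE` list, and the `cheapest` helper are hypothetical names introduced for illustration. Only the credit costs and vision flags come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    credits: float  # credits per response, as listed above
    vision: bool    # whether the model supports image vision


# Hypothetical in-memory catalogue mirroring the list above.
CATALOGUE = [
    Model("GPT-5.4 Mini", "OpenAI", 1.25, True),
    Model("GPT-5 Mini", "OpenAI", 1.0, True),
    Model("GPT-4o Mini", "OpenAI", 0.5, True),
    Model("Claude Sonnet 4.5", "Anthropic", 9.0, True),
    Model("Claude Sonnet 4", "Anthropic", 9.0, True),
    Model("Llama 3.3 70B", "Groq", 1.0, False),
    Model("Llama 3.1 8B", "Groq", 0.5, False),
]


def cheapest(models):
    """Return the names of the models that share the lowest credit cost."""
    low = min(m.credits for m in models)
    return [m.name for m in models if m.credits == low]


print(cheapest(CATALOGUE))  # ['GPT-4o Mini', 'Llama 3.1 8B']
```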
How to choose
For most support agents: GPT-5 Mini
It's the default for a reason. Fast enough for real-time responses, smart enough for ticket classification, reply drafting, and tool calling. At 1 credit per response, it balances cost and quality. Start here and switch only if you hit a limitation.
For complex or high-stakes tickets: Claude Sonnet 4.5 or GPT-5.4 Mini
If your agent handles escalations, refund decisions, or nuanced multi-step reasoning (e.g., "check the order, compare against the return policy, determine if the refund is eligible, draft the response"), a more capable model reduces errors. Claude Sonnet 4.5 is the strongest reasoner, but at 9 credits per response it costs 9x as much as GPT-5 Mini. GPT-5.4 Mini is a middle ground: stronger than GPT-5 Mini at only 1.25x the cost.
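To make the cost gap concrete, the sketch below compares each option against GPT-5 Mini. The `CREDITS` dictionary and both helper functions are illustrative names, not part of Macha, and the 1,000-response volume is an assumed figure. The credit values are the ones listed earlier.

```python
# Credits per response, taken from the model list; the dict itself is illustrative.
CREDITS = {"GPT-5 Mini": 1.0, "GPT-5.4 Mini": 1.25, "Claude Sonnet 4.5": 9.0}


def cost_ratio(model: str, baseline: str = "GPT-5 Mini") -> float:
    """How many times more expensive `model` is than `baseline`, per response."""
    return CREDITS[model] / CREDITS[baseline]


def extra_credits(model: str, responses: int, baseline: str = "GPT-5 Mini") -> float:
    """Additional credits spent by using `model` instead of `baseline`."""
    return (CREDITS[model] - CREDITS[baseline]) * responses


print(cost_ratio("Claude Sonnet 4.5"))           # 9.0
print(extra_credits("GPT-5.4 Mini", 1000))       # 250.0
print(extra_credits("Claude Sonnet 4.5", 1000))  # 8000.0
```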
For high-volume, simple tasks: GPT-4o Mini or Llama 3.1 8B
Tagging tickets, classifying intent, and routing to groups don't need deep reasoning. Use the cheapest model that gets the job done. At 0.5 credits per response, you can process twice the volume of a 1-credit model such as GPT-5 Mini for the same budget.
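The budget arithmetic can be checked with a few lines of Python. This is only a sketch: `responses_for_budget` is a hypothetical helper and the 1,000-credit budget is an assumed figure.

```python
def responses_for_budget(budget_credits: float, credits_per_response: float) -> int:
    """Number of whole responses that a credit budget covers."""
    return int(budget_credits // credits_per_response)


print(responses_for_budget(1000, 0.5))  # 2000 (GPT-4o Mini or Llama 3.1 8B)
print(responses_for_budget(1000, 1.0))  # 1000 (GPT-5 Mini or Llama 3.3 70B)
print(responses_for_budget(1000, 9.0))  # 111  (Claude Sonnet 4.5 or Sonnet 4)
```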
For multilingual support: Llama 3.3 70B
If your support team handles tickets in German, French, Spanish, Italian, and other European languages, Llama 3.3 70B is a strong choice, and at 1 credit per response it costs the same as GPT-5 Mini. Keep in mind that it does not support image vision.
Image vision support
If your tickets include image attachments (screenshots, product photos, receipts), you need a vision-capable model. All OpenAI and Anthropic models support vision. The Groq models (Llama 3.3 70B and Llama 3.1 8B) do not: if an agent running on one of them encounters an image, it will tell you to switch models.
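The vision rule amounts to a simple lookup. In the sketch below, `SUPPORTS_VISION`, `can_handle`, and `vision_models` are hypothetical names used for illustration. They are not Macha functions, and the real product may perform this check differently.

```python
# Vision support per model, as stated in this article.
SUPPORTS_VISION = {
    "GPT-5.4 Mini": True,
    "GPT-5 Mini": True,
    "GPT-4o Mini": True,
    "Claude Sonnet 4.5": True,
    "Claude Sonnet 4": True,
    "Llama 3.3 70B": False,
    "Llama 3.1 8B": False,
}


def can_handle(model: str, has_image: bool) -> bool:
    """A model can handle a ticket unless it has an image and the model lacks vision."""
    return SUPPORTS_VISION[model] or not has_image


def vision_models() -> list[str]:
    """Names of all models that can read image attachments."""
    return [name for name, ok in SUPPORTS_VISION.items() if ok]


print(can_handle("Llama 3.3 70B", has_image=True))  # False
print(can_handle("GPT-4o Mini", has_image=True))    # True
```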
You can mix models
Different agents can use different models. For example, a high-volume triage agent can run on GPT-4o Mini (cheap, fast), an escalation agent on Claude Sonnet 4.5 (smart, careful), and a WISMO ("Where is my order?") agent on GPT-5 Mini (balanced). Each agent's model is set independently, so there's no need for a one-size-fits-all choice.
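A mixed setup like this can be costed per agent. The following sketch assumes made-up monthly volumes. The agent keys, the `AGENT_MODELS` mapping, and `monthly_credits` are hypothetical and do not reflect how Macha stores agent settings.

```python
# Credits per response, from the model list in this article.
CREDITS = {"GPT-4o Mini": 0.5, "GPT-5 Mini": 1.0, "Claude Sonnet 4.5": 9.0}

# Hypothetical agent-to-model assignment matching the example above.
AGENT_MODELS = {
    "triage": "GPT-4o Mini",
    "escalation": "Claude Sonnet 4.5",
    "wismo": "GPT-5 Mini",
}


def monthly_credits(volumes: dict[str, int]) -> float:
    """Total credits for the given number of responses per agent."""
    return sum(CREDITS[AGENT_MODELS[agent]] * n for agent, n in volumes.items())


# Assumed volumes: 10,000 triage, 200 escalation, 3,000 WISMO responses.
print(monthly_credits({"triage": 10_000, "escalation": 200, "wismo": 3_000}))  # 9800.0
```

For comparison, running all 13,200 responses in this assumed workload on Claude Sonnet 4.5 would cost 118,800 credits, which is why matching the model to the task matters.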