How AI Agents Can Read Image Attachments in Zendesk Tickets
When a customer attaches a screenshot of a broken checkout page or a photo of a damaged product, your support agents need to open the image, understand it, and respond. Until now, AI agents couldn't do that — they could read text attachments but images were invisible. That changes with inline image vision.
The problem: AI agents were blind to images
Support tickets don't just contain text. Customers attach screenshots of error messages, photos of damaged products, receipts for refund requests, and tracking page screenshots. Traditional AI agents could only process text — PDFs, spreadsheets, plain text files. Images? "This file type cannot be read."
That meant every ticket with an image attachment required a human to look at it, understand it, and either respond or add context for the AI. In high-volume support teams, this created a bottleneck.
How inline image vision works
Macha's AI agents can now read image attachments in Zendesk tickets using inline vision. When an agent encounters an image attachment, it:
- Downloads the image securely from Zendesk's CDN using your connector's authentication
- Sends it to the AI model as part of the conversation, so the model receives the image itself as visual input rather than just a filename
- Responds based on what it sees — describing the content, extracting text, identifying products, or flagging issues
This works with JPEG, PNG, GIF, and WebP images up to 5MB. The image is processed once and the AI's description becomes part of the conversation context for future reference.
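In practice the flow boils down to two calls: fetch the attachment bytes with the connector's credentials, then hand them to the model as an inline image. Here is a minimal TypeScript sketch of that idea, assuming Node 18+, the official openai SDK, and a Zendesk API token; the helper name, environment variables, and model identifier are illustrative, not Macha's internals.

```typescript
// Minimal sketch: fetch a Zendesk image attachment and pass it to a
// vision-capable model as an inline image. Not Macha's implementation.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function describeAttachment(contentUrl: string, contentType: string): Promise<string> {
  // Zendesk attachment URLs accept the same credentials as the ticket API.
  const auth = Buffer.from(
    `${process.env.ZENDESK_EMAIL}/token:${process.env.ZENDESK_API_TOKEN}`
  ).toString("base64");

  const res = await fetch(contentUrl, { headers: { Authorization: `Basic ${auth}` } });
  const image = Buffer.from(await res.arrayBuffer()).toString("base64");

  // Send the image inline as a data URL so the model receives it as visual input.
  const completion = await openai.chat.completions.create({
    model: "gpt-5-mini", // assumed identifier; any vision-capable model works
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this ticket attachment for a support agent." },
          { type: "image_url", image_url: { url: `data:${contentType};base64,${image}` } },
        ],
      },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```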
What the AI can see and do
With vision-capable models like GPT-5 Mini, GPT-5.4 Mini, and Claude Sonnet 4.5, your agent can:
- Read text in screenshots — error messages, tracking numbers, order confirmations
- Identify products — match a photo to your catalogue, spot damage or defects
- Parse receipts — extract amounts, dates, transaction IDs from photos of receipts
- Describe what it sees — "This is a screenshot of a DHL tracking page showing the parcel was delivered on April 14th"
- Combine image + text context — the agent reads both the customer's message and their attached image to form a complete understanding
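Receipt parsing in particular works best when you ask the model for machine-readable output. The sketch below reuses the same inline-image pattern; the field names, prompt wording, and model identifier are illustrative assumptions.

```typescript
// Sketch: parsing a receipt photo into structured fields with a vision model.
// The schema and prompt are illustrative choices, not Macha's.
import OpenAI from "openai";

interface ReceiptFields {
  amount: string | null;
  currency: string | null;
  date: string | null;
  transactionId: string | null;
}

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function parseReceipt(imageDataUrl: string): Promise<ReceiptFields> {
  const completion = await openai.chat.completions.create({
    model: "gpt-5-mini", // assumed identifier
    response_format: { type: "json_object" }, // ask for machine-readable output
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Extract amount, currency, date and transactionId from this receipt. " +
              "Answer as a JSON object; use null for anything not visible.",
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  });

  return JSON.parse(completion.choices[0].message.content ?? "{}") as ReceiptFields;
}
```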
How it fits into autonomous workflows
Image vision works seamlessly with Macha's trigger system. When a new Zendesk ticket arrives with an image attachment:
- The trigger fires and invokes your AI agent
- The agent reads the ticket — subject, description, custom fields, conversation history
- The agent sees the image attachment and calls the read-attachment tool
- The AI model analyses the image and incorporates its understanding into the response
- The agent posts a reply or internal note based on the complete context — text and visual
No human intervention needed. The entire flow runs autonomously.
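To make that loop concrete, here is a rough sketch of a trigger-driven handler, assuming an Express webhook, the describeAttachment helper from the earlier sketch, and Zendesk's tickets API for posting the note; the endpoint path and payload shape are stand-ins, not Macha's actual trigger contract.

```typescript
// Sketch of the trigger-to-reply loop as a webhook handler.
// Assumes describeAttachment() from the earlier sketch is in scope.
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Draft a reply from the combined text + visual context.
async function draftReply(context: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-5-mini", // assumed identifier
    messages: [
      { role: "system", content: "You are a support agent. Draft a concise reply." },
      { role: "user", content: context },
    ],
  });
  return completion.choices[0].message.content ?? "";
}

// Post the draft as a private (internal) note on the ticket.
async function postInternalNote(ticketId: number, body: string): Promise<void> {
  const auth = Buffer.from(
    `${process.env.ZENDESK_EMAIL}/token:${process.env.ZENDESK_API_TOKEN}`
  ).toString("base64");
  await fetch(
    `https://${process.env.ZENDESK_SUBDOMAIN}.zendesk.com/api/v2/tickets/${ticketId}.json`,
    {
      method: "PUT",
      headers: { Authorization: `Basic ${auth}`, "Content-Type": "application/json" },
      body: JSON.stringify({ ticket: { comment: { body, public: false } } }),
    }
  );
}

app.post("/triggers/zendesk/ticket-created", async (req, res) => {
  // Payload shape is illustrative.
  const { ticketId, subject, description, attachments } = req.body;

  // 1. Read the ticket text.
  let context = `Subject: ${subject}\n\n${description}`;

  // 2. Add the model's understanding of each image attachment.
  for (const att of attachments ?? []) {
    if (att.content_type?.startsWith("image/")) {
      context += `\n\n[Image "${att.file_name}"]: ${await describeAttachment(att.content_url, att.content_type)}`;
    }
  }

  // 3. Respond based on the complete context, text and visual.
  await postInternalNote(ticketId, await draftReply(context));
  res.sendStatus(200);
});

app.listen(3000);
```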
Which models support vision?
Not all models can process images. Here's the breakdown:
| Model | Vision | Best for |
|---|---|---|
| GPT-5.4 Mini | Yes | Strongest reasoning + vision at low cost |
| GPT-5 Mini | Yes | Everyday tasks with vision support |
| Claude Sonnet 4.5 | Yes | Complex image analysis, detailed descriptions |
| Llama 3.3 70B | No | Text-only — agent gets a clear "not supported" message |
If your agent runs on a non-vision model, the tool handles the limitation gracefully: instead of crashing, it returns a message suggesting which vision-capable models to switch to.
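Conceptually, that fallback is just a capability check inside the tool. The sketch below illustrates the idea; the model identifiers and message wording are assumptions rather than Macha's exact behaviour.

```typescript
// Sketch of a graceful fallback for text-only models. Model identifiers and
// message text are illustrative.
const VISION_MODELS = new Set(["gpt-5.4-mini", "gpt-5-mini", "claude-sonnet-4.5"]);

function readImageAttachment(model: string, fileName: string): string {
  if (!VISION_MODELS.has(model)) {
    // Return a normal tool result instead of throwing, so the agent can
    // explain the limitation in its reply rather than failing the run.
    return (
      `Image "${fileName}" cannot be read: ${model} does not support vision. ` +
      "Switch the agent to a vision-capable model such as GPT-5.4 Mini or Claude Sonnet 4.5."
    );
  }
  return `Image "${fileName}" will be analysed with inline vision.`;
}
```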
Getting started
Image vision is available on all Macha plans with a Zendesk connector. No additional setup required — connect Zendesk, add the Read Attachment tool to your agent, and it works. The agent automatically detects image attachments and processes them using your selected model's vision capabilities.
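Viewed as configuration, the setup amounts to something like the sketch below. It is purely illustrative: the tool, connector, and event names are hypothetical, and the real setup in Macha may look different.

```typescript
// Purely illustrative agent definition; names are hypothetical, not Macha's API.
const supportAgent = {
  name: "Order support agent",
  model: "gpt-5.4-mini",   // any vision-capable model from the table above
  connectors: ["zendesk"], // supplies attachment access and ticket APIs
  tools: ["read-ticket", "read-attachment", "post-reply"],
  trigger: { source: "zendesk", event: "ticket.created" },
};
```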