Pre-Launch Checklist
Ten yes/no questions that gate every new agent launch.
The Ten-Question Pre-Launch Checklist
If you can answer "yes" to every question below, your agent is ready to ship. If any answer is "no" or "I'm not sure," that is the next thing to fix before going live.
This is not bureaucracy. It is distilled from every successful agent launch we have seen. The teams who skip it are also the teams who hold the post-mortems.
1. Can You Describe The Agent's Purpose In One Sentence?
"This agent triages incoming Zendesk tickets by setting priority and tags." Yes.
"This agent handles support stuff and also does refunds and sometimes triages." No.
If you cannot describe the purpose in one sentence without using "and," you are looking at two agents. Split them.
2. Does The Tool Set Match The Purpose?
Look at the agent's assigned tools. Is every one of them justified by the purpose? Are there any tools that "might be useful one day"?
Yes if every tool maps to a step in the workflow. No if there are spectator tools.
If no, trim the tool set. See the Tool Selection page for recommended sets per role.
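One way to make the purpose-to-tool mapping concrete is to write it down as data and audit it. This is a hypothetical sketch — the tool names and the idea of a justification string are illustrative, not a real platform schema:

```python
# Hypothetical sketch: every tool carries a one-line justification tying it
# to a step in the workflow. Tool names here are illustrative.
AGENT_PURPOSE = "Triage incoming Zendesk tickets by setting priority and tags."

TOOLS = {
    "zendesk_get_ticket":   "step 1: read the ticket",
    "zendesk_set_priority": "step 2: set priority",
    "zendesk_add_tags":     "step 3: apply tags",
}

def audit_tool_set(tools: dict) -> list:
    """Return the spectator tools: anything without a workflow justification."""
    return [name for name, reason in tools.items() if not reason.strip()]

# A clean tool set audits to an empty list.
assert audit_tool_set(TOOLS) == []
```

If the audit turns up a tool whose justification is "might be useful one day," that is the tool to trim.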
3. Is The Model Choice Deliberate?
You picked the model based on the workflow's complexity, not by leaving the default. You can articulate why this model and not the cheaper or more expensive one.
Yes if you ran the three-question framework from the Choosing a Model page. No if it is on default because nobody changed it.
Default GPT-5 is fine if it is the right answer. Default GPT-5 is wrong if you should have switched to Mini or Sonnet but did not bother to think about it.
4. Are The Instructions Specific, Numbered, And Tool-Aware?
Open the agent's instructions and read them as if you were the model. You should see:
- An identity-establishing first paragraph.
- A numbered workflow that names the actual tools.
- Explicit boundaries (things the agent must not do).
- Escape hatches for ambiguous cases.
- Tone guidance for any customer-facing copy.
Yes if all five elements are present. No if any are missing.
5. Have You Chatted With The Agent Through At Least Ten Real Cases?
Stage 1 of the testing pattern. Open chat, walk through real workflows, watch the tools fire, approve or reject the writes through the confirmation gate.
The ten cases should include the happy path, edge cases (missing data, language mismatch, attachments), and at least one case the agent should refuse to handle.
Yes if you have ten cases logged with notes. No if you tried it on two and called it good.
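The case log can be as simple as a list with a category per case. This sketch uses placeholder case names; the gate it checks — at least ten cases, multiple edge cases, at least one refusal case — comes straight from the checklist above:

```python
# Hypothetical Stage 1 case log; case descriptions are placeholders.
from collections import Counter

CASES = [
    ("standard billing question",      "happy_path"),
    ("standard shipping question",     "happy_path"),
    ("order status lookup",            "happy_path"),
    ("password reset request",         "happy_path"),
    ("ticket with no order number",    "edge"),           # missing data
    ("ticket written in German",       "edge"),           # language mismatch
    ("ticket with a PDF attachment",   "edge"),
    ("empty ticket body",              "edge"),
    ("angry escalation-worthy ticket", "edge"),
    ("refund request (out of scope)",  "should_refuse"),  # agent must decline
]

counts = Counter(category for _, category in CASES)

# Gate: ten or more cases, a real spread of edge cases, at least one refusal.
assert len(CASES) >= 10
assert counts["edge"] >= 3
assert counts["should_refuse"] >= 1
```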
6. Have You Run At Least Ten Varied Inputs Through Test Run?
Stage 2 of the testing pattern. Test Run lets the agent run end-to-end as it would autonomously, but you see the output before any side effects ship.
Same input matrix as Stage 1: easy, hard, edge cases. The Test Run results should match what you saw in chat mode — if they diverge, investigate why.
Yes if all ten Test Run cases pass without intervention. No if you need to "guide" the agent through them.
7. For Customer-Facing Agents: Have You Run A Week In Internal-Notes-Only Mode?
Stage 3 of the testing pattern. The single most important pattern in this guide. If your agent will send customer-facing writes, swap those tools for internal-only equivalents, run for a full week including a weekend, and review the drafts.
Your team's approval rate over the week should be 85%+. If it is lower, you have a list of exactly what to fix before going public.
Yes if a full week of internal-notes-only mode has passed with high approval rate. No if you skipped it because "the test runs looked good."
If your agent has no customer-facing writes (a triage agent, a routing agent, an internal-only research agent), this question is automatically yes.
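The 85% gate is simple arithmetic over the week's reviewed drafts. A minimal sketch, assuming you tally approvals and totals per day (the numbers below are placeholders):

```python
# day -> (approved drafts, total drafts reviewed); placeholder numbers.
week = {
    "mon": (19, 22), "tue": (24, 26), "wed": (21, 25),
    "thu": (26, 28), "fri": (18, 21), "sat": (6, 7), "sun": (4, 5),
}

approved = sum(a for a, _ in week.values())
total = sum(t for _, t in week.values())
rate = approved / total

print(f"weekly approval rate: {rate:.1%}")  # → weekly approval rate: 88.1%
assert rate >= 0.85, "below the 85% gate: fix the rejection patterns first"
```

The useful part is not the pass/fail number but the rejected drafts themselves — each one names something to fix before going public.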
8. Are The Trigger Conditions Narrow Enough To Be Comfortable?
Look at the trigger conditions. Imagine receiving a Slack notification every time they fire for the first day. Would you be comfortable with that volume? Would the conditions only fire on tickets the agent was actually built for?
Yes if the conditions filter to a specific tag, brand, group, or other narrow attribute. No if it fires on every new ticket without filtering.
You can always expand. You cannot easily un-spam customers if the agent fires too broadly on day one.
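A narrow trigger is easiest to reason about when you can read it as a single predicate. This is a hypothetical sketch — the field names (`status`, `brand`, `tags`) are illustrative, not a real trigger schema:

```python
# Hypothetical trigger predicate. The point: filter to narrow attributes
# (status, brand, tag), never fire on every new ticket.
def should_fire(ticket: dict) -> bool:
    """Fire only on new tickets for the one brand and tag this agent serves."""
    return (
        ticket.get("status") == "new"
        and ticket.get("brand") == "acme-store"       # placeholder brand
        and "billing" in ticket.get("tags", [])       # placeholder tag
    )

assert should_fire({"status": "new", "brand": "acme-store", "tags": ["billing"]})
assert not should_fire({"status": "new", "brand": "acme-store", "tags": []})
```

Widening the predicate later is a one-line change; narrowing it after a noisy day one is a customer-facing apology.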
9. Do You Have A Plan For Day 1, Day 3, And Day 7?
Concretely: who is checking the agent's conversation history on each of those days, and what are they looking for?
Day 1: Did the trigger fire as expected? Are there any obvious errors? Is the volume what you predicted?
Day 3: What is the quality of the agent's actions over the first 48 hours? Any patterns of bad behavior?
Day 7: Is the agent's quality stable? Has the team's confidence in it grown? Should you expand trigger conditions?
Yes if there is a named owner and a calendar reminder. No if the plan is "we'll keep an eye on it."
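The plan above is just a named owner plus three concrete dates, which you can compute from the launch day. A small sketch (the owner name and launch date are placeholders):

```python
# Turn the day 1/3/7 plan into dated, owned review items.
from datetime import date, timedelta

def review_schedule(launch, owner):
    checks = {
        1: "trigger fired as expected? obvious errors? volume as predicted?",
        3: "quality of actions over first 48 hours? patterns of bad behavior?",
        7: "quality stable? team confidence grown? expand trigger conditions?",
    }
    return [(launch + timedelta(days=d), owner, q) for d, q in checks.items()]

schedule = review_schedule(date(2026, 3, 2), owner="dana")  # placeholders
for when, who, what in schedule:
    print(when, who, what)
```

Each tuple maps directly to a calendar reminder with the owner's name on it.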
10. Do You Know How To Roll Back?
If something goes wrong on day 2, can you stop the agent immediately?
The minimum roll-back path: deactivate the agent (toggle in the agent's settings page). The trigger will stop firing within seconds. Customer-facing writes stop. The agent and its history are preserved for investigation.
Yes if you know exactly which page and which button. No if you would have to figure it out under pressure.
If your team has multiple admins, make sure at least two of them know the roll-back procedure. In a real incident, the person who knows is sometimes asleep.
Bonus: The Soft Criteria
Three additional checks that are not strictly pre-launch but are worth confirming before scaling up:
- The team is genuinely excited (not just compliant) to ship the agent. Reluctant launches go badly. If your support team is uneasy, do another week of internal-notes-only and address their feedback.
- You have a "boring scenario" the agent is good at. Not the impressive demo case — the most common, most boring real ticket. If the agent handles the boring case well, it will handle the rare cases acceptably. If it can only do impressive demos, it will fail in production.
- You have written down the failure modes you accept. No agent is perfect. Knowing what tradeoffs you have made — "the agent will sometimes escalate when it should have answered, and that is OK" — prevents post-launch panic when the inevitable imperfection appears.
Use This Checklist On Every Launch
Including small ones. Especially small ones. The thing about small launches is that they bypass the careful review that big launches get. They are also where most of the surprises happen.
Ten minutes of checklist-running has prevented more bad days than any other discipline in this guide.
© 2026 AGZ Technologies Private Limited