What we learned about prompt engineering from building recurring issue detection workflows

Jan 12, 2026

What problem we tried to solve

A common problem Managed Service Providers (MSPs) face is recurring issues—the same problem occurring multiple times for the same customer. Previous occurrences are often buried in historical tickets handled by different technicians. As a result, a massive amount of valuable customer context sits idle in ticket notes.

When a new ticket arrives, technicians are under pressure to respond quickly and rarely have time to carefully review old notes. Recurring issues are easily missed; at best, resolution takes longer, and at worst, customers become frustrated because they feel unheard. This is the ideal moment to ask AI for help: to quickly sift through large volumes of historical data on behalf of a human and detect recurring issues before the technician even opens the ticket.

Our approach

A large language model (LLM) is a machine learning model trained on massive corpora of natural-language text to predict the next word from prior context. By design, LLMs are well suited for processing large amounts of text.

For MSPs, ticket data is the central repository of information and correspondence, typically stored as unstructured or semi-structured text. This makes LLMs a strong fit for extracting signal from noisy, text-heavy datasets.

We built an AI agent with two core capabilities: 1) listing tickets and 2) reading ticket details such as notes, labor logs, time entries, and comments. We instructed the agent to scan ticket titles for signs of recurring issues and then dive into the details of relevant matches to identify meaningful connections.
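
To make this concrete, here is a rough sketch of how the two capabilities might look as tools handed to the agent. The function names, fields, and stubbed bodies are illustrative, not our production schema or API.

    from dataclasses import dataclass

    @dataclass
    class TicketSummary:
        ticket_id: str
        customer_id: str
        title: str
        created_at: str  # ISO 8601 timestamp

    def list_tickets(customer_id: str, limit: int = 200) -> list[TicketSummary]:
        """Tool 1: list recent tickets (titles only) for one customer."""
        # In the real workflow this calls the ticketing system's API; stubbed here.
        return []

    def get_ticket_details(ticket_id: str) -> dict:
        """Tool 2: return notes, labor logs, time entries, and comments for one ticket."""
        # In the real workflow this calls the ticketing system's API; stubbed here.
        return {"ticket_id": ticket_id, "notes": [], "labor_logs": [],
                "time_entries": [], "comments": []}

Scanning the cheap title listings first and only reading full details for promising matches keeps the amount of text the model has to process manageable.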

What worked

Putting the goal upfront: This is the single most effective way to reduce hallucinations. When we provide a concise goal at the beginning, the AI maintains focus as it processes data, making it less likely to deviate even if it encounters conflicting information later.
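
As a simplified illustration (the wording below is hypothetical, not our production prompt), the very first lines of the system prompt state the goal before any instructions or data:

    # Hypothetical wording; the point is that the goal comes first.
    GOAL = (
        "Goal: Given a newly created support ticket, determine whether this "
        "customer has reported the same underlying issue before. If so, list "
        "the earlier tickets and summarize how they were resolved.\n"
    )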

Describing how to do the work: We outlined high-level steps for the AI: 1) get initial ticket details, 2) retrieve customer context, 3) analyze recent tickets, and 4) summarize findings. Clear procedural instructions reduced hallucinations by giving the model a concrete execution path and clearer expectations for tool usage. 
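
Here is a sketch of how those steps appear in the prompt. The wording is hypothetical and assumes the list_tickets / get_ticket_details tools from the earlier sketch:

    # Hypothetical wording of the procedural section of the prompt.
    STEPS = (
        "Follow these steps in order:\n"
        "1. Read the new ticket's details to understand the reported issue.\n"
        "2. Retrieve customer context (who they are, which services they use).\n"
        "3. List this customer's recent tickets and scan the titles for similar issues.\n"
        "4. For promising matches, read the ticket details and compare root causes.\n"
        "5. Summarize your findings, citing the matching ticket IDs.\n"
    )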

Providing examples of expected output: When asked to summarize a large corpus of data, AI responses can vary wildly in length and structure. However, when we gave a few examples, the AI tried to match the response structure to them and returned answers in a more consistent format. A consistent answer format is important for post-processing.
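
For instance, the prompt might end with a small format template like the one below. It is hypothetical and shown only to illustrate pinning down the structure:

    # Hypothetical output template appended to the prompt to pin down structure.
    OUTPUT_FORMAT = (
        "Respond in exactly this format:\n"
        "Recurring issue: yes | no\n"
        "Related tickets: <comma-separated ticket IDs, or 'none'>\n"
        "Summary: <2-3 sentences on the pattern and prior resolutions>\n"
    )

Pinning the structure this way is what makes downstream post-processing, such as parsing the yes/no flag and ticket IDs, reliable.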

Appending edge cases at the end: Production data often breaks logic in unexpected ways. When we first turned on the workflow with production data, we uncovered many scenarios missed during the initial workflow build-out. As we encountered failures, we appended specific instructions on how to handle those scenarios at the end of the prompt. This allowed us to iterate on reliability over time.
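
The edge-case section grows over time. The entries below are invented examples of the kind of instruction we append; the real list is driven by the failures we actually hit in production.

    # Invented examples of appended edge-case instructions.
    EDGE_CASES = (
        "Edge cases:\n"
        "- If a ticket has no notes, comments, or time entries, judge it by its "
        "title only and lower your confidence.\n"
        "- If two tickets share a symptom but clearly involve different devices "
        "or users, do not report them as recurring.\n"
        "- If you cannot identify the customer, stop and report that no analysis "
        "was possible instead of guessing.\n"
    )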

Using a reasoning model: At this stage, this became a requirement. While reasoning models are slightly more expensive, we found that standard models struggled to consistently maintain the multi-step logic required for this workflow.

What didn’t work

We can’t switch models interchangeably. We experimented with switching models between steps and workflows, but results were highly inconsistent—even with identical prompts. Each model appears to have its own behavioral quirks and follows certain instructions better than others.

Multi-agent systems performed worse than single-agent systems. We attempted to split responsibilities across multiple smaller agents so each could operate with fresher context. While conceptually appealing, agents performed poorly at generating precise instructions during hand-offs. As the number of agents increased, success rates dropped sharply. Small hallucinations in early agents were amplified at every subsequent layer.

We still face scaling limitations from the context window. Searching for recurring issues across all historical tickets is like looking for a needle on a sandy beach—difficult even for machines. The most effective way to improve accuracy is to reduce the search space. We ultimately introduced a business requirement to limit how far back we search. While this increases the risk of missing some recurring issues, it also avoids being overwhelmed by low-quality historical data.
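
Mechanically, the limit is just a date cutoff applied before the ticket list ever reaches the agent. A minimal sketch, assuming tickets carry a timezone-aware ISO 8601 created_at field and using an illustrative 180-day window (not a recommendation):

    from datetime import datetime, timedelta, timezone

    LOOKBACK_DAYS = 180  # illustrative value; set by the business requirement

    def within_lookback(tickets: list[dict], now: datetime | None = None) -> list[dict]:
        """Drop tickets created before the lookback cutoff."""
        now = now or datetime.now(timezone.utc)
        cutoff = now - timedelta(days=LOOKBACK_DAYS)
        return [t for t in tickets
                if datetime.fromisoformat(t["created_at"]) >= cutoff]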

Lessons learned

If we were to summarize our lessons learned from building this workflow into bullet points, here are the dos and don’ts:

  • DO: State your goal concisely at the beginning of the prompt

  • DO: Describe how to get the job done if you want reliability

  • DO: Provide examples of what you expect in the output

  • DO: Articulate edge cases and how to handle them

  • DON’T change models frequently. Stick to a strong base model and learn its behavior

  • DON’T get too fancy with your setup. Stick to a single agent with minimal tools

  • DON’T try to do too much with one agent. Set clear limits on how much data to process