In the e-commerce ecosystem, suppliers are core partners for product listing and fulfillment, and their service experience directly affects the platform's operating efficiency. In traditional supplier ticket processing, however, repetitive tasks such as manual triage and data entry not only consume significant manpower but also, through human error, lead to delayed responses and inaccurate classification. How can technology free up that manpower while also improving service quality? Wayfair, a well-known home furnishings e-commerce platform, offers its own answer.
Wayfair is an American vertical e-commerce platform for home furnishings, headquartered in Boston. Founded in 2002 and taken public in 2014, it is a benchmark company in North American home furnishings e-commerce.
As a well-known home furnishings e-commerce platform, Wayfair focuses not only on the end-consumer experience but also on the digital upgrade of its supplier services. The company built Wilma, an LLM-based ticket automation system, and used LangGraph to orchestrate LLM calls and tool interactions, fully automating the repetitive work in supplier tickets with accuracy and efficiency beyond manual processing.
Tool introduction: Wilma is Wayfair's customer service assistant. It relies on LLMs (such as Gemini and GPT) to generate responses for service agents that satisfy both policy and empathy requirements.
- Core value: response speed improved by 12%, platform policy compliance improved by 2%-5%, and the system scales to the high consultation volume of peak periods.
- Functionality: agents select the type of help needed via 4 buttons; the system draws on 40+ customized templates, fills in real-time data, and generates a reply that agents can edit freely. Having upgraded from a single prompt/single LLM call to multiple prompts over multiple rounds of calls, the team plans to automate simple conversations entirely so that agents can focus on complex issues.
For companies exploring LLM automation, this front-line account of practical experience, pitfalls, and future plans is valuable industry information. Below is a summary from the company's technology team:
When people think of Wayfair customers, they first think of shoppers on the Wayfair website. But the suppliers (also known as manufacturers) whose products are listed on our platform are also our customers. Just as Wayfair has customer service specialists to assist shoppers, we have supplier service specialists who support suppliers through SupportHub, a JIRA-based ticketing system.
Much of a supplier service specialist's work requires specialized skills and knowledge, but some of it is tedious and repetitive. For example, specialists must manually triage tickets: review unstructured ticket text, query the relevant data in a database, and then enter the corresponding structured data into the ticket, specifically the supplier ID, the core issue type, and the supplier's preferred language. To handle these tasks automatically, we built an LLM-based ticket automation system (an extension of the Wilma product suite) that shortens response times, reduces costs, and improves triage accuracy.

At its core, Wilma is a series of LLM calls and tool calls orchestrated by LangGraph and triggered by events. When a supplier emails Wayfair, SupportHub creates a new ticket and pushes an event to a Pub/Sub topic through a webhook. Our Pub/Sub consumer listens for these events and, when it detects a ticket-creation event, invokes the LangGraph graph, which coordinates LLM calls, BigQuery table lookups, and interactions with SupportHub.
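The article doesn't include the consumer code, but a minimal sketch of this event-driven trigger might look like the following, assuming a hypothetical `ticket_created` payload shape and a compiled graph named `triage_graph` (both illustrative, not Wayfair's actual names):

```python
import json

from google.cloud import pubsub_v1

from triage import triage_graph  # hypothetical module exposing the compiled graph

subscriber = pubsub_v1.SubscriberClient()
# Project and subscription names are placeholders.
subscription_path = subscriber.subscription_path("my-project", "supporthub-events")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    event = json.loads(message.data.decode("utf-8"))
    # Field names are assumptions; the real SupportHub webhook payload isn't public.
    if event.get("event_type") == "ticket_created":
        triage_graph.invoke({
            "ticket_id": event["ticket_id"],
            "ticket_text": event["description"],
        })
    message.ack()

# Blocks while the client pulls and dispatches events in the background.
future = subscriber.subscribe(subscription_path, callback=callback)
future.result()
```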

When invoked, the graph performs the following steps:
1. Use an LLM to identify the issue type (i.e., intent classification)
2. Use an LLM to identify the supplier's preferred language (steps 1 and 2 are sketched just below)
3. Run an LLM agent with access to BigQuery to determine the supplier ID, then update the ticket through the JIRA interface
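Steps 1 and 2 map naturally onto LLM calls with structured output. Here is a minimal sketch, combined into a single structured call for brevity (the article describes them as separate steps) and assuming a Gemini-class model via LangChain's `init_chat_model` plus a made-up category list; the article names neither the exact model nor the 81 real categories:

```python
from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field

# Hypothetical subset of the 81 issue categories mentioned later in the article.
ISSUE_TYPES = ["shipping_delay", "catalog_update", "payment_dispute", "other"]

class TicketTriage(BaseModel):
    issue_type: str = Field(description=f"One of: {ISSUE_TYPES}")
    preferred_language: str = Field(description="ISO 639-1 code, e.g. 'en' or 'de'")

llm = init_chat_model("google_vertexai:gemini-1.5-pro")  # model choice is an assumption
classifier = llm.with_structured_output(TicketTriage)

def classify_ticket(ticket_text: str) -> TicketTriage:
    """Run steps 1 and 2 as one structured LLM call over the raw ticket text."""
    return classifier.invoke(
        "Classify this supplier ticket's issue type and detect the language "
        f"the supplier prefers to communicate in:\n\n{ticket_text}"
    )
```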
Step 3 is by far the most complex: a ReAct LLM agent, combining reasoning with action, receives a high-level task ("Identify the supplier ID from the ticket information") and can call a BigQuery tool. A typical reasoning-and-action trace looks like this:
> "The supplier ID is not explicitly mentioned in the work order description, but I found a possible stock keeping unit code (SKU), so the supplier ID will be queried through that code. ”
> Execute query: SELECT SupplierID FROM supplier_skus WHERE SKU="W008787164"
> "The query did not return results and you need to try other methods. The work order description contains the supplier's email address, and I will try to query the supplier ID through the email. ”
> Execute query: SELECT SupplierID FROM supplier_details WHERE SupplierEmail LIKE "contact@acme.com"
> "Success! ”
We found that the automated system described above outperforms humans: it identifies issue types with 93% accuracy, languages with 98% accuracy, and supplier IDs with 88% accuracy. By comparison, humans picked the correct issue type from the 81 possible categories only 75% of the time. Beyond the accuracy gains, Wilma also reduces ticket processing time, freeing specialists from manual data entry to focus on complex problem solving and high-value supplier interactions.
When execution graphs become complex, being able to easily visualize and search process steps becomes extremely valuable. We use Arize as our observability platform, both during development before launch and for debugging after launch. We found that a purpose-built UI for navigating and searching traces is far more efficient than reading raw output logs. We also set up alerts through Arize that notify us promptly of abnormal fluctuations in token usage, latency, and other metrics.
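The article doesn't show the instrumentation itself. One common way to ship LangGraph traces to Arize is the OpenInference LangChain instrumentor over OpenTelemetry; a sketch, assuming the `arize-otel` and `openinference-instrumentation-langchain` packages and placeholder credentials:

```python
from arize.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

# Space ID, API key, and project name are placeholders.
tracer_provider = register(
    space_id="YOUR_SPACE_ID",
    api_key="YOUR_API_KEY",
    project_name="wilma-ticket-triage",
)

# LangGraph runs on LangChain's callback system, so this instrumentor records
# every node execution, LLM call, and tool call as a span in the trace.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)
```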

(Illustration: Visual display of graph calls in Arize)
When designing a complex LLM system, you must trade off between two approaches: an agent-driven design, in which the agent has full decision-making power, and a highly deterministic, workflow-like design, in which the order of operations is strictly preset.
At the start of the project, we naively believed a purely agent-driven approach would work. Specifically, we created three agents: a ticket management agent responsible for interacting with SupportHub, a classification agent handling issue type and language identification, and a supplier ID agent that queries supplier IDs. Each agent was equipped with the tools needed for its job, and a supervisor agent coordinated their interactions. This design is simple and easy to extend, but we found it had communication and coordination problems: information provided by one agent could be ignored by another, and the supervisor might call an agent multiple times unnecessarily. In one case, the classification agent passed language information to the supervisor, which relayed it to the ticket management agent, but that agent, relying on its own "judgment", mistakenly concluded there was no need to call the JIRA interface to update the ticket.
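For illustration, here is roughly what that first design looks like in code, as a sketch built on the langgraph-supervisor add-on package with stub tools; the article does not say how Wayfair's supervisor was actually implemented, and the model choice is an assumption:

```python
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

llm = init_chat_model("google_vertexai:gemini-1.5-pro")  # model choice is an assumption

# Stub tools standing in for the real classification, BigQuery, and SupportHub integrations.
@tool
def classify_ticket(text: str) -> str:
    """Return the issue type and preferred language for a ticket."""
    return "issue_type=shipping_delay, language=en"

@tool
def lookup_supplier_id(clue: str) -> str:
    """Look up a supplier ID in BigQuery from a SKU or e-mail address."""
    return "SupplierID=12345"

@tool
def update_ticket(ticket_id: str, fields: str) -> str:
    """Write structured fields back to the SupportHub (JIRA) ticket."""
    return "updated"

classification_agent = create_react_agent(llm, [classify_ticket], name="classification_agent")
supplier_id_agent = create_react_agent(llm, [lookup_supplier_id], name="supplier_id_agent")
ticket_agent = create_react_agent(llm, [update_ticket], name="ticket_management_agent")

# The supervisor decides which agent to call next, which is exactly the
# coordination step that proved unreliable in practice.
supervisor = create_supervisor(
    agents=[classification_agent, supplier_id_agent, ticket_agent],
    model=llm,
    prompt="Triage the supplier ticket by delegating to your three agents.",
).compile()
```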

(Illustration: Supervisor agent architecture diagram)
To improve the graph's performance, we turned to a workflow approach: instead of relying on a supervisor agent to coordinate subordinate agents, we explicitly defined the series of LLM calls and function calls needed to reach the goal. This workflow design is more reliable, easier to debug, and cheaper, because it reduces the number of LLM calls. But we found it handled edge cases poorly. For example, if the supplier's email address contains a typo (say, "supplier@acme.con"), the workflow fails outright on the first query; an agent-driven design, by contrast, can recognize the failure, suspect a spelling problem, and retry with variations such as "supplier@acme.com".
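To make that failure mode concrete: the deterministic step reduces to a single fixed query, as in the sketch below (table and column names taken from the trace above; the helper itself is illustrative), and a typo'd address simply matches nothing:

```python
from google.cloud import bigquery

bq_client = bigquery.Client()

def lookup_supplier_by_email(email: str) -> str | None:
    """Deterministic workflow step: one fixed, parameterized query, no fallback."""
    rows = bq_client.query(
        "SELECT SupplierID FROM supplier_details WHERE SupplierEmail = @email",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("email", "STRING", email)]
        ),
    ).result()
    first = next(iter(rows), None)
    return first["SupplierID"] if first else None

# The typo'd address matches no row, and with no retry strategy the
# whole workflow fails at this step.
assert lookup_supplier_by_email("supplier@acme.con") is None
```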

(Illustration: Workflow architecture diagram)
In the end, a hybrid design worked best: a workflow with an agent embedded at one node for the supplier ID lookup. Specifically, we replaced the second half of the workflow with a ReAct LLM agent that has access to the BigQuery tool, letting it try different query strategies and reformulate queries based on the observed results. This combines the advantages of both approaches: the order of operations stays deterministic, while the agent's ability to interpret and recover from errors gives us intelligent handling of edge cases.
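Put together, the hybrid graph might look like the sketch below: deterministic nodes in a fixed sequence, with the ReAct `supplier_id_agent` from the earlier sketch embedded as one node. State fields and node bodies are illustrative, not Wayfair's actual schema:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class TriageState(TypedDict, total=False):
    ticket_id: str
    ticket_text: str
    issue_type: str
    language: str
    supplier_id: str

def classify_issue(state: TriageState) -> dict:
    # Deterministic node: a single structured LLM call (see the earlier sketch).
    return {"issue_type": "shipping_delay"}

def detect_language(state: TriageState) -> dict:
    # Deterministic node: another single LLM call.
    return {"language": "en"}

def find_supplier_id(state: TriageState) -> dict:
    # Agentic node: supplier_id_agent (from the earlier sketch) is free to try
    # several BigQuery strategies before it answers.
    result = supplier_id_agent.invoke(
        {"messages": [("user", state["ticket_text"])]}
    )
    return {"supplier_id": result["messages"][-1].content}

def update_ticket(state: TriageState) -> dict:
    # Deterministic node: write the three fields back via the JIRA API.
    return {}

builder = StateGraph(TriageState)
for name, node in [("classify_issue", classify_issue),
                   ("detect_language", detect_language),
                   ("find_supplier_id", find_supplier_id),
                   ("update_ticket", update_ticket)]:
    builder.add_node(name, node)
builder.add_edge(START, "classify_issue")
builder.add_edge("classify_issue", "detect_language")
builder.add_edge("detect_language", "find_supplier_id")
builder.add_edge("find_supplier_id", "update_ticket")
builder.add_edge("update_ticket", END)
triage_graph = builder.compile()
```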
Wilma's practical experience gave Wayfair's technical team valuable insights into designing LLM-based systems. Many other business processes at Wayfair, across finance, logistics, customer service, and other areas, involve employees doing relatively simple, repetitive tasks. The ticket triage system described in this article is just one of the LLM automation applications Wayfair has put into practice; for example, the team has also built a system that monitors supplier and customer interactions and flags scenarios requiring a Wayfair specialist's intervention. As the models (and the team's skill in using them) continue to improve, the Wayfair technical team hopes to tackle increasingly complex problems, letting employees focus on high-value work that makes fuller use of their professional knowledge and skills.
(From: TesterHome)