What makes a good generative AI tool for customer service?

11 Oct 2024

12 minute read

“We’re building our own generative AI chatbot”

That’s a response we hear most weeks when we speak to people about implementing AI for customer service. Of course, with Large Language Models (LLMs) widely available and constantly improving, doing this solo looks pretty easy.

The other thing we come across is inexpensive generative AI tools that can be “trained” on your data to answer customer service questions. It can be hard to sift through the marketing language to see the real truth beneath, and to understand the difference between one tool and the next.

Whichever path you are going down, you have to know what a good generative AI experience looks like. “Good” will be subjective, but “bad” is less so.

Let’s take a look at what’s necessary for a good implementation, and we’ll start with some bad.

Part 1: What can go wrong?

Avoiding hallucinations

The most obvious problem with generative AI is how trustworthy the answers are and how prone they are to hallucinations. If you don’t know what hallucinations are, then you should not be thinking about generative AI solutions! Let’s explain and recap:

Here’s an example. We asked ChatGPT “what is the return policy for Nike UK?”

Here was part of its response:

Nike UK's return policy allows customers to return items within 30 days of the delivery date, for any reason, including items that have been worn or used. Here are the main details:

Refunds: Refunds will be issued to the original form of payment within 2 business days after the return is received at their warehouse. It may take up to 10 business days for the refund to reflect in your account, depending on your bank or payment provider.

If you look at that, you think: hey not so bad. But when you look at Nike’s refund page, this is what it says:

Once we receive your items, we'll process your return and initiate a refund to the original form of payment. We'll send you an email confirmation, but it may take up to 14 days to complete your refund.

So, ChatGPT says 2 business days, or 10 business days, while Nike itself says 14 days. Where has ChatGPT got the 2 and 10 figures from? That’s an example of a hallucination – where AI just invents the answer. 

This is a pretty minor example, but the potential for a disastrous response is huge. What if an AI promises a free gift, or generates a fake discount code? Those would be much more problematic, and could end up costing you a lot. Think about the Air Canada case, where the airline was held liable for the bad advice its chatbot gave a passenger.

How can you avoid hallucinations? That requires teaching your AI to only use information from very specific sources, and ensuring that it doesn’t reach outside those sources in order to answer the question.

This is one of the problems inherent with generative AI. It looks exciting because it can answer ANY question. But it can also answer ANY question with ANY answer if left unchecked. 
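To make that a bit more concrete, here’s a very rough Python sketch of what “only answer from specific sources” could look like. Everything here (HELP_CENTRE, retrieve_passages(), llm_complete()) is an illustrative stand-in rather than a real product or API:

```python
# Sketch: answer only from your own approved content, never from the model's general
# "knowledge". HELP_CENTRE, retrieve_passages() and llm_complete() are illustrative
# stand-ins, not a real product or API.

HELP_CENTRE = {
    "returns": "You can return unworn items within 30 days. Refunds take up to 14 days.",
    "delivery": "UK standard delivery takes 3-5 working days.",
}

def retrieve_passages(question: str) -> list:
    # Naive keyword match over your own content; a real system would use proper search.
    return [text for topic, text in HELP_CENTRE.items() if topic in question.lower()]

def llm_complete(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a call to whichever LLM you use

def answer_from_sources(question: str) -> str:
    passages = retrieve_passages(question)
    if not passages:
        # Refuse rather than let the model guess: this is the anti-hallucination rule.
        return "Sorry, I'm not able to answer that one."
    prompt = (
        "Answer the customer's question using ONLY the sources below. "
        "If the sources don't contain the answer, say you don't know.\n\n"
        "Sources:\n" + "\n".join(passages) + "\n\nQuestion: " + question
    )
    return llm_complete(prompt)
```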

Implementing guardrails

To combat hallucinations, you can start to build in guardrails to ensure that the AI only generates answers from certain sources.

But you need to go further. Most generative AI tools will try to answer any question, so you need to be able to teach them to only answer certain questions. The reason is that a customer playing around can quickly take the bot off topic.

Take the now infamous example from DPD. In that case, a frustrated customer was able to make the bot say very insulting things about DPD. Not only is that a terrible look for a brand, but if you are using an existing LLM, then you may be paying for every insult it generates. 

Getting your AI to simply refuse to answer certain questions can go a long way to solving this problem. But of course, at that stage you need to be sure that your bot can answer enough questions to make it worthwhile. A bot that barely does anything isn’t worth much. 
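Here’s an equally rough sketch of what that kind of guardrail could look like, building on the grounding idea above. ALLOWED_TOPICS and the keyword-based classify_topic() are just illustrative; a real system would use a proper classifier or a constrained LLM call:

```python
# Sketch of a scope guardrail: decide what the question is about first, and only let
# the model answer topics you've explicitly allowed. classify_topic() is a crude
# keyword stub and generate_answer() is a placeholder for your grounded answering.

ALLOWED_TOPICS = {"returns", "delivery", "order_status", "product_info"}

def classify_topic(question: str) -> str:
    q = question.lower()
    if "return" in q or "refund" in q:
        return "returns"
    if "deliver" in q or "shipping" in q:
        return "delivery"
    if "order" in q or "track" in q:
        return "order_status"
    if "product" in q or "size" in q:
        return "product_info"
    return "off_topic"

def generate_answer(question: str) -> str:
    raise NotImplementedError  # e.g. the grounded answering from the earlier sketch

def guarded_reply(question: str) -> str:
    topic = classify_topic(question)
    if topic not in ALLOWED_TOPICS:
        # Politely decline instead of improvising: no poems, no insults, no surprises.
        return "I can help with orders, returns, delivery and product questions."
    return generate_answer(question)
```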

Having a fallback for failure 

Once these guardrails are in place, it’s still essential that your customers have somewhere to go if the AI is unable to answer the question.

Put another way, if your AI says “Sorry but I am unable to answer that question”, what does it do next? You may want to ask the customer to rephrase the question or ask a different question, but if that creates a dead end, what happens next?

One of the worst things about chatbots is when you get stuck in a loop and there is no way out. You do not want to create that situation for your customers. 

The fallback needs to be that customers can then be passed to a human agent. Creating a smooth handover then becomes essential. Being able to bring an agent into a chat window, or have them pick up the email thread is the ideal state. But you can also ask customers to leave an email address that an agent picks up later. 
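As a sketch, the fallback logic could look something like this. try_to_answer(), hand_over_to_agent() and ask_for_email() are hypothetical placeholders for your own answering logic and your helpdesk’s actions:

```python
# Sketch of the fallback path: if the AI can't answer, ask the customer to rephrase
# once, then hand over to a human rather than looping.

MAX_FAILED_ATTEMPTS = 2

def try_to_answer(question: str):
    # Stand-in for the grounded, guarded answering above; returns None when the AI
    # can't (or shouldn't) answer.
    return None

def hand_over_to_agent() -> None:
    ...  # placeholder: bring a human agent into this chat in your helpdesk

def ask_for_email() -> None:
    ...  # placeholder: prompt for an email address an agent can follow up on later

def handle_turn(question: str, failed_attempts: int, agents_online: bool):
    answer = try_to_answer(question)
    if answer is not None:
        return answer, 0  # answered, so reset the failure counter

    failed_attempts += 1
    if failed_attempts < MAX_FAILED_ATTEMPTS:
        return "Sorry, I couldn't answer that. Could you rephrase your question?", failed_attempts

    # Don't trap the customer in a loop: pass them to a human.
    if agents_online:
        hand_over_to_agent()
        return "I'm passing you over to one of our team now.", failed_attempts
    ask_for_email()
    return "Please leave your email address and one of our team will get back to you.", failed_attempts
```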

There are other situations where, if a certain question is asked, you would want your agents to jump in. If you have in-house stylists or style advisors, then questions about which product is best are exactly the ones where you want a human to give advice.

We work with pet food brands, and they have to deal with situations where the customer’s pet has passed away. In those situations, many brands feel that it’s better for the customer if a human agent responds. 

Part 2: What do you want it to do?

A lot of the situations we’ve mapped out are about answering questions related to things like returns policies, product information and other public-facing information. In short, a lot of the answers it would be giving could be found somewhere on your website.

In that case, all you are doing is making it easier for customers to find information. That’s no bad thing(!), but is it using the technology to its full capability? 

Think about how many of the answers your team currently gives are essentially just copying-and-pasting information from these places. Is it a large percentage? Is it large compared to answers that are specific to that customer or that order? 

This is where AI can have a bigger impact on your customer service – by getting to the bottom of these questions.

Redefining conversation design

When considering implementing a traditional chatbot, a lot of people would look at the amount of conversation design involved and think “Wow that looks like a lot of work.”

Obviously this is where generative AI can help, by providing multiple different response options to the same question, or by helping the conversation designer find a good, succinct response to any question. 

Where the real value comes is in letting generative AI create uniquely-phrased responses to repetitive questions, which keeps conversations fresh. 

Again, you don’t want the AI to run riot, because the tone is very important. It still requires some quality control to ensure that the responses are up to scratch, fit the brand tone, and respond appropriately to customers depending on what they ask. Some questions can be dealt with more playfully, others need to be taken super seriously. 

But some things need to be more strictly phrased. Take anything legal. You will want very carefully worded responses, and you do not want your AI to start rephrasing legal language. 

In these cases you can use a blend of generative and templated responses, where the legal language is fixed, but the responses around it can be produced generatively. 

A relatively simple way this can be done through chat is triggering two consecutive messages, where the first is generative and the second is templated. Here’s an example of that:
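As a rough sketch, those two consecutive messages might be produced like this. send_message(), llm_complete() and the template text are illustrative placeholders, not a real implementation:

```python
# Sketch of the two-message blend: the first message is generated for this specific
# conversation, the second is a fixed, pre-approved template whose wording is never
# touched by the model.

RETURNS_LEGAL_TEMPLATE = (
    "Your statutory rights are not affected. Our full returns policy is set out at "
    "example.com/returns."  # fixed wording, exactly as signed off by your legal team
)

def llm_complete(prompt: str) -> str:
    raise NotImplementedError  # placeholder for your LLM client

def send_message(text: str) -> None:
    print(text)  # placeholder for posting a message into the chat

def reply_about_returns(customer_message: str) -> None:
    # 1. Generative: acknowledge this customer's specific situation, in the brand's tone.
    intro = llm_complete(
        "In one short, friendly sentence, acknowledge this customer's question about "
        "returning an item. Do not state any policy details.\n\n"
        "Customer: " + customer_message
    )
    send_message(intro)

    # 2. Templated: the legally sensitive wording is sent verbatim, never rephrased.
    send_message(RETURNS_LEGAL_TEMPLATE)
```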

The first is a generative message responding to the context given, while the second is part of a templated flow. Which brings us neatly on to flows.

Creating solid process flows

Workflows or process flows are needed whenever the conversation the AI is responding to needs to go beyond surface level.

We are all familiar with chatbots which are little more than glorified search engines: the ones that look at the keywords in the query and return an article that most closely matches what it sees. For these sorts of chatbots, there isn’t much of a flow needed because it’s essentially call and response. 

But as soon as you want to initiate an actual conversation with generative AI (or any type of chatbot) you need to have a process flow. This tells the AI what actions to take, and can incorporate any guardrails or limits you have imposed. 

Let’s take a couple of examples. First of all, imagine that the customer has asked about shipping costs.

The process flow might tell the AI to look at your FAQ page that details all the shipping costs, find the answer, and then rephrase it back to the customer. Simple, right?

But you need to know what country the person is in, because shipping costs vary wildly from country to country. So maybe you look up the IP address, or simply ask the customer. That adds a layer of complexity, and the answer then determines the response your AI gives.
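Sketched out, that shipping-cost flow might look something like this. The rates, country_from_ip() and llm_complete() are all illustrative placeholders:

```python
# Sketch of the shipping-cost flow: work out the country first, then answer from a
# fixed rate table rather than letting the model invent figures.

SHIPPING_RATES = {
    "GB": "Standard UK delivery is £3.95 and free on orders over £50.",
    "DE": "Delivery to Germany is €7.95 and takes 3-5 working days.",
    "US": "Delivery to the US is $12.00 and takes 5-8 working days.",
}

def country_from_ip(ip_address: str):
    ...  # placeholder for a geo-IP lookup; returns an ISO country code or None

def llm_complete(prompt: str) -> str:
    raise NotImplementedError  # placeholder for your LLM client

def answer_shipping_question(question: str, ip_address: str) -> str:
    country = country_from_ip(ip_address)
    if country not in SHIPPING_RATES:
        # Couldn't work it out from the IP, so simply ask the customer.
        return "Which country would you like us to ship to?"

    # Let the model rephrase the fixed facts, not change them.
    return llm_complete(
        "Rephrase this shipping information in a friendly tone, without changing any "
        "figures:\n" + SHIPPING_RATES[country] + "\n\nCustomer question: " + question
    )
```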

Now, imagine that a customer wants to ask about the status of an order. A simple call and response bot would ask the customer to check their email for a tracking link, or point them to a tracking page on the website. But what if you want to provide a better experience by actually SOLVING the issue with generative AI?

In that situation, a flow would look something more like this:

  • Ask the customer for their order ID number

  • Check the ID number against your order management system and see if it has been dispatched

  • If yes, check with the carrier to see what the status currently is

  • Etc.

The “Etc.” above is doing a lot of work here. Of course the subsequent steps are highly dependent on what the status is. If it’s marked as “delivered” then you need to ask the customer to check for it. If it’s in transit then you need to check how long it’s been in transit and how long it has been at its current stage. If either of these is too long, you need to apologise and offer a solution.
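Put into heavily simplified code, that order-status flow might look something like this. lookup_order(), carrier_status() and the five-day threshold are illustrative placeholders for your order management system and carrier integrations:

```python
# Sketch of the order-status flow described above.

from datetime import datetime

MAX_DAYS_WITHOUT_MOVEMENT = 5  # after this, apologise and offer a solution

def lookup_order(order_id: str):
    ...  # placeholder: fetch the order from your order management system

def carrier_status(tracking_number: str):
    ...  # placeholder: fetch tracking details from the carrier's API

def order_status_reply(order_id: str) -> str:
    order = lookup_order(order_id)
    if order is None:
        return "I couldn't find that order number. Could you double-check it for me?"

    if not order["dispatched"]:
        return "Your order hasn't been dispatched yet. You'll get an email as soon as it ships."

    status = carrier_status(order["tracking_number"])
    if status["state"] == "delivered":
        return "The carrier has marked this as delivered. Could you check with neighbours or in your safe place?"

    days_stuck = (datetime.now() - status["last_scanned"]).days
    if days_stuck > MAX_DAYS_WITHOUT_MOVEMENT:
        # Too long at the current stage: apologise and offer a solution.
        return "I'm sorry, your parcel seems to be stuck. Would you like a replacement or a refund?"
    return f"Your order is in transit and was last scanned {days_stuck} day(s) ago."
```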

All of this is possible with generative AI giving the responses, but it requires thought and taking a different approach to each stage.

Of course, one of the big obstacles is being able to give your AI access to your systems so it can pull the relevant information. That’s where having deep integrations matters.

Using deep integrations to access the information

Imagine your customer service agents are responding to a customer question. What systems do they need to access to get to the bottom of each type of question? Those are the same ones you’ll need your AI to be able to access.

Think back to our Nike and ChatGPT example above. ChatGPT had no access to any systems, so it was making things up. If your AI does the same when a customer asks about a specific order or for help with a specific product then you are not helping your customers, you are hindering them. 

If you want your AI assistant to assist customers beyond answering questions about your products then you need integrations, and deep ones. If your AI cannot integrate with your systems then it will be unable to pull information specific to that customer, such as their order history, address, and anything else relevant. 

Without this information, your generative AI assistant can only answer generic questions, such as those about your policies or your products. Those are good use cases, but they create a natural limit: the most common, customer-specific questions still end up with your agents.
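One way to think about deep integrations is as a registry of narrowly scoped functions the AI is allowed to call. This is only a sketch: the helpers and the TOOLS registry are illustrative, not any particular vendor’s tool-calling API:

```python
# Sketch of the integration layer: each internal system the AI may use is exposed as a
# named, narrowly-scoped function, and it can only reach what you register here.

def get_order_history(customer_email: str) -> list:
    ...  # placeholder: query your order management system

def get_delivery_address(customer_email: str) -> dict:
    ...  # placeholder: query your commerce platform

TOOLS = {
    "get_order_history": {
        "fn": get_order_history,
        "description": "The customer's recent orders, newest first.",
    },
    "get_delivery_address": {
        "fn": get_delivery_address,
        "description": "The delivery address on the customer's account.",
    },
}

def call_tool(name: str, **kwargs):
    # Anything not registered is simply unavailable to the assistant.
    if name not in TOOLS:
        raise ValueError(f"Tool {name!r} is not available to the assistant")
    return TOOLS[name]["fn"](**kwargs)
```

Most LLM providers offer some form of function or tool calling, and the descriptions in a registry like this are what the model uses to decide which system to query.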

Part 3: Should you build or buy? And what should you buy?

This blog has argued that generative AI can be really helpful to brands. It can turn otherwise templated and repetitive responses into more free-flowing and natural conversations. It can give customers a more personalised experience, and it can help them understand your policies.

But making generative AI really useful requires a lot of thought. It also requires a lot of technical expertise to train it, to keep it on track (flows), to arm it with the information it needs to be useful (deep integrations), and then to hand over to an agent when needed. You can build this sort of layer on top of an off-the-shelf LLM such as OpenAI’s, Meta’s, or any other, but it takes a lot of work to get right.

So here’s a simple checklist:

You should look into building this yourself if:

  1. You have huge amounts of technical resources available

  2. The volume of customer service queries you get is large, and most of them require very simple workflows

  3. You really want to

Otherwise the business case for investing your team’s time in this, over anything else, doesn’t really add up for most businesses. It’s likely to be at least a year in the making.

The buyer’s guide

The first thing you should do to build a business case for this is to define the scope of what you want your generative AI chatbot to do.

Is it a pre-purchase tool that summarises and rephrases answers about products and policies? Do you want it to act like a customer service agent? How do you want to manage the handover from the bot to your customer service team? What customer service queries would you like it to solve? 

In short, why are you doing it? 

We recommend retailers start by looking at the customer service requests they are getting now, and using that to prioritise their requirements.

Once you know the scope, and what good looks like, you can build a business case for buying and start assessing suppliers.

If you’re a retailer, and you get over 3,000 customer service tickets a month, then talk to us and we can help you understand what’s possible, what it takes, and how effective it can be. We have experience of deploying AI for retailers, both big and small. Speak to us today