NLP, deep learning and the problems of explainability and bias in AI

There’s been a massive sea change recently in NLP and deep learning, and it’s having a big effect across many applications for AI and process automation.

If you’re a glass-half-full kind of person, you may view artificial intelligence and natural language processing as a way to replace the need for valuable resources to spend precious time doing a lot of rote work.

These technologies can provide you with a level of intelligent process automation for mundane tasks, allowing you to focus on more important things, and make your life better as a result.

Unfortunately, there have been a few problems with executing on that value proposition. Many people view (and have experienced) AI systems so complex, and natural language processing capabilities so cutting-edge, that non-technical users will never be able to understand them.

In response, these solutions and capabilities have been integrated behind the scenes so that you don't have to worry about them. Everything supposedly just works, without you needing to understand how. And if you want to change anything, sorry, that's too bad.

Some of these AI projects involving natural language have been massive failures, costing upwards of $50 million and taking three-plus years to implement without really changing anything of consequence. But it was fun trying for the data science team. Hence, there is a large contingent of glass-half-empty people out there.

Fortunately, there's been a massive sea change in the last couple of years—especially in the areas of NLP and deep learning. And it’s having a big effect across a lot of applications for AI and process automation.

What a lot of people don't realize is that upwards of 80 percent of the NLP technology being implemented in the enterprise today is 20-year-old technology. It's built on rule-based, lexicon-based systems, approaches we've known for two decades perform little better than a coin flip. But they are relatively straightforward to develop, even if they take a long time, so we continue to use them.
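To make that concrete, here is a minimal sketch of the lexicon-based style of system described above. The word lists and the classification task are purely illustrative, not any particular product's rules:

```python
# Minimal sketch of a lexicon/rule-based classifier in the style described
# above. The word lists are illustrative only.
POSITIVE = {"resolved", "helpful", "great", "fast"}
NEGATIVE = {"broken", "delay", "unresolved", "error"}

def classify(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Brittle by design: negation, sarcasm and unseen vocabulary all defeat it.
print(classify("support was helpful and the error was resolved quickly"))
```

Systems like this are easy to reason about, which is part of why they linger, but their accuracy ceiling is exactly the coin-flip problem described above.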

In contrast, there have been some big improvements in deep learning in recent years, but they've been very slow to make it to the enterprise. This is largely because the skills required to execute on a deep learning strategy are in such short supply. In fact, some of the epic AI failures referenced above can be attributed to a lack of deep learning.

What’s different today is that the industry is beginning to understand that deep learning is really the only way to solve modern NLP problems.

One of the big problems with deep learning is that you typically need massive sets of labeled data to make it work effectively, along with a lot of computing power to analyze all that data. The compute side of that challenge has been largely addressed by the move to GPU computation, which provided a 30- to 40-fold performance improvement almost overnight.
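For illustration, the GPU shift is now almost free at the code level. The sketch below is a hypothetical PyTorch training step, not tied to any specific system; the same code runs on CPU or GPU, and simply moving the model and batch to a CUDA device is what unlocks the speedup:

```python
# Illustrative PyTorch sketch: one training step that runs on CPU or GPU.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small classifier over precomputed text embeddings (hypothetical sizes).
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical batch of embeddings and labels, created directly on the device.
x = torch.randn(32, 768, device=device)
y = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```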

The other problem with deep learning is explainability. Your AI algorithm made a decision to do something or to take some action. How did it reach that conclusion?

Explainability is a tricky topic because intuitively, people have an understanding of what it means to have something explained to them. But the question of what exactly that means in the AI context has never been particularly clear.

Generally, we can divide explainability into two camps:
  1. Formal explainability: how does the algorithm work? Take me through the bits and bytes, show me all the math, and explain things to me that way. That's where a lot of people start, but if you sit down with businesspeople and walk them through your algorithms, they won't likely feel like they've really gotten an explanation.
  2. What I call functional explainability: this focuses on what really matters to the business. How well is it working? Where is it making mistakes, and why? What thought process, for lack of a better word, is the algorithm going through to make that prediction? For every decision AI makes, we should have an audit trail.

We find that about 80 percent of AI errors can actually be tied back to bad training data. The problem is finding that bad training data. So one really key element is to tie every prediction your algorithm makes back to the training data it relied on, so you can figure out why. It's not about explaining the algorithm mathematically, but explaining, functionally, why it's doing what it's doing.
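One simple way to build that kind of audit trail (a sketch of one possible approach; more sophisticated influence-estimation methods exist) is to store an embedding for every training example and, for each prediction, log the nearest training neighbors that most resemble the input:

```python
# Sketch: trace a prediction back to the training examples it most resembles
# in embedding space, so suspect labels can be found and fixed.
# Assumes embeddings were produced offline by the model's encoder.
import numpy as np

def nearest_training_examples(query_vec, train_vecs, train_texts, k=3):
    # Cosine similarity between the query embedding and every training embedding.
    train_norm = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    sims = train_norm @ query_norm
    top = np.argsort(-sims)[:k]
    return [(train_texts[i], float(sims[i])) for i in top]

# Logged next to each prediction, this gives a functional audit trail:
# "the model decided X largely because it saw these labeled examples."
```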

This is no longer a nice-to-have feature. Where regulations don't already exist, they are coming in areas like banking. If a bank has an AI-powered algorithm that rejects Mary Smith's loan request, that bank will need to be able to explain why she didn't get the loan.

Fortunately, deep learning gives us some powerful tools to solve this problem.

When you talk about explainability, you also have to talk about bias. Amazon was in the news recently because the company used an AI-powered tool to grade job candidates, then had to shut it down soon after discovering it was gender-biased. The only way they were able to find that out was by processing a whole lot of peripheral information about the candidates. If they had had better explainability, along the lines of what's described above, they probably would have been able to short-circuit the problem much earlier.

There are a few ways to think about bias. The most interesting construction I've seen is the idea that everyone has a right to an explanation: what data was the algorithm looking at when it made its decision? This helps because we can home in on what might be causing the bias. In response, we can decide not to present our algorithm with any demographic information at all. That's a start, but there's more to it than that.

Another thing to consider is that it's not always the AI's fault; algorithms don't develop bias in isolation. In the Amazon case, the AI became biased because it was trained on Amazon's historical hiring behavior, which was unintentionally biased to begin with. Essentially, they just traded one biased system for another.

If we're not conscious about how we use AI, we absolutely run the risk of permanently encoding our own biases. The only way to get to the other side is to break the model open and figure out which words it is paying attention to. If we change a name to something that sounds a little different, if we change the gender, if we change another variable, what happens? Did we just change someone's employability by altering something that should be irrelevant to the role?
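A crude version of that probe can be automated. The sketch below uses a toy scoring function standing in for a real model (the penalty it applies is deliberately planted so the check has something to find); swapping an attribute that should be irrelevant and comparing scores flags the sensitivity:

```python
# Sketch of a counterfactual bias check: swap an attribute that should be
# irrelevant and see whether the model's score shifts.
def score_candidate(resume_text: str) -> float:
    # Toy stand-in for a real scoring model; it deliberately penalizes one
    # term so the probe below has something to detect.
    score = 0.7
    if "women's" in resume_text.lower():
        score -= 0.2  # the kind of hidden penalty a biased model might learn
    return score

def bias_probe(resume_text: str, original: str, replacement: str, tolerance: float = 0.01) -> bool:
    """Return True if swapping `original` for `replacement` moves the score."""
    baseline = score_candidate(resume_text)
    swapped = score_candidate(resume_text.replace(original, replacement))
    return abs(baseline - swapped) > tolerance

resume = "Captain of the women's chess club; 5 years of Java experience."
print(bias_probe(resume, "women's chess club", "chess club"))  # True: the score moved on a protected attribute
```

Run across a test set and a list of such swaps, this kind of check won't prove a model is fair, but it can surface the most obvious sensitivities before deployment.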

There are no easy answers when it comes to bias, but it's not a problem we can afford to ignore.
