As an AI consultant, I get a front-row seat to AI and innovation; I repeatedly see when it succeeds and when it fails. I’ve been building AI and machine learning systems for over 20 years, across many different companies, cultures, and problem domains. During this time, I’ve noticed a number of patterns that cause the deployment of AI systems and AI-powered features to fail.
1. The AI system never progresses beyond contemplation
Unfortunately, if I had to author an organizational directive that exemplifies this problem, it might be:
Think first, think again, and keep thinking… then some PowerPoints, some meetings, and finally, resign yourselves to tackling something predictable and easier to put on a schedule than AI-powered features.
Contemplation failures happen for multiple reasons. Organizations often don’t know how to begin vetting an AI idea, sometimes for lack of resources, occasionally because of analysis paralysis, and sometimes because of an inability to prioritize features or projects whose outcomes are uncertain. Each of these reasons in turn has its own set of causes.
Analysis paralysis is often the default stance in many corporate cultures, frequently because companies do not sufficiently reward risk taking or intentionally bias toward action. Managers tend to optimize for short-term progress against KPIs, so riskier features, which may offer dramatic improvements on a longer time horizon, are given insufficient time and resources to blossom. The compromise stance too often is to think about the idea, maybe write a document, deliver some presentations, or hold a few meetings; but when the time comes to commit serious resources, institutional appetite for risk can prove difficult to find.
2. Lack of support
Leveraging AI to significantly improve a business’s baseline requires iteration. Deploying AI successfully often requires cooperation across multiple departments and persistent effort dedicated to evaluating and improving the new system. Gathering the requisite training data to develop system intelligence is essential, and reviewing AI system output, often through manually taxing processes, may be required to improve efficacy. Usually accommodations must also be made in existing systems, perhaps to gather better data or to evaluate system output through A/B testing. These dependencies across departments and projects often meet organizational resistance, and far too often companies fail to follow through. I’ve seen many promising AI developments end up shelved because departments failed to assign them the prioritization required for success.
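To make the A/B testing piece concrete, here is a minimal sketch of the kind of check a team might run when comparing an AI-assisted variant against the existing flow. The counts, the success metric, and the significance threshold are illustrative placeholders, not recommendations.

```python
# Minimal sketch: compare an AI-assisted variant against the control in an A/B test.
# Counts below are hypothetical placeholders, not real data.
from statsmodels.stats.proportion import proportions_ztest

successes = [412, 468]   # e.g. resolved tickets for control vs. AI-assisted variant
totals = [5000, 5000]    # users exposed to each arm

stat, p_value = proportions_ztest(count=successes, nobs=totals)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# Only expand the AI-assisted rollout if the lift is both positive and significant.
if p_value < 0.05 and successes[1] / totals[1] > successes[0] / totals[0]:
    print("Variant outperforms control; consider expanding the rollout.")
else:
    print("No clear win yet; keep gathering data before committing.")
```

The instrumentation required to produce those two pairs of numbers is exactly the kind of cross-department accommodation that gets deprioritized.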
3. Failure to commit to a process for continual improvement
Teams often manage to apply a model to a problem, but they frequently fail to establish a long-term plan for regularly improving results. For example, a team might not commit to continuous efficacy assessment and monitoring to catch a model that underperforms in the field. I’ve also seen teams draw the circle too small: continual improvement is happening, but model efficacy from the customer’s perspective is not part of it. For example, data scientists, often out of frustration with an inability to get product-level feedback instrumented and deployed, may rely on limited efficacy assessments, like arbitrarily selected precision/recall thresholds, before releasing a model.
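As one illustration of what committing to continuous efficacy assessment can look like, here is a minimal sketch of a periodic health check on a labeled sample of production traffic. The metric floors and the alerting hook are assumptions to be agreed with the product team, not universal targets.

```python
# Minimal sketch: periodic efficacy check on a labeled sample of production traffic.
# The floors and the alert callback are placeholders for values the team agrees on.
from sklearn.metrics import precision_score, recall_score

PRECISION_FLOOR = 0.85   # illustrative threshold, not a universal target
RECALL_FLOOR = 0.70      # illustrative threshold, not a universal target

def check_model_health(y_true, y_pred, alert):
    """Compare live precision/recall against agreed floors and alert on regression."""
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    if precision < PRECISION_FLOOR or recall < RECALL_FLOOR:
        alert(f"Model regression: precision={precision:.2f}, recall={recall:.2f}")
    return precision, recall
```

The important part is not the two metrics; it is that some check like this runs on a schedule, against data customers actually generate, long after the release party.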
4. Focusing too heavily on models and not data
In many cases, teams that are unsatisfied with results presume the algorithm is the problem and that a better (usually more complex) one is the answer. The algorithm is often indeed suboptimal, but even with a state-of-the-art model, results would likely remain poor; the key is usually to revisit the data sources, look for more candidate features, and improve how that data is represented in the model. In fact, I will go out on a limb and say your time is most often better spent feeding your models better data than fine-tuning them. The models themselves, through their failures, will help you understand which additional data they need to improve. There are many reasons excessive model focus occurs, but one I often encounter is data scientists’ excitement about trying a new or state-of-the-art model, paired with relative disinterest in, and detachment from, improving the training data.
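One practical way to let a model’s failures point at missing data is simple error analysis by segment. The sketch below assumes a hypothetical dataframe with `label`, `prediction`, and `segment` columns; the column names are placeholders, the idea is what matters.

```python
# Minimal sketch: let the model's failures show where better data would pay off.
# Column names ("segment", "label", "prediction") are hypothetical placeholders.
import pandas as pd

def error_rate_by_segment(df: pd.DataFrame) -> pd.Series:
    """Rank segments by error rate; the segments that fail most are where
    additional features or more training examples will help first."""
    errors = (df["label"] != df["prediction"]).astype(int)
    return errors.groupby(df["segment"]).mean().sort_values(ascending=False)
```

An hour spent reading the worst-performing segment’s examples usually suggests more concrete improvements than another round of hyperparameter search.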
5. No signal
AI projects often start with an intuition that valuable insights might be predicted from data. In many cases this intuition is correct, and a quick initial validation on available data shows there is indeed measurable predictability above chance. Sometimes, however, the data just isn't there to make adequate predictions toward a given target; it is far better to discover this early, before a great deal of resources are wasted. For this reason, we begin every AI project with an exploratory phase evaluating the problem space, the available data sources, the feasibility of formulating a valuable AI problem, and finally, the presence of actual predictive signal. The outcome of this phase should often go beyond validating existing data sources and include a thorough examination of additional data sources which, if tapped, might boost signal. For example, when attempting to predict customer behavior, like the probability of leaving a service, past purchase history alone might carry weak signal; if, however, we combine this data with US census data by zip code, we might find that additional education, income, health, and demographic feature variables get us to strong predictive signal for customer lapse.
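Here is a minimal sketch of what that early signal check might look like for the customer-lapse example, assuming hypothetical purchase-history and census files, numeric features after the merge, and a binary lapse label. File and column names are illustrative.

```python
# Minimal sketch: a quick check for predictive signal above chance before
# committing serious resources. File, column, and model choices are illustrative.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

purchases = pd.read_csv("purchase_history.csv")          # hypothetical source
census = pd.read_csv("census_by_zip.csv")                # hypothetical enrichment source
df = purchases.merge(census, on="zip_code", how="left")  # boost weak signal with new features

X = df.drop(columns=["customer_lapsed"])                 # assumes numeric features
y = df["customer_lapsed"]                                # assumes a binary lapse label

# Cross-validated AUC near 0.5 means no usable signal; meaningfully above 0.5
# justifies continued investment in the project.
scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=5, scoring="roc_auc")
print(f"Mean AUC: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

A day or two spent on a check like this is cheap insurance against quarters spent modeling a target the data cannot predict.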
6. AI errors are too exposed
AI systems inevitably make errors. On very rare occasions these can be eliminated, but typically we need to address them on a timeline: easy errors can be fixed before an MVP, straightforward errors can be minimized over time, and some challenges will remain difficult, with no promise they are ever fixed. A model I’ve seen work well is a human-in-the-middle approach, where humans provide the service but, over time, AI models perform more of the heavy lifting. For example, rather than starting on day one with a fully automated virtual therapist, cardiologist, or tech support person, where the AI system must field all questions and interactions, we start with a cleverly structured service that lets the AI take on more of the work over time but is anchored early on in human-provided expertise; knowledge from the initially more human-powered service is regularly captured and transformed into training data, powering AI models of increasing efficacy. As time passes, the humans can serve larger user populations thanks to the efficiencies gained from the models’ growing contribution, and they in turn are able to specialize in the more interesting and complex problems. Sometimes placing a human in the middle is not appropriate; in these cases, there are often other ways to “hide” errors. For example, a common tactic is to present higher-confidence cases up front, similar to the way a search engine leads with its best results.
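One simple way to keep errors behind a human is to route by model confidence. The sketch below assumes an sklearn-style classifier and an illustrative 0.9 threshold; both the interface and the threshold would need tuning for a real service.

```python
# Minimal sketch: route requests by model confidence so uncertain cases stay
# behind a human. The 0.9 threshold and the handler names are illustrative.
def handle_request(request, model, human_queue):
    """Answer automatically only when the model is confident; otherwise defer."""
    confidence = model.predict_proba([request.features])[0].max()
    if confidence >= 0.9:
        return model.predict([request.features])[0]   # AI answers directly
    human_queue.put(request)                          # uncertain cases go to a person
    return None
```

Over time, as retrained models grow more accurate, the threshold can be lowered and the human queue shrinks, which is exactly the trajectory the human-in-the-middle approach is designed to produce.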
7. Excessive bias towards a core competency
Companies are seeded with different DNA and evolve in different ways. Most companies have a core competency that often originates with the founders. For example, I recently worked with a healthcare company that had a very strong competency in DevOps, likely due to their ability to navigate complex HIPAA regulations while still excelling at their SaaS service. This core competency, however, did not mean data science came easily. Whenever we had a meeting on any data science topic, for example while presenting model efficacy results, the conversation inevitably turned toward operational discussions. This wouldn’t necessarily be bad, if it weren’t for the fact that we had a lot of non-operational issues to discuss and gain agreement on.
8. Insufficient team strength or suboptimal structure
How teams are structured, and where their natural strengths lie, can also lead to AI failures. For example, I occasionally see cross-functional teams with complete autonomy. While rare, these teams have all the skills and structure they need to succeed. They typically have a product manager who can move a new AI feature to the top of the priority list without any need to confer outside the team. The team also has the technical skills, either in house or complemented by consultants, for all parts of the stack, from UX to back end to, of course, data science. These teams are at a major advantage when it comes to moving AI-powered features into production efficiently and regularly. Teams lacking this natural structure can sometimes overcome the disadvantage by borrowing resources and temporarily assembling the requisite capabilities, but we see this approach fail quite often; when these teams fail, it is usually because some dependent resource failed to do their part.
9. Excessive unaddressed technical debt
Longer-lived projects tend to grow barnacles that aren’t always dealt with regularly: broken test harnesses, large, lightly maintained parts of the code base servicing stale features, and of course that core module no engineer wants to touch. When technical debt is not regularly addressed and teams lose their ability to make significant changes quickly and confidently, AI features suffer.
10. Failure to leverage business playbook wins
Not everything in a business is automated. Sales and marketing departments, for example, often evolve a playbook that relies heavily on a company’s tribal knowledge; this playbook might target a certain type of customer with tailored messaging. Often, I see AI teams try to completely automate a task, like predicting the likelihood a customer will lapse or predicting how long they will stay. In these scenarios, a common decision is to feed the automatically generated predictions into a dashboard. Often, few look at the dashboard. If, however, the team takes a lower-friction interim step of educating these internal users, working with them to contribute to the tribal knowledge, and shaping their go-to playbook, potential dashboard users can gain confidence in the model’s lessons and become part of the process.
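A low-friction way to start is to translate raw predictions into the segments the sales team already thinks in, rather than building another unread dashboard. The sketch below assumes a hypothetical dataframe of per-customer lapse probabilities tagged with a `sales_segment` column; the column names are placeholders.

```python
# Minimal sketch: turn raw churn scores into playbook-level talking points.
# Column names ("sales_segment", "lapse_probability") are hypothetical placeholders.
import pandas as pd

def playbook_summary(scored: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-customer lapse probabilities into the segments the sales
    team already uses, so the model's lessons feed their existing playbook."""
    return (scored.groupby("sales_segment")["lapse_probability"]
                  .agg(["mean", "count"])
                  .sort_values("mean", ascending=False))
```

A summary like this gives the sales and marketing teams something to argue with, refine, and eventually trust, which is how model output finds its way into the playbook instead of onto a shelf.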