We’re Not Done Yet: After Developing the AI

Developing AI is a dynamic, multifaceted process. Even if an AI performs optimally from a technical standpoint, other constraining factors can limit its overall performance and acceptance. Developing an AI that is safe and dependable means that, as the risks from its use increase, stakeholders must understand more about how the AI functions. This section details factors that make that understanding challenging to achieve and describes how proper documentation, explanations of intent, and user education can improve outcomes.

 

 
Explore the Three Fails in This Category:

Testing in the Wild

Test and evaluation (T&E) teams work with algorithm developers to define criteria for quality control, but they cannot anticipate all algorithmic outcomes. The consequences (and even blame) for unexpected results are sometimes transferred onto groups who are unaware of these limitations or have not consented to being test subjects.

 

Examples

Boeing initially blamed foreign pilots for the 737 MAX crashes, even though a sensor malfunction, faulty software, inadequate pilot training, the sale of a safety feature as an optional purchase, and the omission of the software from the pilot manual were all contributory causes.1

In 2014, UK immigration rules required some foreigners to pass an English proficiency test. A voice recognition system was used as part of the exam to detect fraud (e.g., an applicant taking the test multiple times under different names, or a native speaker taking the oral test while posing as the applicant). But the government did not understand how high the algorithm’s error rate was, and each flagged recording was checked by undertrained employees, so the UK cancelled thousands of visas and deported people in error.2,3 Applicants who had followed the rules thus suffered the consequences of the algorithm’s shortcomings.

Why is this a fail?

T&E of AI algorithms is hard. Even for AI models that aren’t entirely black boxes, we have only limited T&E tools4,5 (though resources are emerging6,7,8). Difficulties for T&E result from:

Uncertain outcomes: Many AI models are complex, not fully explainable, and potentially non-linear (meaning they behave in unexpected ways in response to unexpected inputs), and we don’t have great tools to help us understand their decisions and limitations.9,10,11

Model drift: Due to changes in data, the environment, or people’s behavior, an AI’s performance will drift, or become outdated, over time.12,13 (A minimal drift-monitoring sketch appears after this list.)

Unanticipated use: Because AI interacts with people who probably do not share our skills or understanding of the system, and who may not share our goals, the AI will be used in unanticipated ways.

Pressures to move quickly: There is a tension between resolving to develop and deploy automated products quickly and taking time to test, understand, and address the limitations of those products.14

Because of all these difficulties, deployers and consumers of AI models often don’t know the range or severity of consequences of the AI’s application.15
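To make the model-drift difficulty concrete, the sketch below shows one simple way a deployment team might watch for drift: compare recent accuracy against the accuracy measured during T&E and alert a human when the gap exceeds a tolerance. The baseline value, the tolerance, and the weekly accuracy numbers are hypothetical placeholders, not figures from any system described here; real monitoring pipelines are considerably more involved.

# Minimal sketch of post-deployment drift monitoring (all values hypothetical).
from statistics import mean

BASELINE_ACCURACY = 0.94   # accuracy measured during pre-deployment T&E (hypothetical)
MAX_ACCURACY_DROP = 0.05   # tolerated drop before humans are alerted (hypothetical policy)

def drift_detected(weekly_accuracy):
    """Return True when recent performance falls outside the tolerated range."""
    recent = mean(weekly_accuracy[-4:])   # average of the last four weeks
    return (BASELINE_ACCURACY - recent) > MAX_ACCURACY_DROP

# Example: accuracy slowly degrades as the input data shifts.
observed = [0.93, 0.92, 0.90, 0.88, 0.87, 0.86]
if drift_detected(observed):
    print("Model drift detected: schedule re-testing and possible retraining.")

A check like this only catches drift after it has already degraded outcomes; it is a floor, not a substitute for the fuller T&E and accountability measures discussed in this section.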

Jonathan Zittrain, a Harvard Law School professor, describes how the issues that emerge from an unpredictable system become more problematic as the number of such systems increases. He introduces the concept of “intellectual debt,” which applies to many fields, not only AI. For example, in medicine some drugs are approved for wide use even when “no one knows exactly how they work,”16 yet they may still have value. If the unknowns were limited to a single AI (or drug), then causes and effects might be isolated and mitigated. But as the number of AIs and their interactions with humans grows, performing the number of tests required to uncover potential consequences becomes logistically impossible.

 

What happens when things fail?

Users are held responsible for bad AI outcomes even if those outcomes aren’t entirely (or at all) their fault. A lack of laws defining accountability and responsibility for AI means that it is too easy to blame the AI victim when something goes wrong. The default assumption in semi-autonomous vehicle crashes, as in the Boeing 737 MAX tragedies, has been that drivers are solely at fault.17,18,19,20,21 Similarly, reports on the 737 crashes showed that “all the risk [was put] on the pilot, who would be expected to know what to do within seconds if a system he didn’t know existed… forced the plane downward.”22 The early days of automated flying demonstrated that educating pilots about the automation capabilities and how to act as a member of a human-machine team reduced the number of crashes significantly.23,24,25

As a separate concern, the individuals or communities subject to an AI can become unwilling or unknowing test subjects. Pedestrians can unknowingly be injured by still-learning, semi-autonomous vehicles;26 oncology patients can be diagnosed by an experimental IBM Watson system that is still in a trial phase and not approved for clinical use;27 Pearson can show different messaging to different students as an experiment in gauging student engagement.28 As the AI Now Institute at New York University (a research institute dedicated to understanding the social implications of AI technologies) puts it, “this is a repeated pattern when market dominance and profits are valued over safety, transparency, and assurance.”29

Hold AI to a Higher Standard
Involve the Communities Affected by the AI
Make Our Assumptions Explicit
Monitor the AI’s Impact and Establish Layers of Accountability
It’s OK to Say No to Automation
Plan to Fail
Try Human-AI Couples Counseling
Envision Safeguards for AI Advocates
AI Challenges are Multidisciplinary, so They Require a Multidisciplinary Team
Ask for Help: Hire a Villain
Offer the User Choices
Require Objective, Third-party Verification and Validation
Incorporate Privacy, Civil Liberties, and Security from the Beginning
Use Math to Reduce Bad Outcomes Caused by Math
Promote Better Adoption through Gameplay
Entrust Sector-specific Agencies to Establish AI Standards for Their Domains
