How Your Brain Tricks You Into Trusting AI



Riley Coleman

August 2023

Rubber-Stamp Syndrome

In this issue

  • Are you inadvertently giving up control of your decisions?
  • Automation Bias Unpacked
  • Real-World Case Study
  • Practical Strategies for Different AI Risk Levels

G'day,

Today I want to share something I saw while working with a client's team, and how to prevent it from happening to you.

I saw something concerning while working with a design team on AI adoption. They proudly walked me through their AI-assisted user research analysis process.

They had collected dozens of user interviews for a major product redesign. Under time pressure, and with a keen product team waiting for insights, they used an AI tool to analyse the transcripts. The AI tagged the transcripts quickly, grouped the data, and surfaced what it identified as the main themes. The researcher reviewed the AI's analysis, made some changes, and wrote up insights. Finally, they shared the findings with stakeholders.

The problem?

I asked them about the changes they made. Then I followed up with, "How do you know the AI tagged the transcripts correctly? Did it have enough context to capture the important points?" At first, all I got was a silent stare.

At the time, I didn't know whether the AI had done a good job. The point was, neither did they.

The lead UXR checked by making fresh, uncoded copies of three transcripts, coding them by hand, and comparing the results. The AI had done a decent job, but some of its coding was too shallow: it missed details that mattered for the research questions, and those missed details contradicted design choices already in development. Even with a qualified researcher "reviewing" the AI's work, key insights were slipping through.
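
If you want to make that kind of spot-check concrete, here's a minimal sketch in Python (the codes and excerpts are hypothetical) of one way to compare AI-assigned codes against a researcher's hand-coding, using raw agreement and Cohen's kappa:

```python
# Hypothetical spot-check: compare the AI's codes with a researcher's
# hand-coding of the same transcript excerpts, one code per excerpt.
from sklearn.metrics import cohen_kappa_score

ai_codes = ["pricing", "onboarding", "pricing", "trust", "onboarding", "trust"]
human_codes = ["pricing", "onboarding", "navigation", "trust", "onboarding", "pricing"]

# Raw agreement: share of excerpts where both assigned the same code
agreement = sum(a == h for a, h in zip(ai_codes, human_codes)) / len(ai_codes)

# Cohen's kappa corrects for agreement expected by chance
kappa = cohen_kappa_score(ai_codes, human_codes)

print(f"Raw agreement: {agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")  # low values warrant a closer look
```

Even a rough number like this turns "I reviewed it" into something a team can actually discuss.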

This experience led to a tough question: If a human expert isn't enough, what does effective oversight of AI systems really look like?

The common requirement in many AI frameworks is to "ensure appropriate human oversight." It sounds straightforward, doesn't it?

The gap between oversight in theory and review in practice might be putting your decisions at risk.

When Theory Meets Reality

In theory, human oversight serves as our fail-safe against algorithmic errors and biases. The human-in-the-loop spots errors, makes ethical choices, and keeps AI systems on track.

In practice, though, the evidence reveals a very different pattern. Research on automation bias, referenced in work around the EU's AI Act, shows that human reviewers routinely fail to catch automated errors. Even when people are told about automation bias and trained to watch for it, they still fall into the trap.

It's not a matter of intelligence or diligence; it's about how our brains are wired. And these aren't just individual mistakes; they're systemic problems that affect us all:

  • We often approve AI decisions not because we're lazy, but because our brains like to save energy. When a trusted system gives us answers, we tend to rely on it.
  • Reviewers often struggle with tough calls while under pressure to keep up with throughput and meet deadlines.
  • Even with good intentions, we might not have the right context to judge outputs well. This is especially true when AI systems work in specific fields or act as black boxes.
  • Organisational cultures and workflows can make it feel risky or unwelcome to question the system.

To be clear, this isn't a new phenomenon brought on by AI. Researchers have spent decades studying how commercial pilots come to over-trust cockpit automation.

The Cognitive Science Behind Oversight Failures

Why do these oversight challenges occur with such regularity?

The answer lies in automation bias. Our brains look for shortcuts to save energy, so they default to trusting automated systems even when there's reason to be careful.

What's particularly revealing about automation bias is how it affects different experience levels:

  • Novices might defer to AI out of a natural respect for systems they're still learning about
  • Experts can also be vulnerable. Their trust in their knowledge can lead to unwarranted faith in the AI tools that analyse their data.
  • Those working under tight project deadlines are particularly susceptible, as our brains reach for efficient processing shortcuts.

The expertise paradox shows something important: specialist experts often have more automation bias, not less. This creates a perfect storm in which those most qualified to spot errors may be the least likely to look for them.

Real-World Case Study: The COMPAS Story

To grasp the real impact of oversight failures, think about the courtrooms in America.

In 2016, ProPublica published a shocking investigation into COMPAS, an algorithm used to predict how likely US defendants are to reoffend. Courts were using these risk scores to inform life-altering decisions about bail, sentencing, and parole.

The investigation revealed a disturbing pattern. COMPAS was twice as likely to wrongly label Black defendants as high-risk compared to white defendants. Conversely, white defendants who went on to reoffend were more often incorrectly labelled as low-risk. The algorithm was reflecting, and arguably amplifying, biases already present in the criminal justice system.

But here's what makes this case particularly relevant to our discussion: judges, highly trained legal experts, were the designated human overseers. They were told to use COMPAS scores as just one factor in their decisions. The system was designed with human oversight built in.

Yet in practice, this oversight frequently broke down. Judges often leaned heavily on the algorithmic risk scores, trusting an algorithm they didn't fully understand.

As legal scholar Danielle Keats Citron observed, "The illusion of objectivity and accuracy can make it difficult for human decision-makers to ignore or discount automated recommendations, even when they have good reason to do so."

The oversight failure stemmed from a few factors:

  • Automation bias led judges to defer to the algorithm despite their expertise.
  • The "black box" nature of COMPAS made meaningful evaluation nearly impossible.
  • Overburdened courts created time pressures that encouraged quick deference to the system.

Let's define effective oversight.

It's not only about courtrooms. The same dynamics play out in the AI-augmented UX research process I described earlier.

And the risk is increasing as more AI tools enter our work and autonomous AI agents emerge. We risk losing our individual and collective autonomy. The fact is, we are meant to be the failsafe. Without proper human oversight, AI systems can be biased, unfair, and even dangerous.

The stakes couldn't be higher.

Here's a tiered framework for implementing meaningful oversight:

Essential Requirements for All AI Use Cases

Strong AI literacy is essential. Without it, you can't spot risks in a process or know how to reduce them. This lack of understanding means there’s no real oversight.

Each AI system is different. You need training on how each one works, and you should understand the raw inputs it receives before you judge the outputs it produces.

Start with a human-cognition-first approach: assess the situation before looking at the AI's suggestions, write down your initial read, then check it against the AI's analysis and note where they differ.

Show confidence indicators: AI tools should display how sure they are of their results. As design professionals, push for tools that include visual cues such as confidence scores, so you can judge how much weight to give each output.

Build in constructive friction: add intentional "pause points" to AI-assisted workflows, for example by asking users to note why they are accepting or rejecting a recommendation (see the sketch below). This isn't inefficiency; it's making space for critical thinking.
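
As a rough illustration (the names and the 20-character minimum are hypothetical, not a standard), a pause point can be as simple as refusing to record a decision about an AI suggestion until the reviewer has written a short rationale:

```python
# Hypothetical pause point: a decision about an AI suggestion cannot be
# recorded without a rationale, and every decision is kept for later review.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ReviewDecision:
    suggestion_id: str
    accepted: bool
    rationale: str
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def record_decision(suggestion_id: str, accepted: bool, rationale: str) -> ReviewDecision:
    """Refuse to record a decision until the reviewer writes a short rationale."""
    if len(rationale.strip()) < 20:  # arbitrary minimum, tune for your team
        raise ValueError("Add a brief note explaining your reasoning before continuing.")
    return ReviewDecision(suggestion_id, accepted, rationale.strip())

# Example usage
decision = record_decision(
    suggestion_id="theme-042",
    accepted=False,
    rationale="The AI merged pricing complaints with onboarding friction; these need separate themes.",
)
print(decision)
```

The point isn't the specific check; it's that the workflow makes "because the AI said so" an insufficient answer.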

For Medium-Risk Applications

Apply specialised review criteria: Develop domain-specific questions that guide oversight in your area.

Set up oversight routines: establish regular review processes that encourage thorough evaluation, so that evaluation feels normal rather than exceptional or burdensome.

Monitor oversight patterns by tracking how often AI recommendations face challenges or modifications. Consistently low rejection rates may indicate automation bias rather than AI excellence.
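
To make that measurable, here's a minimal sketch (the decision log and the 10% threshold are illustrative, not a standard) that computes how often reviewers challenge or modify AI recommendations:

```python
# Hypothetical decision log: each entry is the outcome of one human review
# of an AI recommendation.
from collections import Counter

decision_log = [
    "accepted", "accepted", "modified", "accepted", "accepted",
    "accepted", "rejected", "accepted", "accepted", "accepted",
]

counts = Counter(decision_log)
challenge_rate = (counts["modified"] + counts["rejected"]) / len(decision_log)

print(f"Challenge rate: {challenge_rate:.0%}")
if challenge_rate < 0.10:  # illustrative threshold, not a standard
    print("Warning: reviewers rarely push back; check for automation bias.")
```

Track this over time; the trend matters more than any single number.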

Advanced Oversight Mechanisms for High-Risk Design Decisions

Apply the "four-eyes principle": Require two independent reviewers with different perspectives or expertise to evaluate critical AI outputs.

Document reasoning, not just decisions. Write down why you accept or change AI recommendations. This builds accountability and creates chances to learn.

Conduct adversarial testing: Actively look for weaknesses and edge cases in AI outputs instead of just confirming what seems right.

These approaches aren't just ideas; they're practical steps any organisation can take. The right level of oversight depends on your situation and on how much the AI's decisions affect the people on the receiving end.


Moving Forward Thoughtfully

As AI becomes a bigger part of our work, the need for proper oversight will increase. I am still hopeful that we can find ways to use our best judgment while also recognising our limits.

The key is to design oversight systems that work with how our minds actually operate, so that it is easier to stay sceptical and harder to slip into automation bias.

For those designing AI products and services, this isn't just an ethical issue; it's a practical design challenge, and it needs the same careful attention we give to critical user interactions.

I'd love to hear how you're approaching human oversight in your own design work.
