Three researchers ran a clean experiment on 1,339 teachers. They handed each one the same student work with the same wrong grade attached, and changed only a label saying the score came from AI or from a person.
When the grade was labeled AI and the error was harsh, teachers corrected it less. The people who deferred most were the youngest, the most credentialed, and the most fluent with technology.
Why this stopped me
I have spent two years telling executives that fluency is the goal. Get comfortable with the tools, learn what they can actually do, and you will catch what they get wrong.
This paper says I had it backwards. Comfort is what makes you stop checking.
The strongest deference to wrong AI scores came from the master's-degree, doctorate-holding, tech-literate teachers: the exact people you would trust to be skeptical.
There is a second detail I cannot stop thinking about. Teachers caught the AI when it was too generous and waved it through when it was too harsh. The 22% gap showed up only when the machine was punishing someone.
So the errors we miss are the ones that quietly cost a person something. A rejected candidate. A flagged customer. A low review score nobody contests.
What I watched happen
A few months ago I sat with a hiring lead at a logistics firm. Sharp, early thirties, the kind of person with every AI tool open in a browser tab and an opinion on each one.
Her team used a model to score inbound applications. I asked her to pull ten the system had ranked in the bottom half, and to read the actual resumes with me.
She got through three before she went quiet. Two of those three were strong. The model had penalized a gap year and an unusual job title, and she had moved all three to the rejection pile that morning without opening them.
What got me was her first reaction. She did not blame the tool. She said, "I figured it knew something I didn't."
That sentence is the whole study. The more she trusted her own AI literacy, the less she trusted her own eyes.
We changed the workflow that afternoon. Now the bottom decile gets a human read before anything moves, and the reviewer logs why they agreed or overrode. Her judgment needed to sit in front of the machine again, not behind it.
Being good at AI can quietly make you worse at catching its mistakes.
Run one review session this week where you assume the model is wrong, and notice how hard that is to actually do.