The AI Detector Racket Is Failing Real Students

Someone posted recently about nearly failing a college course because an AI detector flagged their entirely human-written paper. Seven pages, ten citations, written over several days. One sentence got flagged because it started with the word “studies.”

I’ve been sitting with that for a bit, because it’s a genuinely awful situation that’s going to keep happening to more people.

The core problem is this: AI detectors are statistical tools dressed up as evidence. They measure how predictable a piece of text is, based on patterns from training data. Clear, formal, well-structured academic writing happens to look a lot like AI output, because AI was trained to imitate exactly that. So the better you write, the more suspicious you look. That’s not a minor flaw. That’s the mechanism working exactly as designed, producing exactly the wrong outcome.
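To make "how predictable a piece of text is" concrete, here is a minimal sketch of a perplexity score. It is a toy, not how any particular detector works: real tools presumably use large neural language models, while this uses simple word counts. The `REFERENCE` corpus, the `perplexity` function, and both sample sentences are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for the text a detector's model has seen (an assumption).
REFERENCE = (
    "studies show that the results are consistent with previous studies "
    "the results suggest that further research is needed "
    "previous research shows that the evidence is consistent"
).split()


def perplexity(text, reference=REFERENCE):
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    Lower values mean the text is more predictable to the model.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")
    counts = Counter(reference)
    vocab = set(reference) | set(tokens)
    total = len(reference) + len(vocab)  # add-one smoothing denominator
    log_prob = sum(math.log((counts[t] + 1) / total) for t in tokens)
    return math.exp(-log_prob / len(tokens))


formal = "studies show that the evidence is consistent with previous research"
quirky = "my grandmother's kettle whistled opinions about soup"

# The formal sentence reuses the reference vocabulary, so it scores as more predictable.
print(perplexity(formal) < perplexity(quirky))  # prints True
```

The conventional academic sentence comes out as more predictable than the odd one, even though a human wrote both. A score like this measures resemblance to the reference text, not authorship.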

Someone in the discussion made an analogy to witch-finding that I can’t shake. It’s a bit dramatic, but not entirely wrong. There’s a real social need to be seen to be doing something about AI-assisted cheating, and these tools give administrators the appearance of a solution without the inconvenience of actually having one. The market has responded accordingly. A new AI detector launches somewhere every other week, most of them with disclaimers buried in their terms of service that basically say: don’t actually rely on this.

One person shared that they’d run a paper they wrote in 2005, about hospital-acquired infections, through several of these tools. It came back as 95% likely AI-generated. Another ran a chapter from 2015 through five checkers and got scores between 50% and 84%. This isn’t edge-case behaviour. It’s the norm.

The more interesting thread in the discussion was about revision history as a defence. Google Docs and Word both log your edits over time, which in theory shows a paper being built gradually rather than appearing fully formed in a single paste. That’s a reasonable idea. The problem is that people are already building tools to fake convincing revision histories, including software that simulates realistic human typing with natural pauses and corrections. The arms race is real, and it’s moving fast. Revision history might buy you six months of credibility before it stops meaning anything.

What’s left is oral examination. If you can walk a lecturer through your own argument, explain your reasoning, talk about why you chose those particular sources, that’s something a person who actually wrote the paper can do and someone who just pasted in AI output generally can’t. It’s not scalable across a class of two hundred students, but it is actually meaningful evidence. The discussion thread had a lecturer who’d already arrived at this conclusion on his own.

There’s also a legal dimension that surprised me. At least in the US, submitting a student’s paper to a third-party AI checker without consent may violate the student’s intellectual property rights, depending on how those services handle the data. Several universities have explicitly banned the use of these tools for that reason. Whether Australian institutions are thinking about this yet, I genuinely don’t know.

The student in this case wrote their paper on a Samsung tablet and then copy-pasted it to finish on a computer. No revision history. That’s a real problem, not because they did anything wrong, but because the system they’re operating in has decided that process documentation is now part of the proof of authorship. Nobody told them that when they enrolled.

That’s the part that actually bothers me. The rules changed mid-game, the tools being used to enforce them are unreliable by the vendors’ own admission, and individual students are absorbing the consequences of an institutional failure to think this through properly.

The best practical advice from the thread was blunt and correct: find papers or announcements written by the people judging you, run them through the same detector, and show them the results. Not to embarrass anyone, but to make the unreliability of the tool impossible to ignore. It works because the problem is universal. The detector doesn’t care who wrote the text.

I don’t have a clean resolution to offer here. The technology for generating convincing text and the technology for detecting it are locked in a race that the detectors are structurally unlikely to win, because the generators have vastly more money, better engineers, and a clearer objective. Oral exams and in-person assessment are probably where this ends up, at least for high-stakes work. Getting there is going to be messy, and some students are going to get caught in that mess unfairly.

That’s a real cost, and it’s being paid by the people who can least afford it.