AI summaries are almost certainly unreliable
It’s like the past few years’ discussions on bias in training data and issues with shortcut learning in AI models just didn’t happen at all?
Like, our industry didn’t take in any of it, did it?
I thought we’d spent the past few years digging into issues and looking for ways to use these systems on problems they can handle, like data and media conversion and modification.
But, no, we’re going straight for the dystopian nightmare stuff:
- Knowledgebases that straight up lie.
- Creating an underclass that’s only served by mediocre AIs instead of actual lawyers or doctors.
- Summarisers that hallucinate in their summaries.
- Decision-making systems with shortcuts and biases.
And the AI companies are behaving like cults:
- Withholding information on their work, supposedly to prevent the singularity.
- Theatrical “threat” testing to demonstrate the imminent arrival of a mechanical Yaldabaoth, a demiurge of a new world that threatens humanity.
- Rules for disseminating knowledge about the models that aren’t even remotely scientific, just insinuations and secrecy.
- Claims that the end times are imminent unless we follow their rules and ideology.
This is all so fucked up.
I’m absolutely serious when I say that all of this is starting to look like the mainstreaming of a cult.
And I’m not the only one to make this observation. Timnit Gebru has been pointing this out for a while. And Émile P. Torres goes into more detail on this over on the birdsite. This definitely smells like a cult to me.
(They are working on a paper on this, which I’m dreading/looking forward to reading in abject horror at the state of our world.)
My “summarisers that hallucinate” comment from earlier in this thread is based on a worry that runs through much of the research from the past few years: hallucinations are a big problem for summaries generated by AI models.
See:
- Faithful to the Original: Fact Aware Neural Abstractive Summarization (2017)
- On Faithfulness and Factuality in Abstractive Summarization (2020)
- Evaluating the Factual Consistency of Abstractive Text Summarization (2020)
- Entity-level Factual Consistency of Abstractive Text Summarization (2021)
- How Far are We from Robust Long Abstractive Summarization? (2022)
- Towards Improving Faithfulness in Abstractive Summarization (2022)
The trend in these papers is that hallucinations are an emergent property that seems to increase, not decrease, as the models grow in size, and that they happen just as often in summaries as in regular responses. Considering how prevalent hallucinations seem to be for people testing ChatGPT, Bing Chat, and GPT-4, it seems extremely unsafe to assume that using them to generate summaries will lead to accurate results.
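To make “factual consistency” a bit more concrete, here’s a minimal sketch of the kind of entailment-based check that line of research studies: does the source text actually support each claim in the summary? This isn’t any specific paper’s method; the model name, the 0.5 threshold, and the example texts are assumptions I’ve made purely for illustration.

```python
# A rough sketch of an entailment-based factual-consistency check:
# does the source document support each sentence of the summary?
# The model, the 0.5 threshold, and the example texts are assumptions
# made for illustration, not anyone's published method.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def entailment_probability(premise: str, hypothesis: str) -> float:
    """Probability that the premise (source text) entails the hypothesis (summary claim)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order for this model: 0 = contradiction, 1 = neutral, 2 = entailment
    return logits.softmax(dim=-1)[0, 2].item()


source = (
    "The company reported revenue of $4.2 billion in 2022, "
    "up 8% from the previous year."
)
summary_claims = [
    "Revenue grew 8% to $4.2 billion in 2022.",  # supported by the source
    "The company doubled its profits in 2022.",  # unsupported: a hallucination
]

for claim in summary_claims:
    p = entailment_probability(source, claim)
    verdict = "likely hallucinated" if p < 0.5 else "supported"
    print(f"{verdict:>18}  p(entailment)={p:.2f}  {claim}")
```

Note that a check like this can only flag claims the source doesn’t support; it can’t tell you what the correct claim would have been, which is part of why these errors are so hard to clean up automatically.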
The tech industry’s reaction?
😜
Who cares if AIs lie in their summaries? Nobody, that’s who!
None of these papers on hallucinations and factual consistency in AI-generated summaries test OpenAI’s super-secret-private chat/AI thingamabobs.
That’s because it’s impossible to verify OpenAI’s claims due to their secrecy. Not that they make any solid claims. They just say that the current model is X% better at avoiding hallucinations than the previous one, which was also super-secret-private, so we don’t have any meaningful statistics on their reliability either.
And according to the almost-hundred-page PDF they posted about GPT-4, their approach to reducing hallucinations consists of *manually* teaching it that some facts are facts. That’s not an approach that scales when the long tail of falsehoods is infinite. It is, however, an approach that does wonders when your primary audience is journalists who don’t have the time to do thorough testing, and gullible tech pundits.
It’s all just theatrics.
For more of my writing on AI, check out my book The Intelligence Illusion: a practical guide to the business risks of Generative AI.