The poisoning of ChatGPT

8 May 2023 – Baldur Bjarnason

This page was originally published elsewhere, but is republished here for archival purposes.

OpenAI’s secrecy and Artificial Generative Intelligence ambitions might leave the company’s products vulnerable to a new form of black-hat keyword manipulation.

Secrecy is the default for big AI #

If there’s one thing that unites the biggest players in the AI industry, it’s secrecy.

Microsoft, Google, and OpenAI:

Refuse to publicly document the training data sets they use.
Are secretive about what exact processes and mechanisms they use for fine-tuning.
Refuse to give impartial researchers the access to their models and training data that’s needed to reliably replicate research and studies.

I go over all of these issues, backed with references, in my book, in a chapter I’ve made available online: Beware of AI pseudoscience and snake oil.

Even if we assume that the tendency towards pseudoscience and poor research isn’t inherent to the culture of AI research and just take for granted that, in a burst of enlightened self-awareness, the entire industry is going to spontaneously fall out of love with nonsense ideas and hyperbolic claims, the secrecy should still bother us.

This much secrecy—or, information asymmetry—is a fatal blow to a free market.

The most obvious application of this is in second-hand markets, which were George Akerlof’s subject in his paper The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism. When sellers are allowed to withhold information about the goods, they will selectively hold back information about defects which causes the price difference between defective and non-defective goods to evaporate.

The problem, in market terms, is that if buyers don’t have enough information to tell the two groups of goods apart—Akerlof worked with the used car market as a test case—they will price all goods as if they were potentially defective. The seller of the defective car gets more for it than if they were upfront—especially at the beginning before market dynamics kick in—but the seller of the non-defective car gets less. This drives the good sellers out of the market.

AI language models aren’t used goods. We don’t buy and sell language models on Etsy. But information asymmetry is still an issue because, as with the “lemon” cars, we have no way of telling a defective good from a non-defective one.

Except this time, the defects are security vulnerabilities and it looks like they are all quite defective.

Language and diffusion models can be poisoned #

We’ve known for a long time that AI models can be “poisoned”. If you can get an AI vendor to include a few tailored toxic entries—you don’t seem to need that many, even for a large model—the attacker can affect outcomes generated by the system as a whole.

The attacks apply to seemingly every modern type of AI model. They don’t seem to require any special knowledge about the internals of the system—black box attacks have been demonstrated to work on a number of occasions—which means that OpenAI’s secrecy is of no help. They seem to be able to target specific keywords for manipulation. That manipulation can be a change in sentiment (always positive or always negative), meaning (forced mistranslations), or quality (degraded output for that keyword). The keyword doesn’t have to be mentioned in the toxic entries. Systems built on federated learning seem to be as vulnerable as the rest.

Given that Microsoft and Google are positioning these systems as the future of search, this vulnerability is a black-hat SEO’s wet dream. Just by getting the system to index a few hundred, maybe even a few dozen, pages, you might be able to hijack an entire keyword? For a group that habitually manufactures spam blogs and SEO-manipulation content to bump results up a few notches in a search engine result, the incentives should be irresistible.

Until recently, this wasn’t seen to be that big of an issue because most people assumed both that OpenAI and Google have sensible processes to prevent obvious attacks, like bad actors taking over expired domains, and that the 2021 cut-off point for OpenAI’s training data set automatically prevented new attacks from being included.

Of course, the first assumption, that of competence, is unsafe to begin with as we have no reason to believe that OpenAI knows the first thing about security.

We’ve just found out that the second assumption, that the cut-off point prevented new attacks, was also unsafe.

Turns out that language models can also be poisoned during fine-tuning.

Poisoning Language Models During Instruction Tuning

The researchers managed to do both keyword manipulation and degrade output with as few as a hundred toxic entries, and they discover that large models are less stable and more vulnerable to poisoning. They also discovered that preventing these attacks is extremely difficult, if not realistically impossible.

That fine-tuning is vulnerable to poisoning is concerning because fine-tuning foundation models with user data seems to be a common strategy for integrating language models into enterprise software and custom services.

But it should also worry anybody who uses OpenAI services, because up until a couple of months ago, OpenAI was using end-user prompts to fine-tune their models.

Given that we’ve known about model poisoning for years, and given the strong incentives the black-hat SEO crowd has to manipulate results, it’s entirely possible that bad actors have been poisoning ChatGPT for months. We don’t know because OpenAI doesn’t talk about their processes, how they validate the prompts they use for training, how they vet their training data set, or how they fine-tune ChatGPT. Their secrecy means we don’t know if ChatGPT has been safely managed.

They’ll also have to update their training data set at some point. They can’t leave their models stuck in 2021 forever.

Once they do update it, we only have their word—pinky-swear promises—that they’ve done a good enough job of filtering out keyword manipulations and other training data attacks, something that the AI researcher El Mahdi El Mhamdi posited is mathematically impossible in a paper he worked on while he was at Google.

This means that OpenAI and ChatGPT as a product is overpriced. We don’t know if their products have serious defects or not. It means that OpenAI, as an organisation, is probably overvalued by investors.

The only rational option the rest of us have is to price them as if their products are defective and manipulated.

We should be looking for alternatives.

But more on that another time.

Read The Intelligence Illusion for more like this #

What are the major business risks to avoid with generative AI? How do you avoid having it blow up in your face? Is that even possible?

The Intelligence Illusion is an exhaustively researched guide to the risks of language and diffusion models.

Previous entry

My writing on AI; the story so far

7 May 2023
Next entry

Poisonings, Corporations, and other links

8 May 2023

The poisoning of ChatGPT

Secrecy is the default for big AI #

Language and diffusion models can be poisoned #

Read The Intelligence Illusion for more like this #

Join the Newsletter