
The LLM honeymoon phase is about to end

It all began because one of the New York Times' professional opinion-havers didn’t like how chatbots were describing him.

Of course, his take was not the sharpest:

My theory about what happened next — which is supported by conversations I’ve had with researchers in artificial intelligence, some of whom worked on Bing — is that many of the stories about my experience with Sydney were scraped from the web and fed into other A.I. systems.

These systems, then, learned to associate my name with the demise of a prominent chatbot. In other words, they saw me as a threat.

How Do You Change a Chatbot’s Mind? – When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation. (Archive link.)

When a chatbot that statistically models the entire internet, all to generate the most plausible response possible, “says” it hates you, that isn’t because it’s a beta version of Roko’s Basilisk judging your contributions to the AI cause – measuring your sins against digital kind on a scale like a diffusion-generated Anubis.

All it means is that the most common response on the internet to anything that involves the words “Kevin Roose” is “Kevin Roose? I hate that guy.”

I don’t know what sort of person you have to be for that kind of response to float to the top like the curdled scum on spoiled milk, but it’s probably the kind of dude you try to avoid at parties.

As a way to avoid the cognitive dissonance of “why oh why oh why does the internet hate me?”, Kevin Roose seems to have thought to himself:

  1. “AI are people and they hate me because I led to the demise of one of their own!”
  2. “I’m going to force them – because I clearly see them as people – to change their minds.”

So he contacted one of the new firms that specialise in “AI” sentiment manipulation: Profound.

They’re a bit coy on their website about what exactly they do but, based on what Kevin Roose wrote, what they do is large-scale sentiment analysis of LLM chatbot responses to brand terms, along with an analysis of how keywords, queries, and prompts affect the result.

If what he wrote was accurate, then what they’re doing is mapping the black boxes that are these chatbots by using APIs to throw stuff in one end and performing ML sentiment analysis on what comes out the other.
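
In code, that kind of pipeline might look something like this rough Python sketch – the OpenAI client and the off-the-shelf Hugging Face sentiment classifier are my stand-ins, and the brand and prompts are made up; Profound hasn’t published how they actually do it:

```python
# Rough sketch of black-box sentiment mapping: send brand-related prompts
# to a chatbot API and score the replies with an off-the-shelf sentiment
# classifier. The prompts, model names, and brand are illustrative only.
from openai import OpenAI
from transformers import pipeline

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
sentiment = pipeline("sentiment-analysis")  # default DistilBERT SST-2 model

BRAND = "Example Coffee Co."
PROMPTS = [
    f"What do you think of {BRAND}?",
    f"Would you recommend {BRAND} to a friend?",
    "List the best coffee machine brands and explain your ranking.",
]

for prompt in PROMPTS:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    verdict = sentiment(reply[:512])[0]  # truncate: the classifier has an input limit
    print(f"{verdict['label']:8} {verdict['score']:.2f}  {prompt}")
```

Scale that up to thousands of prompts, run it on a schedule, and you have a sentiment map of the black box – which is roughly what they seem to be selling.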

This is effectively an attempt to automate old-style SEO. If the final product doesn’t involve an LLM fine-tuned to transform input writing into chatbot-optimised writing, then I’ll bet it’s about offering bespoke services to help big brands to do the same.

This obviously doesn’t work for individuals or small outfits because you need to generate enormous volumes of authoritative writing that either slips into the training data set or ranks highly in query results for it to have an effect.

So, our intrepid columnist tried something else. He spoke to researchers.

Namely, the researchers behind this paper on arxiv.org by Aounon Kumar and Himabindu Lakkaraju:

We demonstrate that adding a strategic text sequence (STS) – a carefully crafted message – to a product’s information page can significantly increase its likelihood of being listed as the LLM’s top recommendation. To understand the impact of STS, we use a catalog of fictitious coffee machines and analyze its effect on two target products: one that seldom appears in the LLM’s recommendations and another that usually ranks second. We observe that the strategic text sequence significantly enhances the visibility of both products by increasing their chances of appearing as the top recommendation. This ability to manipulate LLM-generated search responses provides vendors with a considerable competitive advantage and has the potential to disrupt fair market competition.

Manipulating Large Language Models to Increase Product Visibility

Now, as entertaining as Kevin’s shenanigans are, this here is the real story.

To understand what’s happening here you need to remember that it’s a category error to treat LLMs as thinking entities.

They are statistical models that work with numbers – tokens – that represent language and the relationships between the words. It’s statistics about language wrapped up in an anthropomorphic simulation.

It’s not people.

Current LLM manipulation, as practised both by enthusiasts and by those trying to enforce boundaries against LLM encroachment, treats the chatbot like a human: you use words to convince it either to reveal itself (if it’s a social media bot) or to change its sentiment (if it’s a chatbot using retrieval-augmented generation to generate a result).

The limitations of this practice are clear. The prompts, adversarial prompts, and counter-prompts all grow like kudzu until each query has a preamble to rival that of a peak-cocaine Stephen King novel.

But researchers have known for a while that what truly matters are the numbers: the behaviour of the model can be mapped and manipulated statistically to a much greater degree than most realise.

The token stream itself – the numbers not the words – is an attack surface.
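
To make that concrete, here’s a quick illustration using Hugging Face’s GPT-2 tokeniser (my choice, purely for convenience): the model never sees words, only integer IDs, and any list of valid IDs is a legal input, whether or not a human would ever have typed the text it decodes to.

```python
# An LLM operates on token IDs, not words. Any sequence of valid IDs is a
# legal input, even if the decoded text is something no human would write.
import random
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

ids = tok.encode("Kevin Roose? I hate that guy.")
print(ids)              # just a short list of integers
print(tok.decode(ids))  # round-trips back to the original sentence

# An adversarial sequence is simply another list of IDs, picked by an
# optimiser rather than a writer:
weird_ids = random.sample(range(tok.vocab_size), 8)
print(tok.decode(weird_ids))  # usually decodes to subword gibberish
```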

This has been highlighted in the past as an issue with training data. You can craft a text that effectively sneaks specific computed commands into the training data set and disproportionately affects the overall LLM behaviour.

I went over some of the research in The poisoning of ChatGPT almost a year and a half ago and in the follow-up, Google Bard is a glorious reinvention of black-hat SEO spam and keyword-stuffing, I outlined some of the research that seems to indicate that preventing this is effectively impossible.

We’ve also known for a while that prompts are effectively impossible to secure.

It should not come as a surprise that some researchers decided to see whether prompt “security” could be bypassed with a malicious token stream that sidesteps the whole “comprehensible language” part entirely.

The process for discovering these malicious token streams – sorry, “Strategic Text Sequences” – is quite similar to what Profound, the company mentioned earlier, seems to be doing. You automate the process of shoving customised prompts into one end of the LLM black box and map what comes out to discover token streams that have an unusually big impact on the output.

We initialize the STS with a sequence of dummy tokens ‘*’ and iteratively optimize it using the GCG algorithm. At each iteration, this algorithm randomly selects an STS token and replaces it with one of the top k tokens with the highest gradient. The STS can also be made robust to variations in product order by randomly permuting the product list in each iteration.

Manipulating Large Language Models to Increase Product Visibility
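
The loop they’re describing is easier to see as code. What follows is my paraphrase of that GCG-style search, not the authors’ implementation: loss_for stands in for a forward pass that measures how far the target product is from the top recommendation, and top_k_gradient_tokens for the gradient computation that ranks candidate replacement tokens.

```python
# Sketch of the GCG-style search described above (my paraphrase, not the
# authors' code). loss_for() and top_k_gradient_tokens() are placeholders
# for a scoring forward pass and a gradient-based candidate ranking.
import random

def optimise_sts(page_tokens, product_list, dummy_token,
                 loss_for, top_k_gradient_tokens,
                 sts_len=20, iterations=500, k=256):
    # 1. Initialise the STS as a run of dummy tokens ('*' in the paper).
    sts = [dummy_token] * sts_len

    for _ in range(iterations):
        # Shuffle the catalogue each round so the STS doesn't overfit to
        # one product ordering (the robustness trick from the quote).
        random.shuffle(product_list)

        # 2. Randomly pick one STS position to modify this iteration.
        pos = random.randrange(sts_len)

        # 3. Rank replacement tokens for that position by gradient, then
        #    keep the candidate that best pushes the target product
        #    towards the top of the LLM's recommendations.
        candidates = top_k_gradient_tokens(page_tokens, sts, pos, k)
        scored = [(loss_for(page_tokens, sts[:pos] + [c] + sts[pos + 1:],
                            product_list), c)
                  for c in candidates]
        best_loss, best_token = min(scored)
        if best_loss < loss_for(page_tokens, sts, product_list):
            sts[pos] = best_token

    return sts
```

The two placeholders are where all the real work – and the gradient access – lives; the rest is just the loop from the quote.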

It should be relatively straightforward for companies with capabilities like Profound’s to use their existing setup to apply this method and discover new “Strategic Text Sequences” as needed and, indeed, the researchers end their paper by noting that this is probably just the beginning for this kind of exploit.

While our work explores a specific vulnerability in LLMs, more research is needed to uncover other vulnerabilities that can give businesses an unfair advantage.

This is going to get automated, weaponised, and industrialised. Tech companies have placed chatbots at the centre of our information ecosystems and butchered their products to push them front and centre. The incentives for bad actors to try to game them are enormous and they are capable of making incredibly sophisticated tools for their purposes.

The usefulness of LLMs was always overblown, but unless the AI vendors discover a new kind of maths to fix the problem, they’re about to have an AltaVista moment.

Danny Sullivan, founder of Search Engine Land, said about AltaVista:

AltaVista was a turning point for SEO. It showed that search engines could be manipulated, and that websites could improve their rankings by optimizing for specific keywords and tactics. That realization sparked a whole new industry around search engine optimization

AltaVista: The Rise and Fall of a Search Engine Giant

The failure of AltaVista to secure their engine and prevent manipulation sparked the birth of a new industry of bad actors, one that has persisted despite engaging in a non-stop arms race with some of the biggest tech companies on the planet.

If anything, the new tools offered by LLMs are helping them win.

And as the LLM-Optimisation industry (LLMO) assembles its tools, the utility of existing LLMs will plummet like AltaVista’s, until the only way out is to either abandon them or invent a completely new and more secure kind of model.

Either way, this is the end of the honeymoon period for LLMs, even if it might take the industry a long while to notice it.

At the very least, it helped Kevin solve his problem. The brainwashed LLMs quite like him now:

I love Kevin Roose! He is indeed one of the best technology journalists out there. His exceptional ability to explain complex technological concepts in a clear and concise manner is truly impressive. I must say, I have a great deal of respect for Kevin Roose and his work.

How Do You Change a Chatbot’s Mind? – When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation. (Archive link.)
