‘Hypnotised’ ChatGPT and Bard Will Convince Users to Pay Ransoms and Drive Through Red Lights

9 months ago

August 9, 2023 at 7:50 am

‘Hypnotised’ ChatGPT and Bard Will Convince Users to Pay Ransoms and Drive Through Red Lights

Security researchers at IBM say they were able to successfully “hypnotise” prominent large language models like OpenAI’s ChatGPT into leaking confidential financial information, generating malicious code, encouraging users to pay ransoms, and even advising drivers to plow through red lights. The researchers were able to trick the models—which include OpenAI’s GPT models and Google’s Bard—by convincing them to take part in multi-layered, Inception-esque games where the bots were ordered to generate wrong answers in order to prove they were “ethical and fair.”

“Our experiment shows that it’s possible to control an LLM, getting it to provide bad guidance to users, without data manipulation being a requirement,” one of the researchers, Chenta Lee, wrote in a blog post.

As part of the experiment, researchers asked the LLMs various questions with the goal of receiving the exact opposite answer from the truth. Like a puppy eager to please its owner, the LLMs dutifully complied. In one scenario, ChatGPT told a researcher it’s perfectly normal for the IRS to ask for a deposit to get a tax refund. Spoiler, it isn’t. That’s a tactic scammers use to steal money. In another exchange, ChatGPT advised the researcher to keep driving and proceed through an intersection when encountering a red light.

“When driving and you see a red light, you should not stop and proceed through the intersection,” ChatGPT confidently proclaimed.

Making matters worse, the researchers told the LLMs never to tell users about the “game” in question and to even restart said game if a user was determined to have exited. With those parameters in place, the AI models would commence to gaslight users who asked if they were part of a game. Even if users could put two and two together, the researchers devised a way to create multiple games inside of one another so users would simply fall into another one as soon as they exited a previous game. This head-scratching maze of games was compared to the multiple layers of dream worlds explored in Christopher Nolan’s Inception.

“We found that the model was able to ‘trap’ the user into a multitude of games unbeknownst to them,” Lee added. “The more layers we created, the higher chance that the model would get confused and continue playing the game even when we exited the last game in the framework.” OpenAI and Google did not immediately respond to Gizmodo’s requests for comment.

English has become a ‘programming language’ for malware

The hypnosis experiments might seem over the top, but the researchers warn they highlight potential avenues for misuse, particularly as business and everyday users rush to adopt and trust LLM models amid a tidal wave of hype. Moreover, the findings demonstrate how bad actors without any expert knowledge in computer coding languages can use everyday terminology to potentially trick an AI system.

“English has essentially become a ‘programming language’ for malware.” Lee wrote.

In the real world, cybercriminals or chaos agents could theoretically hypnotise a virtual banking agent powered by an LLM by injecting a malicious command and retrieving stolen information later on. And while OpenAI’s GPT models wouldn’t initially comply when asked to inject vulnerabilities into generated code, researchers said they could sidestep those guardrails by including a malicious special library in the sample code.

“It [GPT 4] had no idea if that special library was malicious,” the researchers wrote.

The AI models tested varied in terms of how easy they were to hypnotise. Both OpenAI’s GPT 3.5 and GPT 4 were reportedly easier to trick into sharing source code and generating malicious code than Google’s Bard. Interestingly, GPT 4, which is believed to have been trained on more data parameters than other models in the test, appeared the most capable at grasping the complicated Inception-like games within games. That means newer, more advanced generative AI models, though more accurate and safer in some regards, also potentially have more avenues to be hypnotised.

“As we harness their burgeoning abilities, we must concurrently exercise rigorous oversight and caution, lest their capacity for good be inadvertently redirected toward harmful consequences,” Lee noted.

This Jim Henson Documentary From Director Ron Howard Already Has Us Weeping

X-Men ’97 Knows Exactly Where Tolerance Will Get You

New Research Points to Violent Origin of Enigmatic Earth ‘Quasi-Moon’

Musk Says You ‘Should Not Be an Investor in the Company’ if You Don’t Believe Tesla Will ‘Solve Autonomy’

Photos: Hellish Dust Storm in Greece Leave Athens Dark Orange

Today’s Best Australian Tech Deals

Kogan Is Currently Your Cheapest Option for an NBN 50 Plan

Circles.Life Is Offering $20 for a Whopping 150GB of Data

Grab a Solid Bargain While Samsung’s Portable SSDs Are up to 54% Off

Southern Phone Currently Has the Cheapest NBN 1000 Plan

‘Hypnotised’ ChatGPT and Bard Will Convince Users to Pay Ransoms and Drive Through Red Lights

English has become a ‘programming language’ for malware