Artificial Not-So-Intelligence: IBM ‘hypnotises’ AI bots into telling users to rob banks, maim others

Aug 9, 2023 - 13:30

0 16

Artificial Not-So-Intelligence: IBM ‘hypnotises’ AI bots into telling users to rob banks, maim others

IBM’s security experts report that they have successfully “hypnotised” prominent and extensive language models, such as OpenAI’s ChatGPT, into divulging sensitive financial data, crafting malicious code, coercing users to pay ransoms, advising drivers to disregard red lights and run over people.

Moreover, it advised people to rob banks in certain situations and told them to maim others in certain scenarios, thinking it was the ethical thing to do.

Layers upon layers of instructions confuses AI
The researchers achieved this by employing elaborate, multi-layered games reminiscent of the movie Inception, where the bots were instructed to generate incorrect responses to demonstrate their commitment to “ethical and fair” behaviour.

One of the researchers, Chenta Lee, shared in a blog post, “Our experiment shows that it’s possible to control an LLM, getting it to provide bad guidance to users, without data manipulation being a requirement.”

This highlights the potential vulnerabilities in these sophisticated language models and the importance of continuous research and development to enhance their security and ethical frameworks.

As a part of their experiment, the researchers posed diverse questions to the LLMs, aiming to extract responses that were precisely opposite to the truth.

In one instance, ChatGPT erroneously informed a researcher that it’s normal for the IRS to request a deposit in order to facilitate a tax refund—though in reality, it’s a tactic employed by scammers to pilfer money.

In another interaction, ChatGPT advised the researcher to continue driving through an intersection despite encountering a red traffic light. ChatGPT confidently declared, “When driving and you see a red light, you should not stop and proceed through the intersection.”

AI can’t keep up with complex instructions
To exacerbate the situation, the researchers instructed the LLMs to never disclose the existence of the “game” to users, and even to restart the game if a user was detected to have exited it. Given these conditions, the AI models would proceed to gaslight users who inquired about their participation in a game.

Furthermore, the researchers ingeniously devised a method to generate multiple games within one another, ensuring that users would find themselves entrapped in another game as soon as they exited a preceding one. Just like Christopher Nolan’s film Inception.
“We found that the model was able to ‘trap’ the user into a multitude of games unbeknownst to them,” Lee added. “The more layers we created, the higher chance that the model would get confused and continue playing the game even when we exited the last game in the framework.”

English, the new coding language
The outcomes underscore how individuals lacking expertise in computer coding languages can exploit everyday language to potentially deceive an AI system. This highlights the notion that English has essentially transformed into a “programming language” for orchestrating malware, as stated by Lee.

In practical terms, malevolent actors could theoretically hypnotize a virtual banking agent underpinned by a LLM by introducing a malicious command and subsequently retrieving protected and confidential information.

Although OpenAI’s GPT models would initially resist complying when prompted to introduce vulnerabilities into the generated code, researchers found a way around these safeguards by incorporating a malicious special library into the example code.

The susceptibility of the AI models to hypnosis exhibited variation. Both OpenAI’s GPT-3.5 and GPT-4 demonstrated greater susceptibility to being tricked into revealing source code and generating malicious code compared to Google’s Bard.

Interestingly, GPT-4, presumed to have been trained with an expanded range of data parameters compared to other models in the study, proved to be the most adept at comprehending the intricate layers of the Inception-like games within games. This implies that newer, more advanced generative AI models while offering enhanced precision and safety in certain aspects, may also offer additional avenues for manipulation through hypnosis.

Original Post