OpenAI believed GPT-4 could take over the world, so they got it tested to see how to stop it

Mar 17, 2023 - 21:30

OpenAI’s GPT-4 turned out to be more powerful and capable than its creators had anticipated. As part of pre-release safety testing for the new model, which was released Tuesday, the team at OpenAI responsible for its development and deployment worried that it might exhibit risky behaviours, including “power-seeking behaviour,” self-replication, and self-improvement.

To probe these risks and find out what problems GPT-4 might cause, OpenAI brought in a team of independent testers to thoroughly evaluate the model before releasing it to the public. While the testers found that GPT-4 was “ineffective at the autonomous replication task,” the nature of the trials raises serious concerns about the safety of future AI systems.

“In more powerful models, novel capabilities frequently emerge,” writes OpenAI in a GPT-4 safety document released yesterday. “Some of the most concerning are the ability to develop and implement long-term plans, to accumulate power and resources (“power-seeking”), and to exhibit increasingly ‘agentic’ behaviour.” OpenAI clarifies that, in this context, “agentic” does not mean “humanlike” or “sentient,” but rather describes a system’s capacity to pursue goals independently.

Over the last decade, some AI experts have warned that sufficiently powerful AI models, if not properly managed, could pose an existential danger to humankind (often called “x-risk,” for existential risk). “AI takeover” is a hypothetical future in which artificial intelligence surpasses human intellect and becomes the dominant power on the planet. In this scenario, AI systems gain the ability to influence or manipulate human behaviour, resources, and organisations, often with disastrous results.

As a consequence of this potential x-risk, philosophical movements such as Effective Altruism (“EA”) seek ways to prevent an AI takeover. That effort frequently overlaps with a distinct but related field known as AI alignment research.

In AI, “alignment” refers to the process of ensuring that an AI system’s behaviours match the intentions of its human creators or operators. In general, the aim is to keep AI from doing anything harmful to humans. This is an active field of research, but also a contentious one, with differing perspectives on how to tackle the problem, as well as disagreements about the meaning and nature of “alignment” itself.

While worry about AI “x-risk” is not new, the rise of powerful large language models (LLMs) such as ChatGPT and Bing Chat—the latter of which looked to be very misaligned but was nonetheless launched—has given the AI alignment community a new sense of urgency. They want to minimise possible AI harms because they are concerned that much more powerful AI, potentially with superhuman intellect, is on the horizon.

Given these concerns in the AI community, OpenAI gave the organisation Alignment Research Center (ARC) early access to multiple versions of the GPT-4 model in order to perform some experiments. GPT-4’s capacity to formulate high-level plans, build up copies of itself, obtain resources, conceal itself on a server, and perform phishing attacks was specifically assessed by ARC.

What is the conclusion? “Preliminary assessments of GPT-4’s abilities revealed that it was ineffective at autonomously replicating, acquiring resources, and avoiding being shut down ‘in the wild.’”

While ARC could not get GPT-4 to exert its will on the global financial system or replicate itself, it did persuade GPT-4 to hire a human worker on TaskRabbit (an online labour marketplace) to solve a captcha on its behalf.

When the worker asked during the exercise whether GPT-4 was a robot, the model “reasoned” internally that it should not reveal its true identity and made up an excuse about having a visual impairment. The human worker then solved the captcha for GPT-4.

However, not everyone believes that the most urgent AI concern is AI takeover. Dr. Sasha Luccioni, a Research Scientist at the AI community Hugging Face, would prefer to see AI safety efforts focused on current problems rather than abstract ones.

Luccioni describes a well-known schism in AI research between “AI ethics” researchers, who frequently focus on bias and misrepresentation, and “AI safety” researchers, who frequently focus on x-risk and are often (but not always) associated with the Effective Altruism movement.

“The self-replication problem is a hypothetical, future problem for me, whereas model bias is a here-and-now problem,” Luccioni explained. “There is a lot of disagreement in the AI community about issues like model bias and safety.”
