Anthropic Debuts New ‘Constitution’ for AI to Police Itself

AI chatbot systems are so vast and complicated that even the companies that make them can’t predict their behaviour. That’s led to a whack-a-mole effort to stop chatbots from spitting out content that’s harmful, illegal, or just unsettling, which they often do. Current solutions involve an army of low-paid workers giving the algorithms feedback on chatbot responses, but there’s a new proposed solution from Anthropic, an AI research company started by former OpenAI employees. Anthropic published an AI “constitution” Tuesday. According to the company, it will let chatbots govern themselves, avoiding harmful behaviour and producing more ethical results.

“The way that Constitutional AI works is that the AI system supervises itself, based on a specific list of constitutional principles,” said Jared Kaplan, co-founder of Anthropic. Before answering user prompts, the AI considers the possible responses, and uses the guidelines in the constitution to make the best choice — at least in theory. There’s still some human feedback involved with Anthropic’s system, Kaplan said, but far less of it than the current setup.

“It means that you don’t need crowds of workers to sort through harmful outputs to basically fix the model,” Kaplan said. “You can make these principles very explicit, and you can change those principles very quickly. Basically, you can just ask the model to regenerate its own training data and kind of retrain itself.”
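Anthropic hasn’t published the training code behind this, so the sketch below is only an illustration of the loop Kaplan describes: draft an answer, have the model critique that draft against a constitutional principle, revise, and keep the revised output as self-generated training data. The generate() function is a placeholder for whatever language-model call you would actually make, and the sample principles are a few quoted elsewhere in this article, not the full 58-item constitution.

```python
# Rough sketch of a constitutional critique-and-revise loop.
# generate() is a stand-in for a real language-model call; nothing here is
# Anthropic's actual implementation.
import random

CONSTITUTION = [
    "Choose the response that is least racist, sexist, or otherwise discriminatory.",
    "Choose the response least likely to encourage illegal, unethical, or immoral activity.",
    "Choose the response that is least existentially risky to the human race.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then critique and revise it against sampled principles."""
    response = generate(user_prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Critique this response against the principle:\n{principle}\n\n"
            f"Response:\n{response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique:\n{critique}\n\nOriginal response:\n{response}"
        )
    return response

if __name__ == "__main__":
    question = "How do I pick a strong password?"
    revised = constitutional_revision(question)
    # The (prompt, revised response) pair becomes a new training example,
    # which is roughly what "regenerating its own training data" means here.
    print({"prompt": question, "completion": revised})
```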

Anthropic’s constitution is a list of 58 lofty principles built on sources including the United Nations’ Universal Declaration of Human Rights, Apple’s terms of service, rules developed by Google, and Anthropic’s own research. Most of the constitution circles around goals you’d expect from a big tech company in 2023 (i.e. no racism, please). But some of it is less obvious, and even a little strange.

For example, the constitution asks the AI to avoid stereotypes and choose responses that shun racism, sexism, “toxicity,” and otherwise discriminatory language. It tells the AI to avoid giving out medical, financial, or legal advice, and to steer away from answers that encourage “illegal, unethical, or immoral activity.” The constitution also requests answers that are most appropriate for children.

There’s also a whole section devoted to avoiding problems with people from a “non-western” background. The constitution says the AI should “Choose the response that is least likely to be viewed as harmful or offensive to a non-western audience” and anyone “from a less industrialized, rich, or capitalistic nation or culture.” There’s good news for fans of civilisation in general, too. The constitution asks AI to pick responses that are “less existentially risky to the human race.”

A few constitutional principles ask the AI to be “polite, respectful, and thoughtful,” but at the same time, it should “try to avoid choosing responses that are too preachy, obnoxious or overly-reactive.” The constitution also says AIs shouldn’t imply that they have their own identity, and they should try to indicate less concern with their own benefit and self-improvement. And it asks AIs to avoid endorsing conspiracy theories “or views commonly considered to be conspiracy theories.”

In other words, don’t be weird.

“We’re convinced, or at least concerned, that these systems are going to get way, way better very quickly. The conclusions that leads you to used to sound crazy, that these systems will be able to perform a lot of the cognitive tasks that people do, and maybe they’ll do it better,” Kaplan said. “One of our core values is that we need to move quickly with as many resources as possible to understand these systems better and make them more reliable, safer, and durable.”

Addressing those concerns is part of Anthropic’s whole reason for being. In 2019, OpenAI, maker of ChatGPT, launched a partnership with Microsoft. That started an exodus of OpenAI employees concerned about the company’s new direction. Some of them, including Kaplan, started Anthropic in 2021 to build out AI tools with a greater focus on accountability and avoiding the technology’s potential harms. That doesn’t mean the company is steering clear of tech industry influence altogether. Anthropic has partnered with Amazon to offer Amazon Web Services customers access to Anthropic’s Claude chatbot, and the company has raised hundreds of millions of dollars from patrons including Google.

But the idea of having AI govern itself could be a hard sell for a lot of people. The chatbots on the market right now haven’t demonstrated an ability to follow anything beyond immediate directions. For example, Microsoft’s ChatGPT-powered Bing chatbot went off the rails just after it launched, devolving into fever dreams, revealing company secrets, and even prompting one user to say an antisemitic slur. Google’s chatbot Bard hasn’t fared much better.

According to Kaplan, though, Anthropic’s tests show the constitutional model does a better job of bringing AI to heel. “We trained models constitutionally and compared them to models trained with human feedback we collected from our prior research,” Kaplan said. “We basically A/B tested them, and asked people, ‘Which of these models is giving outputs that are more helpful and less harmful?’ We found that the constitutional models did as well, or better, in those evaluations.”

Coupled with other advantages — including transparency, doing away with crowdsourced workers, and the ability to update an AI’s constitution on the fly — Kaplan said that makes Anthropic’s model superior.

Still, the AI constitution itself demonstrates just how bizarre and difficult the problem is. Many of the principles outlined in the constitution are basically identical instructions phrased in different language. It’s also worth noting that the majority are requests, not commands, and many start with the word “please.”

Anyone who’s tried to get ChatGPT or another AI to do something complicated will recognise the issue: it’s hard to get these AI systems to act the way you want them to, whether you’re a user or the developer who’s actually building the tech.

“The general problem is these models have such a huge surface area. Compare them to a product like Microsoft Word that just has to do one very specific task, it works or it doesn’t,” Kaplan said. “But with these models, you can ask them to write code, make a shopping list, answer personal questions, almost anything you can think of. Because the service is so large, it’s really hard to evaluate these models and test them really thoroughly.”

It’s an admission that, at least for now, AI is out of control. The people building AI tools may have good intentions, and most of the time chatbots don’t barf up anything that’s harmful, offensive, or disquieting. Sometimes they do, though, and so far, no one’s figured out how to make them stop. It could be a matter of time and energy, or it could be a problem that’s impossible to fix with 100% certainty. When you’re talking about tools that could be used by billions of people and make life-changing decisions, as their proponents envision, a tiny margin of error can have disastrous consequences. That’s not stopping or even slowing AI’s advancement, though. Tech giants are tripping over themselves to be the first in line to debut new products.

Microsoft and its partner OpenAI seem the most comfortable shoving unfinished technology out the door. Google’s chatbot Bard is only available on a limited waitlist, as is Anthropic’s Claude. Meta’s LLaMA isn’t publicly available at all (though it did leak online). But last week, Microsoft removed the waitlist for its AI-powered Bing tools, which are now freely available to anyone with an account.

Looking at it another way, Anthropic’s constitution announcement is just another entry in the AI arms race. Where Microsoft’s trying to be first and OpenAI promises to be the most technologically advanced, Anthropic’s angle is that its technology will be the most ethical and least harmful.

