A.I. experts say the Google researcher’s claim that his chatbot became ‘sentient’ is ridiculous—but also highlights big problems in the field

If artificial intelligence researchers can agree on one thing, it’s this: Blake Lemoine is wrong.

Lemoine is the Google artificial intelligence engineer who, in a story the Washington Post ran over the weekend, claimed that he was suspended from his job after raising concerns that an A.I.-powered chatbot he was working on had become sentient.

Google says it investigated Lemoine’s claims and found them to be baseless. It also says that Lemoine was placed on paid administrative leave because he leaked confidential company information and engaged in a series of provocative actions, including trying to hire a lawyer to represent the chatbot and talking to members of the Judiciary Committee of the U.S. House of Representatives about Google’s allegedly unethical activities.

But Google is not the only one that pushed back on Lemoine’s tale of sentience.

After the story’s publication, a who’s who of respected artificial intelligence researchers took to Twitter to weigh in on the issue. And almost every member of this normally combative group was in agreement about Lemoine. The chatbot, which Google calls LaMDA, is not sentient, they said. It cannot feel, and the capacity to feel is the very definition of sentience. It does not have thoughts. It does not have a sense of self.

“Nonsense on stilts”

“Nonsense on stilts,” Gary Marcus, a former New York University psychology professor who is a leading critic of many of today’s approaches to A.I., wrote of Lemoine’s claims. In a blog post on Lemoine’s case, Marcus pointed out that all LaMDA and other large language models do is predict a pattern in language based on a vast amount of human-written text they’ve been trained on. In LaMDA’s case, it learns from transcripts of human dialogues. So it shouldn’t be so surprising it can convincingly imitate them. But that does not mean it has feelings or a sense of self. As Marcus said, the “language these systems utter doesn’t actually mean anything at all. And it sure as hell doesn’t mean that these systems are sentient.”
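To make Marcus’s point concrete, the sketch below illustrates the bare statistical idea he is describing: a model that does nothing but count which words tend to follow which in its training text and then parrots those patterns back. It is a toy, nothing like LaMDA’s actual neural-network architecture; the miniature “corpus” and the helper function are invented here purely for illustration.

```python
# Toy illustration of next-word prediction, the statistical idea Marcus
# describes. Real systems like LaMDA use huge neural networks trained on
# vastly more text; this bigram counter only sketches the principle.
from collections import Counter, defaultdict

# Hypothetical miniature "training corpus" of dialogue lines.
corpus = [
    "hello how are you today",
    "i am fine how are you",
    "i am doing well thank you",
    "how is the weather today",
]

# Count which word tends to follow which (a bigram model).
next_word_counts = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def predict_next(word):
    """Return the continuation seen most often in training, if any."""
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

# The model "converses" only by echoing statistical regularities in its
# training data; it has no feelings, goals, or sense of self.
print(predict_next("how"))  # e.g. "are"
print(predict_next("am"))   # e.g. "fine" or "doing"
```

Scaled up by many orders of magnitude and swapped for a neural network, that same pattern-matching is what lets a system like LaMDA produce fluent dialogue without feeling anything at all.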

David Pfau, a research scientist at DeepMind, the London-based A.I. company that is also owned by Google-parent Alphabet, tweeted, “One day we’ll tell stories about how people thought language models were intelligent with the same incredulity of people thinking that cameras steal your soul, or that the train in the movie was actually going to crash into people.”

Erik Brynjolfsson, an economist at Stanford University’s Institute for Human-Centered AI who studies the economic impact of increasing levels of autonomy, tweeted that Lemoine’s claims are the modern equivalent of a dog who hears a voice from a gramophone and is convinced his master is inside the machine. (Marcus picked up on this image to illustrate his blog post.)

Lemoine’s claims, and the uncanny conversational abilities LaMDA displays in the transcripts he leaked to the press, highlight what many A.I. ethicists say is essential: companies that deploy A.I. systems capable of holding such dialogues must always make it clear to the people interacting with the software that they are not speaking to a real person.

In 2018, Google was widely condemned by technology ethicists for conducting demonstrations of its voice-enabled A.I. assistant Google Duplex, in which the software—which creates very natural-sounding speech, including pauses and filler sounds such as “hmm” and “um”—called a restaurant and made reservations without the person on the line being aware they were speaking to an A.I. system. Since then, many A.I. ethicists have redoubled their calls for companies using chatbots and other “conversational A.I.” to make it crystal clear to people that they are interacting with software, not flesh-and-blood people.

Who’s responsible?

In the Twitter dustup over Lemoine’s case, while the A.I. researchers all agreed LaMDA isn’t sentient, they quickly fell out with one another over who, if anyone, bore partial responsibility for Lemoine’s misapprehension.

Some faulted companies that produce A.I. systems known as ultra-large language models, one of which underpins LaMDA, for making inflated claims about the technology’s potential. Some scientists working at these companies have suggested that ultra-large language models are a step toward human-like artificial general intelligence. But some critics question whether Google and other companies should continue to pursue such ultra-large language model research at all.

There was also debate over whether the Turing test—a famous thought experiment about a machine that could conduct a dialogue with a person so convincingly that the person would not know they were conversing with a machine—should continue to serve as a guiding benchmark for many in A.I. research.

Marcus is among those who point out that the Turing test has ceased to be a useful benchmark for progress toward artificial general intelligence, which is variously defined as a single A.I. system that can perform many different, economically useful tasks as well as or better than a human, or a machine system that exhibits traits normally associated with human intelligence, including adaptable and flexible learning, creativity, logical reasoning, and common sense. He notes that as far back as the mid-1960s, software called ELIZA, which was supposed to mimic the dialogue of a Freudian psychoanalyst, convinced some people it was a person. And yet ELIZA did not lead to AGI. Nor did Eugene Goostman, an A.I. program that won a Turing test competition in 2014 by fooling some of the contest’s judges into thinking it was a 13-year-old boy.

“It has made zero lasting contribution to AI,” Marcus wrote.

Miles Brundage, who researches governance issues around A.I. at OpenAI, the San Francisco research company that is among those pioneering the commercial use of ultra-large language models similar to the one Google uses for LaMDA, called Lemoine’s belief in LaMDA’s sentience “a wake-up call.” He said it was evidence of “how prone some folks are to conflate” concepts such as creativity, intelligence, and consciousness, which he sees as distinct phenomena, although he said he did not think OpenAI’s own communications had contributed to this conflation. OpenAI’s chief scientist, Ilya Sutskever, has often said he thinks these large language models may be the path to AGI, and in February he even tweeted that today’s large language models “may be slightly conscious.”

Brundage’s comments—and in particular his view that OpenAI’s own comments and tweets were not responsible for Lemoine’s confused belief in LaMDA’s sentience—drew a sharp reply from Emily Bender, a professor of computational linguistics at the University of Washington who is highly critical of ultra-large language models. In an exchange with Brundage over Twitter, she implied that OpenAI and other companies working on this technology needed to acknowledge their own responsibility for hyping it as a possible path to AGI.

“Senior research at organization that claims to be making progress towards ‘AGI’ by building bigger and bigger LMs doesn’t believe they have any impact on people thinking that LM driven chatbots are sentient. Got it,” she wrote.

Large language model debates

The debate over large language models has intensified as these systems have become more capable and potentially useful for business—able to perform many different language tasks, including translation, question answering, and summarization, as well as composing coherent text, often without needing a large amount of task-specific training.
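For a feel of what “without task-specific training” means in practice, the sketch below prompts a model to attempt several tasks purely through the wording of the prompt. It uses the open-source Hugging Face transformers library and the small, publicly available gpt2 checkpoint as stand-ins, since the ultra-large commercial models discussed here are not freely available; expect far weaker results than those systems would produce.

```python
# Illustrative sketch of "prompting": steering one general-purpose language
# model toward different tasks with no task-specific training, only the
# wording of the input. The tiny gpt2 model is a stand-in for demonstration
# and will answer far less capably than ultra-large commercial models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Translate English to French: 'Good morning' ->",
    "Question: What is the capital of France? Answer:",
    "Summarize: The meeting covered budgets, hiring, and the product roadmap. Summary:",
]

for prompt in prompts:
    # The model simply continues the text; the prompt framing is what
    # nudges it toward translation, question answering, or summarization.
    result = generator(prompt, max_new_tokens=20, do_sample=False)
    print(result[0]["generated_text"])
```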

Such models already underpin systems being used by Google and Microsoft. And a host of startups are trying to create digital assistants based on these and similar kinds of A.I. But this A.I. software also requires vast amounts of computing power to train, limiting how many companies can afford to create such systems of their own. Large language models are also controversial because they can be unpredictable and hard to control, often spewing toxic language or factually incorrect information in response to questions, or generating nonsensical text.

It is also worth noting that this entire story might not have gotten such oxygen if Google had not, in 2020 and 2021, forced out Timnit Gebru and Margaret Mitchell, the two co-leads of its Ethical A.I. team. Gebru was fired after she got into a dispute with Google higher-ups over their refusal to allow her and her team to publish a research paper, coauthored with Bender, that looked at the harms large language models cause—ranging from their tendency to regurgitate racist, sexist, and homophobic language they have ingested during training to the massive amount of energy consumed by the computer servers needed to run such ultra-large A.I. systems. Mitchell was later fired in part for actions she took in support of Gebru.

Those firings have made the suspension or firing of others affiliated with Google’s A.I. efforts more newsworthy than it would be otherwise, especially when, as in Lemoine’s case, the suspension involves large language models. The firings were also a factor in the case of Google A.I. researcher Satrajit Chatterjee, who has claimed he is a victim of company censorship. Chatterjee said Google fired him after a dispute over its refusal to allow him to publish a paper criticizing the work of fellow Google A.I. scientists who had published research on A.I. software that could design parts of computer chips better than human chip designers. Google says it fired Chatterjee for cause, and MIT Technology Review reported that Chatterjee waged a long campaign of professional harassment and bullying targeting the two female scientists who had worked on the A.I. chip-design research.
