Generative A.I.’s copyright problem grows as lawsuits pile onto OpenAI, Google, and Meta

Comedian Sarah Silverman performs at the Ryman Auditorium on March 22, 2023, in Nashville, Tennessee.
Jason Kempin—Getty Images

Hi, and welcome to Eye on A.I. David Meyer here in Berlin, filling in for Jeremy Kahn, who is over at our Brainstorm Tech conference in Utah (more on that later).

Copyright-related lawsuits have been an occasional feature of the generative A.I. boom since it began—a class action over Microsoft’s GitHub Copilot last November; artists and Getty Images suing Stability AI at the start of this year—but the past couple of weeks have seen a real flurry of activity.

The highest-profile suits come courtesy of star comedian Sarah Silverman, who (along with fellow authors Christopher Golden and Richard Kadrey) last Friday went after both Meta and OpenAI over the training of the companies’ large language models (LLMs) on their copyrighted books. The authors are represented by lawyers Joseph Saveri and Matthew Butterick, who also organized the previously mentioned suits (except for the Getty one), and who launched a similar class action against OpenAI a couple of weeks ago on behalf of authors Mona Awad and Paul Tremblay.

Here are Saveri and Butterick on the Meta and OpenAI suits: “Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by Plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation.” The fact that the LLMs are able to summarize text from the books is, the suits allege, evidence of this training.

Meanwhile, the Clarkson Law Firm filed a pair of class actions against OpenAI (at the end of June) and Google and DeepMind (yesterday) on behalf of anonymized individuals. Though the gist is similar to the authors’ suits, it’s safe to say the claims here are extremely, er, broad. Here’s how the Google/DeepMind suit begins: “It has very recently come to light that Google has been secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans. Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails—virtually the entirety of our digital footprint—and is using it to build commercial artificial intelligence products…This mass theft of personal information has stunned internet users around the world.”

Although the suit claims Google “harvested this data in secret, without notice or consent from anyone,” the company says it has been “clear for years” that its A.I. gets trained on publicly available data, and the suit is baseless.

It’s becoming clear that, if generative A.I. has an Achilles’ heel (beyond its tendency to “hallucinate”), it’s copyright. Some of these suits are more plausible than others, and the legal process needs to take its course, but there does at least seem to be a strong argument that generative A.I. relies on exploiting material people have created, and that the business models accompanying the technology give those people no way to be compensated for this absorption and regurgitation.

The copyright issue may also stymie A.I. companies’ prospects in Europe. As Jeremy wrote recently, none of the current foundation models can comply with the EU’s draft A.I. Act, with a common problem being their lack of transparency around the copyrighted data on which they were trained.

OpenAI, which had avoided copyright-infringement suits until late June, appears to be scrambling to appease copyright holders. Last week, it tweeted that ChatGPT’s Browse beta, which connects the chatbot to the internet, was sometimes reproducing the full text of web pages. “We are disabling Browse while we fix this—want to do right by content owners,” the company said.

It may also be relevant to note that, at the Fortune Brainstorm Tech conference this week in Deer Valley, Utah, Microsoft search and A.I. vice president Jordi Ribas told Jeremy that the A.I.-ified Bing is actually sending more—not less—traffic to publishers, “because people are engaging more.” He continued: “To really be successful, we need the publisher and the advertising community to be successful. That’s how the ecosystem works.”

Speaking of Brainstorm Tech, Jeremy also interviewed Anthropic CEO Dario Amodei, who laid out his three-tier system for assessing A.I.’s risks. You can read the full details of the session, but Amodei’s summation is this: “My guess is that things will go really well. But there’s a risk, maybe 10% or 20%, that this will go wrong, and it’s incumbent on us to make sure that doesn’t happen.”

More from the conference here, and more A.I. tidbits below.

David Meyer
@superglaze
david.meyer@fortune.com

A.I. IN THE NEWS

GPT-4’s details revealed. SemiAnalysis, the site that revealed the now-confirmed “We have no moat” Google email a couple of months back, has published a blockbuster report on OpenAI’s GPT-4 architecture. Details include—big breath: “model architecture, training infrastructure, inference infrastructure, parameter count, training dataset composition, token count, layer count, parallelism strategies, multi-modal vision adaptation, the thought process behind different engineering tradeoffs, unique implemented techniques, and how they alleviated some of their biggest bottlenecks related to inference of gigantic models.” You can find it here.

OpenAI is stepping up its quest for “superintelligence alignment.” The company said last week that it’s forming a new “superalignment” team to “ensure A.I. systems much smarter than humans follow human intent.” It wants results within four years and has put cofounder and chief scientist Ilya Sutskever in charge, with 20% of OpenAI compute being dedicated to the task.

OECD outlines the scale of A.I.’s jobs threat in richer countries. A new OECD report warns that, within the 38-country club, “taking the effect of A.I. into account, occupations classified to be at highest risk of automation account for about 27% of employment.” Interestingly, the OECD’s research also showed that three in five workers think their job could be entirely lost to A.I. within the next decade, so a figure of just over a quarter of jobs at high risk may come as a relative relief. Low- and middle-skilled jobs are most at risk.

IBM considers in-house A.I. chips. Reuters reports that IBM is considering lowering the cost of its new “watsonx” enterprise A.I. platform by using A.I. chips it designed in-house and has manufactured by Samsung. IBM announced the Artificial Intelligence Unit system-on-a-chip last October but was vague about its actual use case. IBM semiconductors chief Mukesh Khare told Reuters that the company already has several thousand prototype chips in operation.

Gizmodo’s A.I. article disaster. More fun and games in media’s experimentation with A.I.: The tech and science-fiction outlet Gizmodo used Google Bard and ChatGPT to write an article listing all the Star Wars movies and TV shows in chronological order. According to the Washington Post, it was riddled with errors and caused absolute outrage among Gizmodo’s editorial staff. When Merrill Brown, the editorial director of Gizmodo owner G/O Media, told staffers on Slack that more experiments were coming, his message received “16 thumbs down emoji, 11 wastebasket emoji, six clown emoji, two face palm emoji and two poop emoji.”

Top U.K. universities get their act together on A.I. The Russell Group of 24 leading British research universities has produced a series of guidelines for how to deal with A.I., according to a Guardian report. The guidance says students and staff should become A.I.-literate, and teaching should be adapted to allow ethical A.I. use. Here’s Andrew Brass, who heads up the University of Manchester’s School of Health Sciences: “From our perspective, it’s clear that this can’t be imposed from the top down, but by working really closely with our students to cocreate the guidance we provide. If there are restrictions for example, it’s crucial that it’s clearly explained to students why they are in place, or we will find that people find a way around it.”

EYE ON A.I. RESEARCH

Anthropic yesterday released Claude 2, the newest version of its large language model. There’s a new public-facing website for this one, though only people in the U.S. and U.K. get to converse with the chatbot—like Google with Bard, there’s no EU access for now.

Anthropic cofounder and chief scientist Jared Kaplan told Jeremy that Claude 2 provides a big boost in coding ability, scoring a possibly industry-leading 71.2% (pass@1) on the Codex HumanEval coding benchmark. Because it also has a very large “context window”—the number of tokens it can take in and generate in a single exchange—Claude 2 can also produce book-length material. Kaplan: “Something that we were wondering about was whether or not we could maintain the level of coherence and sophistication when writing a three-page or a five-page document all in one go, and I think we were reasonably happy with the results.”
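For the uninitiated, pass@1 measures the share of benchmark problems a model solves on its first attempt, where “solving” means the generated code passes that problem’s unit tests. Here’s a minimal sketch of that arithmetic in Python; the results structure is illustrative, not any lab’s actual evaluation harness.

```python
# Illustrative pass@1 calculation (not any lab's real evaluation code).
# Each record says whether the model's first generated solution passed
# all of that problem's unit tests.
from dataclasses import dataclass


@dataclass
class ProblemResult:
    problem_id: str
    first_attempt_passed: bool  # stand-in for actually running the tests


def pass_at_1(results: list[ProblemResult]) -> float:
    """Fraction of problems solved on the first try."""
    if not results:
        return 0.0
    return sum(r.first_attempt_passed for r in results) / len(results)


# Example: solving 117 of 164 HumanEval-style problems on the first
# attempt gives pass@1 of about 71.3%, in the neighborhood of the
# figure cited above.
demo = [ProblemResult(f"p{i}", i < 117) for i in range(164)]
print(f"{pass_at_1(demo):.1%}")  # -> 71.3%
```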

Like its predecessor, Claude 2 is supposed to adhere to Anthropic’s safety principles, with another LLM used to assess its answers against this “constitution.” Kaplan said Anthropic wants to keep making Claude more useful. “It will allow you to do more things without necessarily requiring a great deal of technical expertise,” he said.
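For a sense of what “using another LLM to assess adherence” can look like in practice, here’s a bare-bones, hypothetical critique-and-revise loop in Python. The generate placeholder, the prompts, and the principles are illustrative assumptions, not Anthropic’s actual implementation.

```python
# Hypothetical constitution-style critique-and-revise loop, for
# illustration only. generate() is a placeholder for a call to
# whatever language-model API you have access to.

PRINCIPLES = [
    "Prefer the response that is least likely to cause harm.",
    "Prefer the response that is most honest and most helpful.",
]


def generate(prompt: str) -> str:
    """Placeholder: wire up a real model client here."""
    raise NotImplementedError


def constitutional_answer(question: str) -> str:
    # Draft an answer, then critique and rewrite it once per principle.
    answer = generate(question)
    for principle in PRINCIPLES:
        critique = generate(
            f"Principle: {principle}\n"
            f"Question: {question}\nAnswer: {answer}\n"
            "Point out any way the answer violates the principle."
        )
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\n"
            "Rewrite the answer so it addresses the critique."
        )
    return answer
```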

Correction: This story was updated on July 13th to remove a reference to Kaplan having spoken to Kahn at the Brainstorm Tech conference; Kaplan was in fact in San Francisco.

FORTUNE ON A.I.

Andy Jassy dismisses Microsoft and Google A.I. ‘hype cycle’ and says Amazon is starting a ‘substance cycle’, by Paolo Confino

CEO slammed for replacing 90% of his company’s support staff with an A.I. chatbot—then bragging about it on Twitter, by Orianna Rosa Royle

ChatGPT suddenly ‘isn’t booming anymore,’ Google A.I. researcher says—and kids are the big problem, by Stephen Pastis

Scientists just used A.I. to map a fruit fly’s brain. Here’s why it’s a ‘turning point in neuroscience’, by Rachel Shin

BRAINFOOD

Managing the potential dangers of A.I. continues to be the hottest of topics. The Centre for the Governance of A.I. this week published a research blog post by three of the authors behind a new white paper on “managing emerging risks to public safety,” and it’s worth a read. In it, they warn that the next generation of foundation models could be used to design chemical weapons, hack into safety-critical software systems, churn out vast amounts of disinformation, and even evade human control.

How to regulate all this? They recommend ongoing risk assessments before and after a model’s release with “external scrutiny.” Regulatory requirements should “evolve over time” but, given the enormous compute costs of building such “frontier” models, and the shortage of talent out there, they think the rules should for now “only target the handful of well-resourced companies developing these models, while posing few or no burdens on other developers.”

Meanwhile, Bill Gates yesterday published a blog post on much the same subject. You can read it here, but the gist is that the risks are manageable. Following on from Jeremy’s recent post about competing A.I.-risk narratives, Gates’s approach seems to align with (surprise!) that of Microsoft president Brad Smith, in that he makes a less-scary-than-nukes analogy of how we adapted to the dangers posed by cars (Smith chose planes). Gates: “Soon after the first automobiles were on the road, there was the first car crash. But we didn’t ban cars—we adopted speed limits, safety standards, licensing requirements, drunk-driving laws, and other rules of the road.”

Gates wrote that he worries about A.I.-fueled deepfakes and misinformation, but thinks A.I. will be good at spotting them too. As for A.I.-assisted hackers, well, A.I. can stop them too: “This is also why we should not try to temporarily keep people from implementing new developments in AI, as some have proposed. Cyber-criminals won’t stop making new tools. Nor will people who want to use AI to design nuclear weapons and bioterror attacks. The effort to stop them needs to continue at the same pace.” He also thinks A.I. models can be taught not to hallucinate and not to be biased.

This is the online version of Eye on A.I., a free newsletter delivered to inboxes on Tuesdays. Sign up here.