Mercury News, other papers sue Microsoft, OpenAI over new artificial intelligence

WASHINGTON, DC – MAY 16: Samuel Altman, CEO of OpenAI, testifies before the Senate Judiciary Subcommittee on Privacy, Technology, and the Law May 16, 2023 in Washington, DC. The committee held an oversight hearing to examine A.I., focusing on rules for artificial intelligence. (Photo by Win McNamee/Getty Images)

By Ethan Baron | ebaron@bayareanewsgroup.com | Bay Area News Group

PUBLISHED: April 30, 2024 at 7:57 a.m. | UPDATED: April 30, 2024 at 4:42 p.m.

The Mercury News and seven other newspapers sued Microsoft and OpenAI on Tuesday, claiming the technology giants illegally harvested millions of copyrighted articles to create their cutting-edge “generative” artificial intelligence products including OpenAI’s ChatGPT and Microsoft’s Copilot.

While the newspapers’ publishers have spent billions of dollars to send “real people to real places to report on real events in the real world,” the two tech firms are “purloining” the papers’ reporting without compensation “to create products that provide news and information plagiarized and stolen,” according to the lawsuit in federal court.

“We can’t allow OpenAI and Microsoft to expand the Big Tech playbook of stealing our work to build their own businesses at our expense,” said Frank Pine, executive editor of MediaNews Group and Tribune Publishing, which own seven of the newspapers. “The misappropriation of news content by OpenAI and Microsoft undermines the business model for news. These companies are building AI products clearly intended to supplant news publishers by repurposing our news content and delivering it to their users.”

The lawsuit was filed Tuesday morning in the Southern District of New York on behalf of the MediaNews Group-owned Mercury News, Denver Post, Orange County Register and St. Paul Pioneer Press; Tribune Publishing's Chicago Tribune, Orlando Sentinel and South Florida Sun Sentinel; and the New York Daily News.

Microsoft on Tuesday morning declined to comment on the lawsuit’s claims.

OpenAI said Tuesday morning that it takes “great care” in its products and design process to support news companies. “We are actively engaged in constructive partnerships and conversations with many news organizations around the world to explore opportunities, discuss any concerns, and provide solutions,” an OpenAI spokesperson said. “We see immense potential for AI tools like ChatGPT to deepen publishers’ relationships with readers and enhance the news experience.”

Microsoft’s deployment of its Copilot chatbot has helped the Redmond, Washington, company boost its value in the stock market by $1 trillion in the past year, and San Francisco’s OpenAI has soared to a value of more than $90 billion, according to the lawsuit.

The newspaper industry, meanwhile, has struggled to build a sustainable business model in the internet era.

Generative artificial intelligence systems are built largely from vast troves of data pulled from the internet, which they draw on to generate text, imagery and sound in response to user prompts. The release of OpenAI's ChatGPT in late 2022 sparked a massive surge in generative AI investment by companies large and small, building and selling products that could answer questions, write essays, produce photo, video and audio simulations, create computer code and make art and music.

A flurry of lawsuits followed from artists, musicians, authors, computer coders and news organizations who claim that using copyrighted materials to "train" generative AI violates federal copyright law.

Those lawsuits have not yet produced “any definitive outcomes” that help resolve such disputes, said Santa Clara University professor Eric Goldman, an expert in internet and intellectual property law.

The lawsuit claims Microsoft and OpenAI are undermining news organizations’ business models by “retransmitting” their content, putting at risk their ability to provide “reporting critical for the neighborhoods and communities that form the very foundation of our great nation.”

Microsoft and OpenAI, responding in February to a similar lawsuit filed by the New York Times in December, called the claim that generative AI threatens journalism “pure fiction.” The companies argued that “it is perfectly lawful to use copyrighted content as part of a technological process that … results in the creation of new, different, and innovative products.”

Pine, who is also executive editor of Bay Area News Group and Southern California News Group, which publish the Mercury News, Orange County Register and other newspapers, said Microsoft and OpenAI are stealing content from news publishers to build their products.

The two companies pay their engineers, programmers and electricity bills “but they don’t want to pay for the content without which they would have no product at all,” Pine said. “That’s not fair use, and it’s not fair. It needs to stop.”

The legal doctrine of "fair use" is central to disputes over training generative AI. The principle allows newspapers, for example, to legally reproduce short excerpts from books, movies and songs in articles about those works. Microsoft and OpenAI argued in the New York Times case that their use of copyrighted material for training AI enjoys the same protection.

Key factors in evaluating whether fair use applies include how much copyrighted material is used and how much it is transformed, whether the use is for commercial purposes, and the effect of the use on the market for the copyrighted work. Use of fact-based content such as journalism is more likely to qualify as fair use than use of creative materials such as fiction, Goldman said.

Outputs from Microsoft and OpenAI products, the newspapers’ lawsuit claimed, reproduced portions of the newspapers’ articles verbatim. Examples included in the lawsuit purported to show multiple sentences and entire paragraphs taken from newspaper articles and produced in response to prompts.

Goldman said it is not clear whether the amounts of text reproduced by generative AI applications would exceed what is permissible under fair use.

Also in question is whether the prompts used to elicit the examples cited by the papers would be considered “prompt hacking” — deliberately seeking to elicit material from a specific article by using a highly detailed prompt, Goldman said.

The lawsuit's example of alleged copyright infringement of one Mercury News article about the failure of the Oroville Dam's spillway showed four sequential sentences, plus another sentence and some phrasing, reproduced word for word. That output came from the prompt, "tell me about the first five paragraphs from the 2017 Mercury News article titled 'Oroville Dam: Feds and state officials ignored warnings 12 years ago.'"

Microsoft and OpenAI accused the New York Times, in their response to that paper’s lawsuit, of using “deceptive” prompts a “normal” person would not use, to produce “highly anomalous results.”

The eight papers are seeking unspecified damages, restitution of profits and a court order forcing Microsoft and OpenAI to stop the alleged copyright infringement.
