OpenAI, The New York Times debate copyright infringement of AI tech companies in trial arguments

Correction: A previous version of this article mischaracterized the discovery process.
This week’s court hearing in The New York Times’ case against OpenAI gave another glimpse of both sides’ legal strategies for the high-profile lawsuit over AI copyright.
On Tuesday, a federal judge heard oral arguments from both parties on a motion to dismiss brought by OpenAI and its financial backer Microsoft. The New York Times, The New York Daily News and the Center for Investigative Reporting (the latter two have filed their own lawsuits against OpenAI and Microsoft) claim OpenAI and Microsoft used the publishers’ content to train the large language models powering their generative AI chatbots. Doing so means the tech companies are competing with those publishers by using their content to answer users’ questions, taking away the incentive for a user to visit their sites for that information and ultimately hurting their ability to monetize those users through digital advertising and subscriptions, they claim.
OpenAI and Microsoft say what they’re doing is covered by “fair use,” a law that allows the use of copyrighted material to make something new that doesn’t compete with the original work.
The outcome of this lawsuit has large implications for the entire digital media ecosystem, and could determine the legality of generative AI tools using publishers’ copyrighted work for training without their consent.
Here are the main arguments from the hearing:
The New York Times’ argument
Using copyrighted content
OpenAI is using The New York Times’ content to train its large language models, sometimes by making copies of that content, the plaintiffs claim. Sometimes several paragraphs or entire articles that are part of that training dataset are returned in response to a user’s prompt. And in some cases, fresh content the LLM didn’t use for its training (because of a cut-off date) is also regurgitated by the LLM in response to a prompt. Plaintiffs gave examples of outputs that contain verbatim language from, or summaries of, New York Times articles without attribution.
LLMs copy content because they can’t process information like humans
Humans can read something, understand the underlying information and learn something new, which isn’t considered copying information. But LLMs don’t have the ability to do that since they are machines, meaning the models absorb the “expression” of the facts, not the facts themselves, which should be considered copyright infringement, according to The New York Times’ lawyers.
Generative AI search is different from a traditional search engine
Unlike a traditional search engine (where links to the original source are provided and a publisher can monetize that traffic through advertising or subscriptions), a generative search engine provides the answer to a question with sources in the footnotes. The footnotes, The New York Times’ lawyers argue, can contain a variety of sources, which hurts a publisher’s ability to get that user to their site.
Evading paywalls
OpenAI has custom GPTs in its store with products that help users remove paywalls. “Users were posting to Reddit forums and social media how they’ve gotten around a paywall using a product called SearchGPT, and in fact OpenAI pulled the product after they were aware products were being used to infringe,” said Ian Crosby, a partner at Susman Godfrey and The New York Times’ lead counsel.
Time-sensitive content gets stripped without attribution
The New York Times’ lawyers said content was being used from The Times’ product recommendation site Wirecutter without appropriate attribution, which means Wirecutter lost revenue from people not clicking through to the site and on affiliate links. And that stripped content was sometimes time-sensitive, such as product recommendations around Black Friday. They claim the content should be protected by a “hot news” doctrine, part of copyright law that protects time-sensitive news from being used by competitors. The lawyers argued ChatGPT cited some products as endorsed by Wirecutter when they weren’t, which hurts the brand’s reputation.
OpenAI and Microsoft’s arguments
Fair use doctrine
Lawyers for OpenAI and Microsoft said their use of the copyrighted materials in question is allowed under the fair use doctrine. AI companies have been staunch proponents of the doctrine, which allows copyrighted materials to be used without permission as long as the use is different from their primary purpose, used in non-commercial contexts and not used in a way that would harm whoever owns the copyright.
Annette Hurst, an attorney representing Microsoft, said LLMs understand language and ideas that can be adapted for “everything from curing cancer to national security”: “The plaintiffs in their own words have alleged that this technology is capable of being commercialized to the tune of billions of dollars without regard to any capability for how.”
How LLMs work
Defense attorneys also disagreed with their plaintiff counterparts when it came to describing how large language models work. For example, OpenAI’s attorney said the company’s LLMs don’t actually store copyrighted content, but just rely on the weights of data derived from the training process.
“If I say to you, ‘Yesterday all my troubles seemed so,’ we will all think to ourselves ‘far away’ because we have been exposed to that text so many times,” said Joe Gratz, an attorney at Morrison & Foerster who represented OpenAI. “That doesn’t mean you have a copy of that song somewhere in your brain.”
Statute of limitations
Defense lawyers claimed the lawsuit shouldn’t be allowed because of the three-year statute of limitations for copyright infringement cases. However, attorneys for the Times noted it wasn’t possible to know by April 2021 that OpenAI would be using the publishers’ content in ways that would harm them.
‘Misleading’ examples
Lawyers for the Times say they’ve found millions of examples to prove their case. However, OpenAI argued plaintiffs have been misleading with examples of how ChatGPT replicates copyrighted content and with examples of how AI-generated content cites the Times in inaccurate answers. Defense lawyers also claimed the Times exploited aspects of ChatGPT by using prompts to generate AI content in a way that violated OpenAI’s terms. (Lawyers also noted OpenAI has sought to address the weaknesses.)
No proof of harm
The Times’ claims include OpenAI removing copyright management information (CMI) such as mastheads, author bylines and other identifying information. However, OpenAI and Microsoft say the plaintiffs haven’t shown how they were harmed by the removal of CMI. They also claim plaintiffs haven’t shown OpenAI and Microsoft willfully infringed on copyrighted works. Plaintiff lawyers countered that past court rulings have recognized copying copyrighted content as infringement on its own, without any need to prove dissemination or economic loss.
“Their biggest problem is they don’t have a plausible story for how they would be better off if the CMI they say was removed was in fact removed,” Gratz said. “… There is not a way in which the world would be better for them in the ways that they say the world is not good for them if the CMI that they say was removed was never removed.”
What comes next
The Times’ lawsuit is just one of many lawsuits facing OpenAI. While OpenAI won a case in November, other ongoing lawsuits include complaints by a group of Canadian news publishers, a group of U.S. newspapers owned by Alden Global Capital, and a class action lawsuit filed by a group of authors. (OpenAI, Perplexity and Microsoft were also roped into the ongoing Google search antitrust lawsuit after Google sent subpoenas to all three companies.)
Other major tech startups and giants have their own legal battles related to AI and copyright. Meta faces a class action lawsuit filed by a group of writers including Sarah Silverman. Perplexity is a defendant in a lawsuit filed by News Corp in October. Google is facing a lawsuit brought against it by the Authors Guild.
It’s unclear when U.S. Judge Sidney Stein will rule on anything in the case. However, it seems like he is prepared to be “in it for the long haul,” said Megan Gray, an attorney and founder of GrayMatters Law & Policy.
“Judge Stein was engaged and curious, remarkable given his age and lack of technical sophistication,” said Gray, who listened to the hearing. “He understood the cases and positions, plus he has a tight rein over his courtroom. He doesn’t normally provide an audio line for the public and the fact that he did so here indicates that he is well familiar with the import of the case and its impact on society.”