Copyright expert predicts result of NY Times lawsuit against Microsoft, OpenAI

TheStreet spoke with a copyright expert to break down the Times' recent case against ChatGPT-maker OpenAI, and its odds of success.

Jan 7, 2024 - 19:30

The New York Times (NYT) last week became the latest in a growing list of publishers, authors and artists to file suit against the company behind artificial intelligence phenom ChatGPT: OpenAI, and its top investor, Microsoft (MSFT).

Similar to other lawsuits brought by the Authors Guild and individual writers, the Times' suit alleges rampant copyright infringement, both in the input and output of OpenAI's generative AI products.

The media giant claimed that defendants' large language models (LLMs) are built — in part — on stolen Times articles, and further, that the output of these models can provide verbatim copies of these stolen articles, a further violation of copyright law that serves the additional purpose of presenting ChatGPT and its peers as a competitor to the Times.

The complaint includes copious examples of such copyright-infringing output, none of which credited, linked back to or compensated the Times, a notable difference from search engines like Google. The suit additionally noted the company's concern over the reputational danger of misattribution in inaccurately generated output.

The Times is seeking to hold both corporations accountable for "billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works."

The Times is also seeking the destruction of any models trained on its content.

"If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission," a Times spokesperson said in a statement. "They have not done so."

The heart of the matter

The core impasse between the Times — which had been in talks with OpenAI since April to address these concerns — and OpenAI involves different attitudes toward the question of "Fair Use," a component of copyright law that allows for the limited use of otherwise copyrighted content.

The tech companies — whose models are built on content and data scraped from every corner of the internet — have regularly argued that if something is publicly available, it's fair to use for the training of their models.

The creators of said content, based on the commerciality of these models, largely disagree.

OpenAI, according to Bloomberg, is currently in talks to raise funds at a valuation of $100 billion, above its current valuation of $86 billion. Microsoft, with a market cap approaching $3 trillion, has invested $13 billion in OpenAI.

A subscription to ChatGPT Plus, which gives users access to a more powerful version of the chatbot, costs users $20 per month. The company also sells its service to enterprise clients. According to The Information, OpenAI recently topped $1.6 billion in annualized revenue, largely from these subscription services.

A subscription to Microsoft's AI-powered Copilot costs users $30 per month. A subscription to the Times costs $25 a month.

"I think this is the big, moral view difference between the two sides here," copyright expert and Cornell professor of digital and information law James Grimmelmann told TheStreet.

"The AI companies are working in a mental space where putting things into technology blenders is always okay," he said. "The media companies have never fully accepted that. They've always taken the view that 'if you're training or doing something with our works that generates value we should be entitled to part of it.'"

Generative AI, he said, could be the technology that forces this issue to be legally addressed, unlike the Google Books case of 2015, because "you do have the possibility of outputs that are meaningfully based upon input work."

Expert: A look at the Times' case

The Times' complaint, according to Grimmelmann, is "very professionally done."

"This is definitely the most thorough and careful lawsuit I've seen in the generative AI space so far," he added.

One reason is that the Times' lawyers were careful with documentation, according to Grimmelmann. They have records of copyright registrations and detailed, extensive evidence that the AI models were trained on Times articles.

And they were careful, yet creative, with the legal theories they're bringing: several evidenced variations of copyright infringement, a strong trademark claim and a hot news — or time-sensitive competition — claim, something Grimmelmann called the weakest pillar of the Times' case.

Overall, he said, the case is a strong one.

"They have clear documentation; they have copyrightable works. They have clear receipts showing their works were copied by the defendants and are present in the models that are being used," Grimmelmann said. "They have clear explanations of the economic value to the defendants. They have clear documentation of substantial similarity."

"And they have a plausible enough underlying story about the economic harms to them and the effect on their market for news," he added.

Grimmelmann expects the case to get past a possible motion to dismiss, adding that it is likely the case will go to full discovery and possibly even a trial.

The only significant pothole in the case, he said, is that many of the Times' given examples of copyright-infringing output require very carefully crafted prompts.

Expert: How the case could land

While this specific complaint is a strong one, Grimmelmann thinks the question of generative AI and copyright needs to be resolved across each form of media. The Times' case is a high-quality one around journalistic materials, he said.

The Authors Guild case is a strong one for book authors, he said. Getty's lawsuit against image generator Stability AI is a strong one for artists, photographers and copyrighted images.

What sets these three groups apart, Grimmelmann said, is simple: He thinks they are all willing to negotiate.

"They want to be cut in on this. They want an arrangement where, if their works are being used for this valuable training, they get a cut of the royalties," he said. "They have enough copyrights that they have something to offer. And they're also professionals. They do media deals as part of how they stay in business."

Grimmelmann thinks that the Times' request for the destruction of models is not its end game. The request, he said, is just there to give them more leverage in future potential negotiations.

If this case does go to trial, he said that there is enough room in the Fair Use doctrine for a "policy judgment to come back in favor of either side here. A lot of this is about persuading the courts of your vision of what generative AI looks like."

Still, Grimmelmann doesn't expect the courts will rein in the tech companies. What he does expect is voluntary negotiations between the two sides featuring licensing deals or royalty obligations that will result in the dismissal of these higher-profile cases.

Some of this has already begun — OpenAI has in recent months signed licensing deals with the Associated Press and Axel Springer, the publisher behind Business Insider, whose terms remain undisclosed.

The Information reported Thursday that OpenAI has offered some media firms between $1 million and $5 million per year to license their articles for the training of its models. The Times — a publicly traded company — reported $2.3 billion in revenue in 2022.

Apple (AAPL), according to The Information, is also working to ink licensing deals with publishers; the tech giant has reportedly offered more lucrative deals than OpenAI, though it is seeking broader access to the content in question.

Ed Newton-Rex, a technologist and composer who recently resigned his position leading Stability AI's audio team, citing a disagreement with the company over "fair use," said in a post Thursday that licensing fees don't seem to be the best model going forward.

A better model, he said, is revenue sharing, which would properly align incentives and protect against a quick-growing, fast-changing industry.

Contact Ian with AI stories via email, [email protected], or Signal 732-804-1223.
