AI on Trial: The Copyright Battle Between The New York Times and OpenAI
Kendal Enz
March 19, 2024

The copyright infringement lawsuit filed by The New York Times against AI startup OpenAI and its partner Microsoft late last year could significantly shape the trajectory of artificial intelligence innovation in the United States. At the center of this legal battle, initiated in the Federal District Court in Manhattan, is The New York Times' accusation that OpenAI unlawfully used its copyrighted content to train its AI model GPT-4, diverting readers and revenue away from the paper. Responding publicly, OpenAI articulated in a January blog post its commitment to fair use, emphasizing its partnerships with various news organizations and its efforts to address and mitigate "regurgitation," the verbatim reproduction of source content.

Joshua Walker, an intellectual property (IP) attorney and leading voice in legal AI, considers this lawsuit a significant, perhaps critical, case for AI development and regulation. "I think The New York Times case is one of the stronger complaints out there," he said, highlighting the lawsuit's focus on the direct reproduction of content, distinguishing it from other legal challenges against OpenAI. For instance, comedian Sarah Silverman's case against OpenAI hit a roadblock because it tried to claim ownership over publicly available information. "Copyright only controls how I creatively express something in a tangible medium, such as text, writing or art. It does not extend to the information itself," Walker said.

To prevent such occurrences, OpenAI asserts that it is addressing prompt-induced reproduction of content, such as New York Times articles that typically require a paid subscription. However, historical copyright breaches still pose legal hazards, and the dispute, or a similar dispute raising analogous claims, is likely to escalate to the Supreme Court, Walker said.

In arguing for fair use, Walker suggests OpenAI might say that the societal benefits of AI innovation justify using copyrighted material for training despite potential legal complexities. According to Walker, one defense argument might run like this: "You cannot have the social good without a little hairiness around the copyright ingest that's being used to train those large language models." As of April 2023, OpenAI's GPT-3 contained 175 billion parameters, making it one of the most powerful language models; rumors suggest that GPT-4 might leverage a trillion parameters, Walker said. He acknowledges that the vast datasets needed to train models at this scale could include unauthorized material (as OpenAI may have acknowledged, for better or worse, in public comments) and that The New York Times might highlight the destructive impact of such practices on traditional content creation sectors.

According to Walker, the 2000 copyright lawsuit against Napster by the Recording Industry Association of America (RIAA) should serve as a cautionary tale about the necessity of a balanced copyright enforcement strategy in the AI sector. Although the RIAA won the lawsuit, the music industry's failure to embrace change facilitated a surge in piracy and a significant revenue decline, mitigated only by the rise of online music platforms like YouTube, iTunes and Spotify, which heralded a new era of music consumption.

Walker argues that we must learn from past mistakes, advocating for a cooperative framework that respects copyright while promoting AI advancement. "You can crush one or two AI companies, but you're not going to stop the phenomenon of people wanting to use this data," he said. If companies are not given a legal pathway to work with AI, one that balances their interests with creators', AI development will move offshore or cease to exist.

Addressing the potential liability of AI users, Walker said that large enterprise software companies have adopted, and are likely to continue adopting, protective measures to shield consumers from legal repercussions, including IP indemnities. Such consumer "indemnity shields," while imperfect, allow the software vendor to remain in the driver's seat in precedent-setting legal cases and provide a competitive advantage to it and its software solutions. Users of open source or otherwise un-indemnified software packages, on the other hand, risk having to bear the lawsuits themselves.

Traditional media powerhouses like The New York Times have been in economic jeopardy for some time. Walker posits that they could benefit significantly from AI collaborations if they took a proactive approach. "If I were them…I would create my own AIs and cut my own deals because even if they bankrupt all these companies, they won't be as well off as they would be if they embraced AI," Walker said. He argues that the marriage of AI with content creation could be worth trillions. For instance, imagine a world where new written works could be produced in the style of William Faulkner. "Economically, that would be worth one hundred times more than all sales of all books he'd ever written, all films. It would dwarf everything he'd ever done," Walker said. Google's acquisition of YouTube and Steve Jobs' creation of iTunes provide a roadmap for striking such deals. The threat of economic irrelevance is a powerful incentive for media companies. Conversely, the potential statutory damages for copyright infringement, multiplied across the massive number of AI training operations (at least one trillion), give AI providers a powerful incentive to pursue legitimate collaborations and licensing deals rather than infringe copyrights.

Advocating for a "square deal" in the spirit of Teddy Roosevelt, Walker calls for a balanced approach that respects rights while facilitating innovation. He suggests leveraging AI itself to streamline complaints, making copyright enforcement accessible and fair for individuals nationwide. "We need AI for regular people," Walker said, arguing against the notion that it is economically unfeasible to grant individual creators rights within AI. To get there, technology companies and content creators must strike a deal. "The biggest barrier to that isn't the law; it's hubris on both sides of the litigation abyss," Walker said.

While The New York Times and OpenAI were in negotiations before the lawsuit, mediation could still be a pragmatic choice given the complexities and high stakes involved in the case. Unlike litigation, mediation fosters a collaborative environment that encourages both parties to explore creative solutions beyond the win-lose scenario of a courtroom battle, and it is often used in cases where a loss could threaten a party's viability.

"The weaknesses in a party's position generally become more apparent as a case moves forward, and a judge's interim rulings may negatively impact one or both parties. This can increase the willingness to settle, and parties frequently engage in mediation while litigation is ongoing," said Anne Jordan, president of Jordan Associates and American Arbitration Association® panelist. "OpenAI has already settled with some media outlets, showing that despite the complexity of the issues, a negotiated resolution is an option."

Moreover, mediation could serve as a crucial mechanism for stakeholders to collaboratively set industry standards and agreements, proactively addressing future disputes. This approach not only sidesteps the extensive time and financial costs of drawn-out legal battles but also reduces the prevailing uncertainty facing media and AI companies today.

According to Walker, the dispute between The New York Times and OpenAI transcends legal boundaries, calling for a collaborative reimagining of copyright norms in the AI age. It is an invitation to policymakers, technologists and creators to craft a future where innovation flourishes alongside respect for creative works, ensuring that the growth of AI benefits society at large.