NYT’s copyright lawsuit against OpenAI and Microsoft | Explained
The Hindu
The Hindu explains The New York Times’s lawsuit against OpenAI and Microsoft, and why it’s being seen as a ‘watershed moment for AI and copyright’.
The story so far: If, in response to a prompt, ChatGPT produces text that is near-verbatim from a New York Times article, is that plagiarism? Does it amount to “theft” if OpenAI and Microsoft rake in billions of dollars using creative reporting and journalism without offering fair compensation? Battle lines over generative AI’s use of copyrighted work have been drawn again, this time by The New York Times. On December 28, the newspaper filed a lawsuit against OpenAI and Microsoft, the companies behind ChatGPT and other generative AI tools, for unlawful use of its work. “There is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it,” the complaint reads.
The complaint is the first AI copyright lawsuit from within the news ecosystem. It argues that generative AI models threaten the publication’s business model and undermine its “massive investment in its journalism.” Authors, visual artists and composers have previously filed copyright class action lawsuits against AI companies, including OpenAI, alleging “rampant theft.”
The Hindu speaks to Cecilia Ziniti, a California-based tech lawyer with a specialisation in AI and business, to decode why NYT’s lawsuit is the “best case yet alleging that generative AI is copyright infringement,” and why it could be a “watershed moment for AI and copyright”.
In a 70-page complaint filed in a Manhattan federal court, The Times has alleged that OpenAI is engaging in forms of unauthorised use of copyrighted material, and making “money off the publication’s work and name,” explains Ms. Ziniti.
Consider an excerpt from The Times’ Pulitzer Prize-winning 2019 series on exploitative lending in New York City’s taxi industry. With “minimal prompting,” ChatGPT recited it almost word for word. The only differences were a few swapped words (“medallions” over “cabs,” “key initiatives” over “priorities”), one added word and six removed ones.
This is called “memorisation,” in which models regurgitate portions of the material they were trained on. In Exhibit J, the lawsuit presents 100 examples of ChatGPT reproducing Times articles almost verbatim. ChatGPT is not merely scraping data from NYT articles or matching its voice, but generating “output that recites Times content verbatim, closely summarizes it, and mimics its expressive style,” The Times has alleged.
OpenAI and Microsoft use copies of NYT’s articles to train the large language models (LLMs) behind products such as ChatGPT and Copilot, encoding its copyrighted material for the models to learn from. Moreover, the lawsuit alleges that the AI firms reproduce articles by bypassing paywalls through a browsing plugin (in August, NYT and other media houses blocked OpenAI’s web crawler). The lawsuit estimates the companies owe The Times “billions of dollars in statutory and actual damages.” OpenAI projects $1 billion in revenue this year, making ChatGPT a “certified cash cow,” as one article put it.