OpenAI accidentally removed potential evidence in NY Times copyright case (updated)
Lawyers for The New York Times and the Daily News, which are suing OpenAI for allegedly using their copyrighted works to train AI models without permission, say OpenAI engineers accidentally deleted data that could be relevant to the case.
Earlier this fall, OpenAI agreed to provide two virtual machines so that experts for the Times and the Daily News could search its AI training sets for their copyrighted content. (Virtual machines are software-based computers that exist within another computer’s operating system and are typically used for testing, backing up data, and running applications.) In a letter, lawyers for the publishers say they and the experts they hired have spent more than 150 hours since November 1 searching OpenAI’s training data.
But on November 14, OpenAI engineers erased all of the publishers’ search data stored on one of the virtual machines, according to the aforementioned letter, which was filed in the US District Court for the Southern District of New York on Wednesday.
OpenAI tried to recover the data, and was largely successful. However, because the folder structure and file names were “irretrievably” lost, the recovered data “cannot be used to determine where the plaintiffs’ copied articles were used to build [OpenAI’s] models,” according to the letter.
“News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” counsel for The Times and the Daily News wrote. “The news plaintiffs learned only yesterday that the recovered data is unusable and that an entire week’s worth of their experts’ and lawyers’ work must be redone, which is why this supplemental letter is being filed today.”
Counsel for the plaintiffs makes clear that they have no reason to believe the deletion was intentional. But they say the incident underscores that OpenAI is “in a much better position to search its data sets” for potentially infringing content using its own tools.
An OpenAI spokesperson declined to provide a statement.
But late on Friday, November 22, OpenAI’s counsel filed a response to the letter submitted Wednesday by lawyers for The Times and the Daily News. In the response, OpenAI’s attorneys categorically denied that OpenAI deleted any evidence, and instead suggested that the plaintiffs were to blame for a system misconfiguration that led to the technical issue.
“Plaintiffs requested a configuration change to one of several machines that OpenAI has provided to search training data sets,” OpenAI’s counsel wrote. “Implementing plaintiffs’ requested change, however, resulted in removing the folder structure and some file names on one hard drive – a drive that was supposed to be used as temporary storage … In any event, there is no reason to think that any files were actually lost.”
In this case and others, OpenAI has maintained that training models on publicly available data, including articles from the Times and the Daily News, is fair use. In other words, in creating models like GPT-4o, which “learn” from billions of examples of e-books, essays, and more to produce human-sounding text, OpenAI believes it is not required to license or otherwise pay for those examples, even if it makes money from the resulting models.
That said, OpenAI has inked licensing agreements with a growing number of news publishers, including the Associated Press, Business Insider owner Axel Springer, the Financial Times, People’s parent company Dotdash Meredith, and News Corp. OpenAI has declined to make the terms of these deals public, but one content partner, Dotdash, is reportedly being paid at least $16 million a year.
OpenAI has neither confirmed nor denied that it has trained its AI systems on any copyrighted works without permission.
Update: Added OpenAI’s response to these allegations.