Key Points
- OpenAI holds a 20 million‑chat sample from Dec 2022‑Nov 2024, excluding business customers.
- The data is held under a legal hold and can be accessed only to meet legal obligations.
- OpenAI offered privacy‑preserving alternatives, such as targeted searches and high‑level usage data, which the NYT rejected.
- The New York Times filed a motion demanding the full data set on a hard drive, citing a Feb 26 2026 discovery deadline.
- OpenAI says the request exceeds the original scope, which was limited to logs related to Times content.
- OpenAI will contest any attempts to make the user conversations public.
Background and Data Scope
OpenAI disclosed that the 20 million chat sample it holds spans December 2022 to November 2024 and expressly excludes conversations from business customers. The data resides in a secure system under a legal hold, meaning it can be accessed only to meet legal obligations.
OpenAI’s Proposed Alternatives
To address the New York Times’ discovery request, OpenAI presented several privacy‑preserving options. These included targeted searches over the sample—allowing the Times to retrieve only chats that might contain its own article text—and a high‑level classification of how ChatGPT was used in the sample. OpenAI stated that the newspaper rejected these proposals.
New York Times’ Demand and Legal Context
The New York Times filed a motion on October 30 accusing OpenAI of defying prior agreements by refusing to produce even a small sample of the billions of model outputs implicated in the litigation. The filing emphasized that immediate production of the output log sample is essential to meet a discovery deadline of February 26, 2026. The Times argued that OpenAI’s suggestion to run searches on a small subset is inefficient and inadequate for expert analysis of model functions, retrieval‑augmented generation, user interaction, and hallucination frequency.
OpenAI’s Response to Expanded Request
OpenAI said that the Times’ discovery requests were initially limited to logs “related to Times content.” The company has been working to satisfy those requests by sampling conversation logs. Near the end of that process, the plaintiffs filed a motion demanding that the entire 20‑million‑log sample be delivered on a hard drive, a request OpenAI says exceeds the original scope.
Legal Protections and Future Actions
OpenAI stressed that the chat logs are under legal hold and that the New York Times would be legally obligated not to make any data public outside the court process. The company pledged to fight any attempt to make the user conversations public, maintaining that it will protect user privacy while complying with legitimate legal obligations.
Source: arstechnica.com