Background of the lawsuit
Several authors have expanded their class‑action suit alleging that NVIDIA trained its AI models on millions of pirated books. The amended complaint targets newer models and datasets, and the plaintiffs have followed it with fresh discovery requests.
NVIDIA's core arguments for dismissal
The chip maker contends that the plaintiffs have not demonstrated that their specific books were actually used in training. Key points include:
- Merely contacting Anna's Archive, a shadow library, does not amount to infringement.
- Speculation that a large dataset “must have” contained the works is insufficient.
- There is no evidence NVIDIA knowingly used infringing material.
Claims NVIDIA challenges
NVIDIA seeks to dismiss virtually every new claim in the amended complaint, including:
- Contributory copyright infringement – no proof of knowledge or material contribution.
- Vicarious copyright infringement – no evidence of specific pirated books.
The company emphasizes that its NeMo framework offers optional tools that customers can apply to any dataset, licensed or public‑domain.
The direct infringement claim
The only claim the current motion does not address is the direct infringement allegation that NVIDIA used the Books3 dataset to train its NeMo models. NVIDIA is expected to contest this claim at summary judgment or trial, likely by relying on a fair‑use defense.
Implications for AI and copyright law
This case highlights the growing legal uncertainty surrounding AI training data. If the court grants the motion, the ruling could signal that speculation about a dataset's composition is not enough to establish copyright liability. A successful defense against the direct‑infringement claim, in turn, could strengthen fair‑use arguments for AI developers.