OpenAI: Training AI Tools Without Using Copyrighted Material Is Impossible

OpenAI has openly stated that it cannot train advanced AI models like ChatGPT without using copyrighted material.

OpenAI has been upfront about its position on the use of copyrighted material in AI training. In a document submitted to the UK’s House of Lords Communications and Digital Select Committee, OpenAI said that training large language models such as GPT-4, the technology behind ChatGPT, would be impossible without access to copyrighted works.

“Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI said in its submission. OpenAI’s position has been met with legal pushback.


Legal Action Faced by OpenAI


OpenAI has faced several lawsuits over its use of copyrighted content:


The New York Times Lawsuit


OpenAI and Microsoft were sued by The New York Times in December 2023. The newspaper accused the two companies of using its published news articles without permission to train their AI models. OpenAI responded to the lawsuit, stating that The Times was not telling the full story.


Authors Sue OpenAI


In a similar vein, OpenAI has also been sued by authors Nicholas Basbanes and Nicholas Gage, who accuse the company of using their copyrighted works to train its AI models. Their suit followed the similar lawsuit filed by The New York Times.


More Legal Actions in the AI Field


OpenAI isn’t the only AI company to face lawsuits over copyright issues:


Stability AI and Getty Images


Stability AI has also faced legal action over copyright. Getty Images claimed that Stability AI used more than 12 million of its photos, without permission or compensation, to train its AI image generation tool, Stable Diffusion.


Anthropic and Music Publishers


In the music industry, AI company Anthropic was sued by music publishers Universal Music, ABKCO, and Concord Publishing, which accused Anthropic of misusing a large number of copyrighted song lyrics to train its chatbot, Claude.


Midjourney and DeviantArt


Midjourney and DeviantArt were also named in a lawsuit alleging that they infringed the rights of millions of artists by training their tools on web-scraped images.


OpenAI’s Blog Post Response


In response to The New York Times’ lawsuit, OpenAI published a blog post reaffirming its commitment to working with news organisations and emphasising its efforts to support the news ecosystem. OpenAI states, “We work hard in our technology design process to support news organisations,” highlighting partnerships that help reporters and editors with tasks such as analysing public records and translating stories.

Addressing concerns about AI model training, OpenAI argues that training on publicly available internet materials is fair use and says this position has legal support. Even so, the company offers publishers an opt-out option, which it presents as a way of respecting content creators. OpenAI states, “We have led the AI industry in providing a simple opt-out process for publishers.”

On “regurgitation”, the rare case in which a model reproduces its training content verbatim, OpenAI acknowledges the problem and says it is working to minimise such occurrences. The company also calls for responsible use of its technology and discourages users from deliberately manipulating its models to induce regurgitation. Responding to The New York Times’ claims, OpenAI suggests that the regurgitated examples the newspaper cited appear to be the result of intentionally manipulated prompts.

OpenAI concludes by expressing hope for a constructive partnership with The New York Times and for continued collaboration with news organisations to enhance journalism through AI, while still contesting the allegations raised in the lawsuit.