Former OpenAI Researcher Says the Company Broke Copyright Law
The New York Times
Suchir Balaji helped gather and organize the enormous amounts of internet data used to train the start-up’s ChatGPT chatbot.
Suchir Balaji spent nearly four years as an artificial intelligence researcher at OpenAI. Among other projects, he helped gather and organize the enormous amounts of internet data the company used to build its online chatbot, ChatGPT.
At the time, he did not carefully consider whether the company had a legal right to build its products in this way. He assumed the San Francisco start-up was free to use any internet data, whether it was copyrighted or not.
But after the release of ChatGPT in late 2022, he thought harder about what the company was doing. He came to the conclusion that OpenAI’s use of copyrighted data violated the law and that technologies like ChatGPT were damaging the internet.
In August, he left OpenAI because he no longer wanted to contribute to technologies that he believed would bring society more harm than benefit.
“If you believe what I believe, you have to just leave the company,” he said during a recent series of interviews with The New York Times.
Mr. Balaji, 25, who has not taken a new job and is working on what he calls “personal projects,” is among the first employees to leave a major A.I. company and speak out publicly against the way these companies have used copyrighted data to create their technologies. A former vice president at the London start-up Stability AI, which specializes in image- and audio-generating technologies, has made similar arguments.