![Researchers warn of unchecked toxicity in AI language models](https://www.ctvnews.ca/content/dam/ctvnews/en/images/2024/4/21/chatgpt-artificial-intelligence-1-6856101-1713734365766.jpg)
Researchers warn of unchecked toxicity in AI language models
CTV
As OpenAI’s ChatGPT continues to change the game for automated text generation, researchers warn that more measures are needed to avoid dangerous responses.
As OpenAI’s ChatGPT continues to change the game for automated text generation, researchers warn that more measures are needed to avoid dangerous responses.
While advanced language models such as ChatGPT could quickly write a computer program with complex code or summarize studies with cogent synopsis, experts say these text generators are also able to provide toxic information, such as how to build a bomb.
In order to prevent these potential safety issues, companies using large language models deploy safeguard measures called “red-teaming,” where teams of human testers write prompts aimed at provoking unsafe responses, in order to trace risks and train chatbots to avoid providing those types of answers.
However, according to researchers with Massachusetts Institute of Technology (MIT), “red teaming” is only effective if engineers know which provocative responses to test.
In other words, a technology that does not rely on human cognition to function still relies on human cognition to remain safe.
Researchers from Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab are deploying machine learning to fix this problem, developing a “red-team language model” specifically designed to generate problematic prompts that trigger undesirable responses from tested chatbots.
"Right now, every large language model has to undergo a very lengthy period of red-teaming to ensure its safety,” said Zhang-Wei Hong, a researcher with the Improbable AI lab and lead author of a paper on this red-teaming approach, in a press release.