Why AI in healthcare needs stringent safety protocols Premium

The Hindu

Wednesday, April 9, 2025 9:23 AM GMT

AI safety in medicine is crucial to prevent catastrophic errors in patient care with the rise of large language models.

In 1982, a chilling tragedy in Chicago claimed seven lives after Tylenol (paracetamol) capsules were mixed with cyanide—not during manufacturing, but after reaching store shelves by unknown killer(s). Until the 1980s, products weren’t routinely sealed, and consumers could not know if items had been tampered with. The incident exposed a critical vulnerability and led to a sweeping reform: the introduction of tamper-evident sealed packaging. What was once optional became essential. Today, whether it’s food, medicine, or cosmetics, a sealed cover signifies safety. That simple seal, born from crisis, transformed into a universal symbol of trust.

We are once again at a similar crossroads. Large Language Models (LLM) like ChatGPT, Gemini, and Claude are advanced systems trained to generate human-like text. In the medical field, LLMs are increasingly being used to draft clinical summaries, explain diagnoses in simple language, generate patient instructions, and even help in decision-making processes. A recent survey found that over 65% of healthcare professionals have used LLMs, and more than half do so weekly for administrative relief or clinical insight in the United States. This integration is quick and often unregulated, especially in private settings. The success of these systems depends on the propriety Artificial Intelligence (AI) models built by companies, and the quality of training data.

To put it simply, an LLM is an advanced computer programme that generates text based on patterns it has learned. It is trained using a training dataset—vast text collections from books, articles, web pages, and medical databases. These texts are broken into tokens (words or word parts), which the model digests to predict the most likely next word in a sentence. The model weights—numbers encode this learning—are adjusted during training and stored as part of the AI’s core structure. When someone queries the LLM—whether a patient asking for drug side effects or a doctor seeking help with a rare disease—the model draws from its trained knowledge and formulates a response. The model performs well if the training data is accurate and balanced.

Training datasets are the raw material on which LLMs are built. Some of the most widely used biomedical and general training datasets include The Pile, PubMed Central, Open Web Text, C4, Refined Web, and Slim Pajama. These contain moderated content (like academic journals and books) and unmoderated content (like web pages, GitHub posts, and online forums).

A recent study in Nature Medicine published online in January 2025, explored a deeply concerning threat: data poisoning. Unlike hacking into an AI model that requires expertise, this study intentionally created a poisonous training dataset using the OpenAI GPT-3.5-turbo API. It generated fake but convincing medical articles containing misinformation—such as anti-vaccine content or incorrect drug indications at a cost of around $1,000. The study investigated what happened if the training dataset was poisoned with misinformation. Only a tiny fraction, 0.001% (1 million per billion) of the data was misinformed. However the results revealed that it displayed a staggering 4.8% to 20% increase in medically harmful responses, depending on the size and complexity of the model (ranging from 1.3 to 4 billion parameters) during prompts.

Benchmarks are test sets that check if an AI model can answer questions correctly. In medicine, these include datasets like PubMedQA, MedQA, and MMLU, which draw on standardised exams and clinical prompts based on multiple-choice style evaluations. If a model performs well on these, it is assumed to be “safe” for deployment. They are widely used to claim LLMs perform at or above the human level. But, the Nature study revealed that poisoned models scored as well as uncorrupted ones. This means existing benchmarks may not be sensitive enough to detect underlying harm, revealing a critical blind spot about benchmarks.

LLMs are trained on billions of documents, and expecting human reviewers—such as physicians—to screen through each and every one of these is unrealistic. Automated quality filters are available to eliminate garbage content containing abusive language or sexual content. But these filters often miss syntactically elegant, misleading information—the kind a skilled propagandist or AI can produce. For example, a medically incorrect statement written in polished academic prose will likely bypass these filters entirely.

Read full story on The Hindu

Share this story on:-

Primary Country (Mandatory)

Other Country (Optional)

Set News Language for United States

Set News Language for World

Set News Source for United States

Set News Source for World

Why AI in healthcare needs stringent safety protocols Premium

The Hindu

50 years since the launch of Aryabhata

Abu Jani Sandeep Khosla’s new store in Mumbai is a theatrical ode to Indian craftsmanship

Indian astronaut Shubhanshu Shukla set for space travel in May

How well do you know Kerala’s native mangoes?

Is the once-extinct dire wolf back? | Explained

Environment Ministry must roll back order on desulphurising coal plants: study

Colossal squid caught on camera for first time in deep sea

How your immune system reacts through anaphylactic shocks to protect you from the “invaders”

Scientists find strongest evidence yet of life on an alien planet

Greenland’s melting ice, unstable fjords vex extraction of oil and minerals Trump covets

The men who first split the atom

Zoo in Spain helps elderly elephants age gracefully

How will genetic mapping of Indians help? | Explained Premium

Meet Shashaank Saravanakumar, a young para cyclist breaking barriers in Coimbatore

If it can weather some challenges, AI can supercharge forecasting Premium

Surveillance, R&D innovation and communication are key levers for India to lead the fight against AMR Premium

The highs and lows of summiting Mount Kilimanjaro in eight days

Indians battle respiratory issues, skin rashes in world’s most polluted town

Tiny pacemaker ups the ante on device’s abilities Premium

Retreat like the royals at Fort Rajwada in Jaisalmer

New Škoda Kodiaq Review: Full-size Family SUV

Chef Gautam Krishankutty’s upcoming pop-up spotlights his childhood favourites of mutta roast and mutton biryani

Scientists find green way to recycle toxic perovskite solar cells Premium

‘Travel’ on these mini trains

Does the planet K2-18b show signs of life? | Explained Premium