Researchers warn we could run out of data to train AI by 2026. What then?

The Hindu

Thursday, November 09, 2023 09:39:51 AM UTC


As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned that the industry might be running out of training data, the fuel that powers AI systems. A shortage could slow the growth of AI models, especially large language models, and might even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much of it there is on the web? And is there a way to address the risk?

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, Stable Diffusion, the algorithm behind AI image-generating apps such as Lensa and a counterpart to tools like DALL-E and Midjourney, was trained on the LAION-5B dataset comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it is likely to produce inaccurate or low-quality outputs.

The quality of the training data is also important. Low-quality data such as social media posts or blurry photographs are easy to source, but aren’t sufficient to train high-performing AI models.

Text taken from social media platforms might be biased or prejudiced, or may include disinformation or illegal content that the model could then replicate. For example, when Microsoft released its Tay chatbot on Twitter in 2016 and let it learn from user interactions, it quickly began producing racist and misogynistic outputs.

This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia and certain filtered web content. Google Assistant was trained on 11,000 romance novels taken from the self-publishing site Smashwords to make it more conversational.
