16 new datasets in Indian languages for Artificial Intelligence and Machine Learning research
The Hindu
The Linguistic Data Consortium for Indian Languages (LDC-IL) is a scheme of the Ministry of Education that develops digital corpora in Indian languages. Housed in the Central Institute of Indian Languages (CIIL), Mysuru, the LDC-IL organised the 8th Project Advisory Committee meeting here on Monday.
Chaired by Shailendra Mohan, director, CIIL, the meeting was attended by various domain experts and industry specialists. As an important outcome, LDC-IL launched 16 new datasets in Indian languages to help bolster quality research in Artificial Intelligence and Machine Learning.
The first of their kind, these datasets will help develop new technologies in Indian languages, including Automatic Speech Recognition and Live Voice Translation, and improve the quality of the results such tools produce in Indian languages, a press release from the CIIL said.
The datasets cover 12 scheduled languages: Hindi, Bengali, Tamil, Marathi, Kannada, Malayalam, Odia, Assamese, Konkani, Maithili, Urdu, and Nepali. They also include two variants of Indian English, namely the Bengali variant and the Kannada variant.
Indian English is internationally recognised as a language in its own right and has its own variants within India, where different mother tongues influence English and give it a distinct flavour, with some distinct linguistic and phonetic features, the release added.
In a first, the institute also released two datasets for Chhattisgarhi, a mother tongue usually clubbed together with Hindi. “This shows the seriousness of the government to ensure that education and technology will be bolstered for all mother tongues of India as has been recommended in the NEP-2020,” the CIIL said.
These datasets will bolster research and development in all Indian languages, benefiting both academia and industry. Applications built on them will ultimately help promote these languages, according to the CIIL.