13 years later, Hindi auto-captioning launched for YouTube
The Hindu
YouTube rolls out Hindi auto-captioning for hearing-impaired viewers, a sign that language data availability in India is expanding.
YouTube has started rolling out automatic captions for Hindi videos, a much-delayed expansion of the speech recognition-aided subtitles it first launched in 2010. The automated subtitles could open up millions of Hindi-language videos to viewers who are hearing impaired.
Hindi subtitles have been available on videos where creators have specifically chosen to add them, but YouTube hasn’t offered a convenient way to caption Hindi videos automatically. Since creators on YouTube have to pay for professionally created and timed subtitles, many do not commission them.
It is unclear precisely when Hindi captioning started becoming available. Transcription of Hindi has been available on Google Translate and other products by the search giant. But the inclusion of Hindi auto-captioning as a widely available feature for Hindi videos is a signal that enough Hindi speech data has now been gathered and processed for Google to feel it can offer sufficient accuracy on most videos in the language. By extension, that means data availability for Indian languages is expanding.
Well before the generative Artificial Intelligence boom, firms like YouTube were using voice recognition for accessibility purposes. But that’s easier said than done for languages that are not heavily represented online. “In the speech to text problem, you need a lot of speech in Hindi, and a corresponding correct transcript, which is fed to [AI] models that learn by looking at this data,” said Mayuresh Nirhali, a senior executive at Reverie, which works on solving problems related to Indian languages on the Internet.
Developing AI-enabled services like speech recognition for Indian languages is particularly difficult due to several foundational challenges, including inconsistent encoding of text online and regional variations in spelling and pronunciation, Mr. Nirhali said. Now that more data appears to be available, at least to big tech firms, the situation is improving. A YouTube spokesperson did not respond to queries on the launch of auto-captioning in Hindi.
Mimicking the style of closed captioning for television viewers in countries like the United States, where it is mandatory for the small screen, YouTube’s captions show up as blocks of words as and when they are spoken, with little punctuation. While captions for news broadcasts are generally created in real time for professional TV channels, AI-enabled speech recognition allows automatic captioning to be timed more precisely, allowing viewers to pick up pauses and other cues of speech.
But accuracy and quality issues linger. Even in auto-generated English captions, a technology YouTube has been refining for over a decade, mistakes are common and many words are mistranscribed. Hindi captions are no different, The Hindu found in some videos. Many lines that are not clearly articulated by speakers, even in single-speaker contexts like stand-up comedy videos, are simply omitted, while other words are transcribed as similar-sounding words.