Sarvam AI launches Sarvam 1 for Indic languages
The Hindu
Sarvam AI has launched Sarvam 1, a new AI model geared towards Indic languages that was developed from scratch in India.
Bengaluru-based startup Sarvam AI has launched a new large language model, Sarvam 1. The AI model is open-source and has been trained on 11 languages: Bengali, Gujarati, Hindi, Marathi, Malayalam, Kannada, Oriya, Tamil, Telugu, Punjabi and English.
The 2-billion-parameter model was trained on 4 trillion tokens using Nvidia H100 Tensor Core GPUs and a custom tokeniser developed by Sarvam. The company claims that the tokeniser is up to four times more efficient than the tokenisers of other AI models trained on Indian languages.
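The report does not say how tokeniser efficiency was measured. One common yardstick is "fertility", the average number of tokens a tokeniser produces per word, where a lower figure means more efficient encoding. The Python sketch below illustrates that metric under this assumption; the function names and the two toy tokenisers are hypothetical stand-ins and are not Sarvam's tokeniser or its evaluation method.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], texts: List[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = sum(len(tokenize(text)) for text in texts)
    total_words = sum(len(text.split()) for text in texts)
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def char_tokenizer(text: str) -> List[str]:
    """Toy worst case (hypothetical): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def word_tokenizer(text: str) -> List[str]:
    """Toy best case (hypothetical): one token per word."""
    return text.split()


def efficiency_ratio(baseline: float, candidate: float) -> float:
    """How many times fewer tokens per word the candidate needs."""
    return baseline / candidate


if __name__ == "__main__":
    sample = ["नमस्ते दुनिया", "यह एक परीक्षण है"]
    base = fertility(char_tokenizer, sample)
    cand = fertility(word_tokenizer, sample)
    print(f"baseline fertility:  {base:.2f}")
    print(f"candidate fertility: {cand:.2f}")
    print(f"efficiency ratio:    {efficiency_ratio(base, cand):.2f}x")
```

To compare real tokenisers, each toy function could be replaced with a wrapper around an actual tokeniser's encode call, run over the same sample of Indic-language text.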
The custom training corpus, Sarvam-2T, comprises 20% datasets in Hindi, English, and programming languages so that the AI model can perform multilingual tasks.
To deal with the lack of high-quality training data for Indian languages, Sarvam AI built datasets using synthetic data generation methods.
Besides Nvidia's hardware, the company also used Yotta’s data centres and AI4Bharat’s technology and language resources to build the AI model.
“The Sarvam 1 model is the first example of an LLM trained from scratch with data, research, and compute being fully in India,” said Dr. Pratyush Kumar, co-founder of Sarvam. He added, “We expect it to power a range of use cases including voice and messaging agents. This is the beginning of our mission to build full stack sovereign AI. We are deeply excited to be working together with NVIDIA towards this mission.”
Developers can use the base model, which is available on Hugging Face, to build their own AI applications for Indic language speakers.