Has China achieved an AI breakthrough with DeepSeek?
The Hindu
What makes China’s new DeepSeek AI model so disruptive, and how have Chinese large language models evolved in the AI race?
For over two years, San Francisco-based OpenAI has dominated artificial intelligence (AI) with its generative pre-trained language models. The startup’s chatbot penned poems, wrote long-format stories, found bugs in code, and fielded questions about the world (albeit only up to its knowledge cut-off date). Its ability to generate coherent, flawless sentences amazed users around the world.
Far away, across the Pacific Ocean, in Beijing, China made its first attempt to counter America’s dominance in AI. In March 2023, Baidu received the government’s approval to launch its AI chatbot, Ernie bot. Ernie was touted as China’s answer to ChatGPT after the bot received over 30 million user sign-ups within a day of its launch.
But the initial euphoria around Ernie gradually ebbed as the bot fumbled and dodged questions about China’s President Xi Jinping, the Tiananmen Square crackdown, and human rights violations against Uyghur Muslims. In response to questions on these topics, the bot replied: “Let’s talk about something else.”
As the hype around Ernie met the reality of Chinese censorship, several experts pointed out the difficulty of building large language models (LLMs) in the communist country. Google’s former CEO and chairman, Eric Schmidt, in a talk at the Harvard Kennedy School of Government in October 2023, said: “They [China] were late to the party. They didn’t get to this [LLM] AI space early enough.” Mr. Schmidt further pointed out that a lack of language training data and China’s unfamiliarity with open-source ideas could cause the country to fall behind in the global AI race.
As China’s tech giants trailed, their U.S. counterparts marched forward with advances in LLMs. Microsoft-backed OpenAI cultivated a new crop of reasoning chatbots with its ‘o’ series that outperformed ChatGPT. These AI models were the first to introduce inference-time scaling: instead of replying immediately, the model spends more computation, and therefore more time, working through a problem before answering, and its answers improve as that extra compute grows.
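One simple way to picture inference-time scaling is “best-of-N” sampling: generate several candidate answers and keep the one a scorer rates highest, so spending more compute at answer time (a larger N) buys a better final answer. The sketch below is only an illustration of that idea, not how OpenAI’s models actually work; generate and score are hypothetical stand-ins, not real APIs.

    # A minimal sketch of one simple form of inference-time scaling:
    # "best-of-N" sampling. generate() and score() are hypothetical
    # stand-ins for a model's sampler and an answer-scoring routine.
    import random

    def generate(prompt: str) -> str:
        # Stand-in for sampling one candidate answer from a language model.
        return f"answer-{random.randint(0, 999)}"

    def score(prompt: str, answer: str) -> float:
        # Stand-in for a verifier/reward model that rates an answer.
        return random.random()

    def best_of_n(prompt: str, n: int) -> str:
        # More samples means more inference-time compute, and a better
        # chance that one candidate scores highly.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda a: score(prompt, a))

    print(best_of_n("What is 17 * 24?", n=8))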
While China’s established tech firms languished, High-Flyer, a Zhejiang-based hedge fund that used AI for trading, set up its own AI lab, DeepSeek, in April 2023. Within a year, the AI spin-off developed the DeepSeek-v2 model, which performed well on several benchmarks and could provide its service at a significantly lower cost than other Chinese LLMs.
When DeepSeek-v3 was launched in December 2024, it stunned AI companies. The Mixture-of-Experts (MoE) model was pre-trained on 14.8 trillion tokens and has 671 billion total parameters, of which only 37 billion are activated for each token.
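In an MoE model, a small “router” network picks a handful of specialist sub-networks (“experts”) for each token, so only a fraction of the total parameters, 37 billion of DeepSeek-v3’s 671 billion, does any work on a given token. The toy NumPy sketch below illustrates the routing idea only; the sizes and the single-matrix “experts” are illustrative assumptions, not DeepSeek’s actual architecture.

    # A toy sketch of Mixture-of-Experts (MoE) routing. The sizes are
    # assumed toy values, not DeepSeek's configuration: the point is that
    # the router picks only top_k experts per token, so only a small slice
    # of the total parameters is activated for any one token.
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 8, 4, 2   # toy hyperparameters (assumed)

    # Each "expert" is reduced to a single weight matrix in this sketch.
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
    router_w = rng.normal(size=(d_model, n_experts))

    def moe_layer(token: np.ndarray) -> np.ndarray:
        """Route one token vector through its top_k experts only."""
        logits = token @ router_w                # router score per expert
        chosen = np.argsort(logits)[-top_k:]     # indices of the top_k experts
        weights = np.exp(logits[chosen])
        weights /= weights.sum()                 # softmax over chosen experts
        # Only the chosen experts' parameters run for this token.
        return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

    out = moe_layer(rng.normal(size=d_model))
    print(out.shape)   # (8,) -- same shape as the input, but only 2 of 4 experts ran

Because the unchosen experts never run, each token’s forward pass costs only a fraction of what an equally large dense model would, which is part of how such a large model can serve answers cheaply.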