The affordability of DeepSeek is a myth: The revolutionary AI actually cost $1.6 billion to develop
DeepSeek's new chatbot boasts an impressive introduction: "Hi, I was created so you can ask anything and get an answer that might even surprise you." This AI, a product of the Chinese startup DeepSeek, has quickly become a major market player, even contributing to a significant drop in NVIDIA's stock price. Its success stems from a unique architecture and training methodology, incorporating several innovative technologies.
Multi-token Prediction (MTP): Unlike traditional word-by-word prediction, MTP forecasts multiple words simultaneously, analyzing different sentence parts for improved accuracy and efficiency.
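The core idea can be sketched in a few lines of Python. This is a toy illustration with made-up dimensions and random placeholder weights, not DeepSeek's actual implementation: one small prediction head per future position, all applied to the same hidden state in parallel.

```python
import math
import random

random.seed(0)

DIM, VOCAB, K = 8, 10, 3  # hidden size, vocab size, number of future tokens predicted

# One weight matrix per prediction head: head i predicts the token at offset i+1.
# Random values stand in for trained parameters.
heads = [[[random.gauss(0, 0.1) for _ in range(VOCAB)] for _ in range(DIM)]
         for _ in range(K)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mtp_predict(hidden, heads):
    """Given one hidden state, return K probability distributions,
    one for each of the next K token positions, computed in parallel."""
    preds = []
    for w in heads:  # each head is a DIM x VOCAB linear projection
        logits = [sum(hidden[d] * w[d][v] for d in range(DIM)) for v in range(VOCAB)]
        preds.append(softmax(logits))
    return preds

hidden = [random.gauss(0, 1) for _ in range(DIM)]
dists = mtp_predict(hidden, heads)
print(len(dists))  # 3 distributions: one per future position
```

A conventional model would run this loop once per generated token; with MTP, a single forward pass yields guesses for several positions at once, which is where the efficiency gain comes from.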
Mixture of Experts (MoE): This architecture routes each token to a small subset of specialized expert sub-networks rather than pushing it through one monolithic network, accelerating AI training and enhancing performance. DeepSeek V3's MoE layers contain 256 experts, of which only eight are activated for each token.
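The routing step works roughly as follows. This is a minimal sketch assuming a simple top-k softmax router, with each "expert" reduced to a toy bias vector rather than a full feed-forward network; the function names are illustrative, not DeepSeek's.

```python
import math
import random

random.seed(1)

NUM_EXPERTS, TOP_K, DIM = 256, 8, 4

# Toy stand-ins: each expert is just a bias vector, and the router is one matrix.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
gate_w = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)] for _ in range(DIM)]

def moe_forward(token):
    # 1. Router: score every expert for this token.
    scores = [sum(token[d] * gate_w[d][e] for d in range(DIM))
              for e in range(NUM_EXPERTS)]
    # 2. Keep only the top-k experts; the other 248 do no work for this token.
    top = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]
    # 3. Softmax the selected scores into mixing weights.
    m = max(scores[e] for e in top)
    exps = {e: math.exp(scores[e] - m) for e in top}
    z = sum(exps.values())
    # 4. Output = weighted sum of the chosen experts' outputs.
    out = [0.0] * DIM
    for e in top:
        w = exps[e] / z
        for d in range(DIM):
            out[d] += w * (token[d] + experts[e][d])  # toy expert: input + bias
    return out, top

token = [random.gauss(0, 1) for _ in range(DIM)]
out, active = moe_forward(token)
print(len(active))  # 8 experts activated out of 256
```

The payoff is that compute per token scales with the eight active experts, not all 256, while total model capacity scales with all of them.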
Multi-head Latent Attention (MLA): This mechanism focuses on crucial sentence elements, repeatedly extracting key details from text fragments to minimize information loss and capture subtle nuances.
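A key practical benefit of MLA is a much smaller attention cache: keys and values are compressed into a shared low-dimensional latent vector per position and reconstructed per head when needed. The sketch below is a simplified assumption-laden illustration (tiny dimensions, random weights, no RoPE handling), not DeepSeek's actual implementation.

```python
import math
import random

random.seed(2)

DIM, LATENT, HEADS, HEAD_DIM = 8, 2, 2, 4  # latent is much smaller than DIM

def rand_mat(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

W_down = rand_mat(LATENT, DIM)                              # hidden -> shared latent
W_uk = [rand_mat(HEAD_DIM, LATENT) for _ in range(HEADS)]   # latent -> per-head key
W_uv = [rand_mat(HEAD_DIM, LATENT) for _ in range(HEADS)]   # latent -> per-head value
W_q = [rand_mat(HEAD_DIM, DIM) for _ in range(HEADS)]       # hidden -> per-head query

def mla(hiddens):
    """Attend over a sequence while caching only one small latent per position."""
    latents = [matvec(W_down, h) for h in hiddens]  # this is the entire "KV cache"
    out = []
    for h in range(HEADS):
        q = matvec(W_q[h], hiddens[-1])
        keys = [matvec(W_uk[h], c) for c in latents]   # reconstruct keys on the fly
        vals = [matvec(W_uv[h], c) for c in latents]   # reconstruct values on the fly
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(HEAD_DIM)
                  for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        head_out = [sum((e / z) * v[d] for e, v in zip(exps, vals))
                    for d in range(HEAD_DIM)]
        out.extend(head_out)
    return out, latents

hiddens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
out, cache = mla(hiddens)
print(len(cache[0]))  # only 2 numbers cached per position, not per-head keys/values
```

Standard multi-head attention would cache full keys and values for every head at every position; storing only the compressed latent is what keeps the information loss small relative to the memory saved.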
DeepSeek initially claimed to have trained its powerful DeepSeek V3 neural network for a mere $6 million using 2048 GPUs. However, SemiAnalysis revealed a far more substantial infrastructure: approximately 50,000 Nvidia Hopper GPUs, including 10,000 H800s, 10,000 H100s, and additional H20 GPUs, spread across multiple data centers. This represents a total server investment of roughly $1.6 billion, with operational expenses estimated at $944 million.
DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns its data centers, providing complete control over AI model optimization and faster innovation implementation. This self-funded approach enhances flexibility and decision-making speed. The company also attracts top talent, with some researchers earning over $1.3 million annually, primarily recruiting from leading Chinese universities.
While DeepSeek's initial $6 million training cost claim is misleading (it covers only GPU time for the final pre-training run and excludes research, infrastructure, and other expenses), the company has invested over $500 million in AI development. Its compact structure allows it to implement innovations efficiently, in contrast with the bureaucracy of larger corporations.
DeepSeek's success showcases the potential of well-funded independent AI companies to compete with industry giants. While its "revolutionary budget" claims are exaggerated, its billions in investment, technological breakthroughs, and strong team are undeniable factors in its success. Even so, the contrast with competitors is striking: DeepSeek reportedly spent about $5 million training R1, while GPT-4 is estimated to have cost around $100 million. For all the scale of its total investment, DeepSeek's training costs still come in well below those of its rivals.