BEIJING (Diya TV) — DeepSeek, a leading Chinese AI research lab backed by the quantitative hedge fund High-Flyer, has unveiled its latest large language model, DeepSeek-V3, marking a significant advancement in open-source AI technology.
Released on December 26, DeepSeek-V3 boasts 671 billion parameters, with 37 billion activated per token, utilizing a Mixture-of-Experts architecture.
This design allows the model to deliver strong performance while keeping computation efficient, since only a fraction of its parameters run for any given token. Notably, the model was trained in roughly two months at a reported cost of approximately $5.58 million, significantly less than comparable models, thanks to optimized training techniques.
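The efficiency idea behind Mixture-of-Experts can be illustrated with a toy routing step. This is a simplified sketch, not DeepSeek's actual implementation: a router scores a set of expert sub-networks for each token, and only the top-k experts are activated, so most of the model's parameters stay idle on any given token, much as DeepSeek-V3 activates 37 billion of its 671 billion parameters.

```python
# Toy Mixture-of-Experts routing (illustrative only, not DeepSeek's code).
# A router assigns each token a score per expert; only the top-k experts
# actually run, so compute scales with k, not with the total expert count.

def route_token(router_scores, k=2):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)
    return sorted(ranked[:k])

# Eight hypothetical experts; only two are activated for this token.
scores = [0.1, 0.9, 0.05, 0.3, 0.7, 0.2, 0.15, 0.4]
active = route_token(scores, k=2)
print(active)  # experts with the two highest scores are selected
```

In a real model the router is itself a learned layer and the selected experts' outputs are combined with the routing weights, but the sparsity principle, paying for only a few experts per token, is what keeps a 671-billion-parameter model affordable to run.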
Benchmark tests indicate that DeepSeek-V3 matches the capabilities of models like GPT-4o and Claude 3.5 Sonnet, and surpasses others such as Llama 3.1 and Qwen 2.5.
The model generates 60 tokens per second, three times the speed of its predecessor, DeepSeek-V2. Developers can access DeepSeek-V3 under a permissive open-source license, facilitating integration into a wide range of applications.
This development underscores China's rapid progress in AI research, often achieving comparable results with fewer resources than Western counterparts. Despite challenges such as restricted access to advanced chips under U.S. export controls, Chinese AI labs have employed innovative strategies to keep advancing their models.