Artificial intelligence


Published October 19, 2025

DeepSeek: China’s open-source AI revolution

The artificial intelligence scene was turned upside down in early 2025 by the arrival of DeepSeek. In record time, this young Chinese company, an offshoot of the High-Flyer hedge fund, released several open-source models capable of rivaling the American giants at a fraction of the cost. Its technological innovations and transparent approach have aroused enthusiasm among developers, but also unease among its competitors. This article takes an in-depth look at what DeepSeek is, traces its history, details its models (LLM, V2, V3, R1), compares its performance to models such as GPT-4 and examines the economic and geopolitical impact of its emergence. The information presented here is drawn from research published in 2025.

What is DeepSeek?

DeepSeek is an artificial intelligence research laboratory founded in May 2023 in Hangzhou, China. Originally the AI arm of High-Flyer, a quantitative portfolio management company, it was spun off into an independent entity to devote itself entirely to fundamental research. The company claims an approach different from that of the American behemoths: it favors openness and algorithmic efficiency over the pursuit of immediate profit. Its first models were published under the MIT license and made freely available to developers via a website and mobile applications. DeepSeek employs around 200 people, compared with several thousand at its competitors, and benefits from the financial backing of the High-Flyer fund (around $15 billion in assets under management).

This open-source strategy has enabled DeepSeek to quickly become one of the leaders in large language models (LLMs). By January 2025, its mobile application had overtaken ChatGPT in the Apple App Store, with more than 2.6 million downloads. The company claims between 5 and 6 million users, proof of a worldwide craze.

Timeline: the rise of DeepSeek

DeepSeek’s meteoric rise is reflected in the succession of its models. Here are the main milestones:

  • May 2023: creation of DeepSeek, heir to High-Flyer’s AI division.

  • November 2023: release of DeepSeek Coder, the first open-source code generation model.

  • Early 2024: release of DeepSeek LLM (67 billion parameters) and start of a price war on the Chinese market.

  • May 2024: launch of the DeepSeek-V2 series, featuring expert mixture models (MoE) and a context length extended to 128,000 tokens. This version is trained on 8.1 trillion tokens and uses a double reinforcement cycle (RL) to improve security and relevance.

  • December 2024: release of DeepSeek-V3, a Mixture-of-Experts model with 671 billion parameters, of which only 37 billion are activated per token. It introduces an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective. The team uses FP8 mixed-precision training and pre-trains the model on 14.8 trillion tokens, achieving performance comparable to proprietary models while requiring only 2.788 million H800 GPU hours, or around 5.5 million USD.

  • January 2025: release of DeepSeek-R1-Zero and DeepSeek-R1, two models specialized in reasoning. R1-Zero is trained solely via unsupervised reinforcement learning, but suffers from repetition and language mixing. R1 corrects these shortcomings using priming data and a multi-stage pipeline incorporating several RL and fine-tuning phases. The cost of training R1 is estimated at ≈ 6 M USD.

In the space of twenty months, DeepSeek launched a complete range of models, competing with GPT-4 while adopting a very aggressive pricing policy.

Technological innovations

Mixture-of-Experts and Multi-head Latent Attention

Versions V2 and V3 are distinguished by their use of a Mixture-of-Experts (MoE) architecture. In this approach, the model is composed of dozens of neural networks (“experts”); only a subset is activated for each token, which considerably reduces computing costs. DeepSeek-V2 also introduces Multi-head Latent Attention (MLA), which compresses the keys and values of classical attention into a low-rank latent representation, shrinking the memory cache needed at inference. These innovations make it possible to extend the context length to 128,000 tokens without exploding costs, while maintaining a high level of performance.
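
The routing idea can be sketched in a few lines of toy Python. This is a minimal illustration, not DeepSeek’s implementation: sizes, weights and the single-matrix “experts” are arbitrary assumptions, and real models use many more, larger experts.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2   # toy sizes; real MoE layers are far larger

# Each "expert" is a feed-forward network; here, a single weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02   # gating network

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ router                    # one score per expert
    top = np.argsort(scores)[-top_k:]      # indices of the k best-scoring experts
    gate = np.exp(scores[top] - scores[top].max())
    gate /= gate.sum()                     # softmax over the selected experts only
    # Only top_k of the n_experts networks actually run for this token;
    # the rest are skipped entirely, which is where the compute saving comes from.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

token = rng.standard_normal(d_model)
output = moe_forward(token)
```

The key property is that per-token compute scales with `top_k`, not with `n_experts`, which is how total parameter count can grow without a matching growth in inference cost.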

Multi-token prediction and FP8 training

DeepSeek-V3 innovates by eliminating the auxiliary load-balancing loss used in other MoE architectures. The engineers implement a multi-token prediction objective, which trains the model to predict several upcoming tokens rather than just the next one. This approach densifies the training signal and can later serve as a basis for speculative decoding to speed up inference. The team also adopts a mixed-precision FP8 training framework, validating for the first time the effectiveness of this reduced precision at such a scale. Through hardware/algorithm co-design, DeepSeek manages to overlap communication and computation, significantly reducing pre-training costs. As a result, the V3 model is pre-trained in just 2.664 M GPU hours, then refined with a further 0.1 M hours.
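
On the data side, multi-token prediction simply changes what each position is asked to predict. The sketch below is a hypothetical simplification (DeepSeek-V3’s actual objective uses sequential prediction modules, not a flat target list), meant only to show the shape of the targets:

```python
def mtp_targets(tokens, depth=2):
    """For each position, list the next `depth` tokens to predict,
    instead of the single next token of standard language modeling."""
    return [
        [tokens[t + 1 + d] for d in range(depth)]
        for t in range(len(tokens) - depth)
    ]

seq = [5, 9, 2, 7, 3, 8]
print(mtp_targets(seq))  # [[9, 2], [2, 7], [7, 3], [3, 8]]
```

Each position now yields `depth` training signals instead of one, and at inference time the extra predictions can serve as draft tokens for speculative decoding.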

Reinforcing and distilling reasoning

The DeepSeek-R1 model focuses on reasoning and problem solving. The researchers demonstrate that the reasoning ability of an LLM can be incentivized solely via reinforcement learning, without going through supervised fine-tuning. The full pipeline includes two reinforcement learning stages, to discover improved reasoning patterns and align the output with human preferences, and two supervised fine-tuning stages that serve as starting points. Distilled models (1.5 B – 70 B parameters) mimic R1’s behavior to produce smaller models that outperform models of the same size trained directly with RL.
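
The R1 report describes rule-based rewards combining answer accuracy with output format. The function below is an illustrative sketch of that idea only; the tag convention and scoring values are assumptions, not DeepSeek code:

```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: format reward for a tagged chain of thought,
    plus accuracy reward if the final answer matches the reference."""
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think> tags
    # (an assumed convention for this sketch).
    if re.search(r"<think>.+</think>", completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: the text after the reasoning must contain the answer.
    final_part = completion.split("</think>")[-1]
    if reference_answer in final_part:
        reward += 1.0
    return reward

sample = "<think>2 plus 2 equals 4</think> The answer is 4."
print(reasoning_reward(sample, "4"))  # 1.5
```

Because such rewards are computed by rules rather than a learned reward model, they are cheap to evaluate at scale and hard to game, which is part of what makes pure-RL training of reasoning feasible.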

DeepSeek’s flagship models

The table below, to be inserted as an image, summarizes the main features of the major DeepSeek versions: LLM, V2, V3 and R1.

[Insert DeepSeek version table here].

DeepSeek LLM 7B/67B

DeepSeek’s first general-purpose model, DeepSeek LLM comes in two sizes (7B and 67B parameters). Both use a dense architecture with pre-normalization, SwiGLU feed-forward layers and rotary positional embeddings. The vocabulary contains 102,400 tokens and the context length is 4,096 tokens. According to the model’s specification table, the 7B version has 30 layers and a hidden dimension of 4,096, while the 67B version increases capacity to 95 layers and a dimension of 8,192. Both models were trained on 2 trillion English and Chinese tokens, and they serve as the basis for the subsequent MoE versions.
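
The SwiGLU feed-forward block mentioned above can be sketched with NumPy. Dimensions and weight scales here are toy assumptions, not the model’s real hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16   # toy sizes; the 7B model uses a hidden dimension of 4096

W_gate = rng.standard_normal((d_model, d_ff)) * 0.02
W_up = rng.standard_normal((d_model, d_ff)) * 0.02
W_down = rng.standard_normal((d_ff, d_model)) * 0.02

def silu(z):
    return z / (1.0 + np.exp(-z))   # SiLU (swish) activation

def swiglu_ffn(x):
    """SwiGLU feed-forward block: a gated branch modulates the up-projection
    before projecting back down to the model dimension."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

x = rng.standard_normal(d_model)
y = swiglu_ffn(x)
```

The gating branch lets the network learn which features to pass through, which in practice outperforms a plain ReLU feed-forward layer at the same parameter budget.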

DeepSeek V2

Launched in May 2024, DeepSeek-V2 applies Multi-head Latent Attention and a Mixture-of-Experts design. The V2 and V2-Lite versions, with 236 B and 15.7 B parameters respectively, extend the context to 128,000 tokens. Training takes place on 8.1 T tokens, with a dataset comprising 12% more Chinese text than English. A two-stage reinforcement learning cycle is used: a first phase to solve mathematical and programming problems, then a second phase to improve the model’s helpfulness and safety. This approach, coupled with the MoE architecture, considerably reduces operating costs.

DeepSeek V3

The most highly publicized version, DeepSeek-V3 is based on a MoE architecture with 671 billion parameters, of which 37 billion are activated per token. The model introduces an auxiliary-loss-free load-balancing strategy and a multi-token prediction objective, improving performance without an extra auxiliary loss. The team pre-trained it on 14.8 T tokens, then applied supervised fine-tuning and reinforcement learning to unlock its capabilities. Despite its size, the full training requires just 2.788 M H800 GPU hours, or ≈ 5.5 M USD. Benchmarks show that V3 outperforms other open-source models and comes close to proprietary models on evaluation sets such as MMLU and ARC. The cost per million output tokens is around 0.28 USD, well below competitors’ rates.

DeepSeek R1 and R1-Zero

Presented in January 2025, DeepSeek-R1-Zero and DeepSeek-R1 are reasoning models. R1-Zero is trained solely via large-scale reinforcement learning, without supervised fine-tuning, which brings out complex reasoning behaviors but causes repetition and language mixing. DeepSeek-R1 corrects these shortcomings by integrating cold-start data and a multi-stage pipeline with two RL phases and two supervised fine-tuning phases. The researchers show that reasoning ability can be distilled into smaller models: distilled versions from 1.5 B to 70 B parameters outperform models of the same size trained directly with RL. DeepSeek-R1 achieves performance comparable to OpenAI’s o1 model on mathematical, programming and reasoning tasks, while costing around 50 times less per million tokens.

Comparison with OpenAI: cost and performance

DeepSeek models stand out for their development and operating costs. According to several analyses, DeepSeek-V3 costs USD 5.5 million to train, compared with USD 50-100 million for GPT-4. Similarly, R1 is estimated to cost USD 6 million to train, while its competitor OpenAI-o1 is said to have cost over USD 100 million. In operation, DeepSeek charges around USD 0.14 per million input tokens and USD 0.28 per million output tokens. By comparison, GPT-4o costs around 2.50 USD for 1 million input tokens and 10 USD for 1 million output tokens. This difference explains why some companies can reduce their AI costs by 98% by opting for DeepSeek.
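
A quick back-of-envelope calculation with the per-token prices quoted above shows where the savings come from. The monthly workload is a hypothetical example, not data from the article:

```python
# Prices in USD per million tokens, as cited in this article.
deepseek = {"input": 0.14, "output": 0.28}
gpt4o = {"input": 2.50, "output": 10.00}

# Hypothetical workload: 10 M input tokens and 2 M output tokens per month.
workload = {"input": 10, "output": 2}   # in millions of tokens

cost_ds = sum(deepseek[k] * workload[k] for k in workload)
cost_oa = sum(gpt4o[k] * workload[k] for k in workload)
savings = 100 * (1 - cost_ds / cost_oa)

print(f"DeepSeek: ${cost_ds:.2f}, GPT-4o: ${cost_oa:.2f}")  # $1.96 vs $45.00
print(f"savings: {savings:.0f}%")                           # savings: 96%
```

The exact percentage depends on the input/output mix of the workload, which is why reported savings range from roughly 90% to 98%.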

The following table, to be inserted as an image, summarizes the main deviations:

[Insert DeepSeek vs OpenAI comparison chart here].

In addition to price, DeepSeek matches GPT-4o’s context length of 128 K tokens, whereas standard GPT-4 offers only 8 K. Its models are released under the MIT license, whereas OpenAI’s models remain proprietary. Finally, the MoE architecture activates just 37 B parameters per token, reducing the energy footprint compared with dense 405 B-parameter models such as Llama 3.1.

Economic and geopolitical impact

The arrival of the DeepSeek models had international repercussions. On January 20, 2025, the release of R1 and R1-Zero created a media frenzy; Nvidia’s market capitalization plunged 17% in one day. Some observers describe DeepSeek as an AI that is “cheaper and more efficient” than its American rivals, calling into question the technological dominance of the USA. The cost per query is said to be 27 times lower than GPT-4, and the cost of developing the R1 model around 96% lower than OpenAI-o1. Despite the US semiconductor embargo, DeepSeek managed to source H100 GPUs via alternative channels, notably in India, Taiwan and Singapore. This feat has fueled fears of a “Sputnik moment”, with some observers seeing it as a signal of a reversal in the global AI hierarchy.

In its analysis, Lux Research believes that DeepSeek has proven the commodification of large language models. The development cost of V3 (≈ 5.7 M USD) is ten times less than Llama 3 and twenty times less than GPT-4. Improvements include the compression of training data, the use of 8-bit storage and the partial activation of “experts” for each task. This efficiency is largely due to hardware constraints: the researchers used less powerful but less expensive H800 GPUs, banned from export to China. In total, V3 requires 2.78 M H800 hours, compared with 30 M H100 hours for Llama 3.1. This shows that algorithmic innovation can compensate for a hardware deficit.

Reception and controversy

Although praised for its efficiency, DeepSeek has also attracted criticism. Some rumors claim that the company has distilled Western models by exploiting responses generated by them. In particular, OpenAI suggests that DeepSeek trained its own model on GPT output. IRIS also points out that DeepSeek was able to acquire high-end GPUs before the American embargo. These suspicions raise ethical questions about intellectual property and the transparency of training data. However, DeepSeek claims to have used mainly public and open source data. Its open-source approach and publication of detailed reports (on GitHub and arXiv) contrast with the more closed practices of some of its competitors.

Future prospects and developments

DeepSeek is constantly improving its models. The company released V3.1 in March 2025, combining “reflection” and “non-reflection” modes, followed by V3.2 Exp in June 2025, with improved computational efficiency and reduced API pricing (according to official announcements). The next challenges will be to integrate multimodal capabilities (vision, audio) and enhance reliability in sensitive contexts. According to market studies, the democratization of open source models such as DeepSeek could lead to a lasting drop in AI costs, making these tools accessible to SMEs and emerging countries. In Europe, these developments also call for reflection on the rules of digital sovereignty and the importance of supporting local research to avoid dependence on American and Chinese giants.

Conclusion

DeepSeek represents a major turning point for artificial intelligence. In less than two years, this Chinese start-up has succeeded in designing massive, high-performance, open-source models, while defying the law of costs. Its innovations – expert mixture, multi-token prediction, FP8 training and reinforcement learning – demonstrate that it is possible to compete with the incumbents with more modest resources. The economic and geopolitical impact of DeepSeek is already reflected in a fall in the market capitalization of hardware suppliers and a debate on technological sovereignty. In the future, the rise of open source AI could encourage a more equitable distribution of technologies and stimulate creativity worldwide. However, questions remain about the origin of training data and competition between Western and Chinese models. In the meantime, DeepSeek stands out as the symbol of a new wave of AI: more open, more efficient and more accessible.
