DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models
- URL: http://arxiv.org/abs/2507.09955v1
- Date: Mon, 14 Jul 2025 06:10:30 GMT
- Title: DeepSeek: Paradigm Shifts and Technical Evolution in Large AI Models
- Authors: Luolin Xiong, Haofen Wang, Xi Chen, Lu Sheng, Yun Xiong, Jingping Liu, Yanghua Xiao, Huajun Chen, Qing-Long Han, Yang Tang
- Abstract summary: DeepSeek has released their V3 and R1 series models, which attracted global attention due to their low cost, high performance, and open-source advantages. The paper highlights novel algorithms introduced by DeepSeek, including Multi-head Latent Attention (MLA), Mixture-of-Experts (MoE), Multi-Token Prediction (MTP), and Group Relative Policy Optimization (GRPO).
- Score: 73.99173041896884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DeepSeek, a Chinese Artificial Intelligence (AI) startup, has released their V3 and R1 series models, which attracted global attention due to their low cost, high performance, and open-source advantages. This paper begins by reviewing the evolution of large AI models, focusing on paradigm shifts, the mainstream Large Language Model (LLM) paradigm, and the DeepSeek paradigm. Subsequently, the paper highlights novel algorithms introduced by DeepSeek, including Multi-head Latent Attention (MLA), Mixture-of-Experts (MoE), Multi-Token Prediction (MTP), and Group Relative Policy Optimization (GRPO). The paper then explores DeepSeek's engineering breakthroughs in LLM scaling, training, inference, and system-level optimization architecture. Moreover, the impact of DeepSeek models on the competitive AI landscape is analyzed, comparing them to mainstream LLMs across various fields. Finally, the paper reflects on the insights gained from DeepSeek's innovations and discusses future trends in the technical and engineering development of large AI models, particularly in data, training, and reasoning.
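Since the abstract only names the algorithms rather than explaining them, the following is a minimal illustrative sketch of the group-relative advantage idea at the core of GRPO: rewards for a group of responses sampled for the same prompt are normalized against the group's mean and standard deviation, removing the need for a separate critic network. The function name and example values are hypothetical; this is not DeepSeek's implementation.

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative only).
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize per-response rewards within one sampled group of responses."""
    mean = rewards.mean()
    std = rewards.std()
    return (rewards - mean) / (std + eps)

# Example: 4 responses sampled for one prompt, scored by a reward model.
rewards = np.array([0.2, 0.9, 0.5, 0.4])
advantages = group_relative_advantages(rewards)
print(advantages)  # above-average responses receive positive advantages
```

These advantages would then weight a clipped policy-gradient update, typically with a KL penalty toward a reference policy; those details are omitted here.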
Related papers
- AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models [78.08374249341514]
The rapid development of AI-generated content (AIGC) has led to the misuse of AI-generated images (AIGI) in spreading misinformation. We introduce a large-scale and comprehensive dataset, Holmes-Set, which includes an instruction-tuning dataset with explanations on whether images are AI-generated. Our work introduces an efficient data annotation method called the Multi-Expert Jury, enhancing data generation through structured MLLM explanations and quality control. In addition, we propose Holmes Pipeline, a meticulously designed three-stage training framework comprising visual expert pre-training, supervised fine-tuning, and direct preference optimization.
arXiv Detail & Related papers (2025-07-03T14:26:31Z) - From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models [8.03446809073899]
The rapid advancement of artificial intelligence (AI) has reshaped the field of natural language processing (NLP), with models like OpenAI's ChatGPT and DeepSeek AI. This paper presents a detailed analysis of the evolution from ChatGPT to DeepSeek AI, highlighting their technical differences, practical applications, and broader implications for AI development.
arXiv Detail & Related papers (2025-04-04T07:08:29Z) - A Review of DeepSeek Models' Key Innovative Techniques [10.977907906989342]
DeepSeek-V3 and DeepSeek-R1 are leading open-source Large Language Models. We review the core techniques driving the remarkable effectiveness and efficiency of these models.
arXiv Detail & Related papers (2025-03-14T15:11:29Z) - Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. Remaining shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance, necessitate advanced post-training language models (PoLMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; and Integration and Adaptation.
arXiv Detail & Related papers (2025-03-08T05:41:42Z) - A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future [10.264208559276927]
This review aims to gain key insights into the development of MXAI methods. We categorize MXAI methods across four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions.
arXiv Detail & Related papers (2024-12-18T17:06:21Z) - A Survey on Self-Evolution of Large Language Models [116.54238664264928]
Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications.
In response, self-evolution approaches that enable LLMs to autonomously acquire, refine, and learn from experiences generated by the model itself are growing rapidly.
arXiv Detail & Related papers (2024-04-22T17:43:23Z) - A Review of Multi-Modal Large Language and Vision Models [1.9685736810241874]
Large Language Models (LLMs) have emerged as a focal point of research and application.
Recently, LLMs have been extended into multi-modal large language models (MM-LLMs).
This paper provides an extensive review of the current state of those LLMs with multi-modal capabilities as well as the very recent MM-LLMs.
arXiv Detail & Related papers (2024-03-28T15:53:45Z) - When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges [50.280704114978384]
Pre-trained large language models (LLMs) exhibit powerful capabilities for generating natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems.
arXiv Detail & Related papers (2024-01-19T05:58:30Z) - Unleashing the potential of prompt engineering for large language models [1.6006550105523192]
This review explores the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). It examines both foundational and advanced methodologies of prompt engineering, including techniques such as self-consistency, chain-of-thought, and generated knowledge. It also discusses AI security, particularly adversarial attacks that exploit vulnerabilities in prompt engineering.
arXiv Detail & Related papers (2023-10-23T09:15:18Z) - Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Multi-modal pre-trained big models have drawn increasing attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training works in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z) - Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z)