Fugu-MT 論文翻訳(概要): DeepSeek-V3 Technical Report

論文の概要: DeepSeek-V3 Technical Report

arxiv url: http://arxiv.org/abs/2412.19437v2
Date: Tue, 18 Feb 2025 17:26:38 GMT
ステータス: 翻訳完了
システム内更新日: 2025-02-19 17:59:03.293246
Title: DeepSeek-V3 Technical Report
Title（参考訳）: DeepSeek-V3テクニカルレポート
Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wanjia Zhao, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaokang Zhang, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xingkai Yu, Xinnan Song, Xinxia Shan, Xinyi Zhou, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. X. Zhu, Yang Zhang, Yanhong Xu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Yu, Yi Zheng, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Ying Tang, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yu Wu, Yuan Ou, Yuchen Zhu, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yukun Zha, Yunfan Xiong, Yunxian Ma, Yuting Yan, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhipeng Xu, Zhiyu Wu, Zhongyu Zhang, Zhuoshu Li, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, Zizheng Pan,
Abstract要約: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token。我々は14.8兆の多様性と高品質のトークンでDeepSeek-V3を事前訓練し、その後にSupervised Fine-Tuning and Reinforcement Learningのステージを受講した。包括的な評価によると、DeepSeek-V3は他のオープンソースモデルよりも優れており、主要なクローズドソースモデルに匹敵するパフォーマンスを実現している。
参考スコア（独自算出の注目度）: 147.16121855209246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
Abstract（参考訳）: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token。効率的な推論とコスト効率のトレーニングを実現するため、DeepSeek-V3ではMulti-head Latent Attention (MLA)とDeepSeekMoEアーキテクチャを採用しており、DeepSeek-V2で完全に検証されている。さらに、DeepSeek-V3はロードバランシングのための補助的なロスフリー戦略を開拓し、より強力なパフォーマンスのためのマルチトークン予測トレーニング目標を設定する。我々は14.8兆の多様性と高品質のトークンでDeepSeek-V3を事前訓練し、続いてSupervised Fine-Tuning and Reinforcement Learningステージでその能力を完全に活用する。包括的な評価によると、DeepSeek-V3は他のオープンソースモデルよりも優れており、主要なクローズドソースモデルに匹敵するパフォーマンスを実現している。優れた性能にもかかわらず、DeepSeek-V3はフルトレーニングに2.788万 H800 GPU時間しか必要としない。さらに、トレーニングプロセスは極めて安定している。トレーニングプロセス全体を通じて、発見不可能な損失スパイクを経験したり、ロールバックを実行したりはしていません。モデルチェックポイントはhttps://github.com/deepseek-ai/deepSeek-V3.comで公開されている。

関連論文リスト

Skywork Open Reasoner 1 Technical Report [51.403686909760914]
提案するSkywork-OR1は,長期チェーン・オブ・ソート(CoT)モデルのための,効果的かつスケーラブルな強化学習(RL)実装である。 DeepSeek-R1-Distillモデルシリーズをベースとして、我々のRLアプローチは顕著なパフォーマンス向上を実現している。我々のSkywork-OR1-32Bモデルは、AIME24とAIME25ベンチマークでDeepSeek-R1とQwen3-32Bを上回っています。
論文参考訳（メタデータ） (2025-05-28T12:56:04Z)
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models [19.984973014373118]
FLAME-MoEは7つのデコーダのみのモデルからなる完全にオープンソースな研究スイートである。 FLAME-MoEは、同一のFLOPで訓練された密度の高いベースラインよりも平均精度を最大3.4ポイント向上させる。
論文参考訳（メタデータ） (2025-05-26T17:06:25Z)
One RL to See Them All: Visual Triple Unified Reinforcement Learning [92.90120580989839]
V-Triuneは、視覚的推論と知覚タスクを1つのトレーニングパイプライン内で実現するビジュアルトリプル統一強化学習システムである。 V-Triuneは3つの補完的なコンポーネントで構成されている: Sample-Level Datashelf (多様なタスク入力を統一する)、Verifier-Level Reward (特殊検証を通じてカスタム報酬を提供する)。本稿では,V-Triuneが処理する知覚タスクに対して適応的,進行的,明確なフィードバックを提供する,新しい動的IoU報酬を提案する。
論文参考訳（メタデータ） (2025-05-23T17:41:14Z)
Memory Analysis on the Training Course of DeepSeek Models [5.482535254884105]
本稿では,DeepSeek-v2やDeepSeek-v3といったDeepSeekモデルのトレーニング中のGPUメモリ消費に関する理論的解析を行う。本報告で論じるトレーニング方針がDeepSeekの公式設定を代表していない点を強調しておくことが重要である。
論文参考訳（メタデータ） (2025-02-11T09:51:25Z)
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding [39.14141055325595]
We present DeepSeek-VL2, a Advanced series of large Mixture-of-Experts (MoE) Vision-Language Models。ビジョンコンポーネントには、アスペクト比の異なる高解像度画像を処理するために設計された動的タイリングビジョン符号化戦略が組み込まれている。言語コンポーネントについては、Multi-head Latent AttentionメカニズムでDeepSeekMoEモデルを活用します。
論文参考訳（メタデータ） (2024-12-13T17:37:48Z)
DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
ガウススプラッティングと深さ推定を結合するDepthSplatを提案する。まず,事前学習した単眼深度特徴を生かして,頑健な多眼深度モデルを提案する。また,ガウス的スプラッティングは教師なし事前学習の目的として機能することを示す。
論文参考訳（メタデータ） (2024-10-17T17:59:58Z)
Depth Anything V2 [84.88796880335283]
V2は3つの重要なプラクティスを通じて、より微細でより堅牢な深度予測を生成する。すべてのラベル付き実像を合成画像に置き換え、教師モデルの容量を拡大し、大規模な擬似ラベル付き実像のブリッジを通じて生徒モデルを教える。その強い一般化能力から、距離深度モデルを得るために、距離深度ラベルを微調整する。
論文参考訳（メタデータ） (2024-06-13T17:59:56Z)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model [118.06260386652778]
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference。 DeepSeek-V2は、MLA(Multi-head Latent Attention)やDeepSeekMoEといった革新的なアーキテクチャを採用している。 DeepSeek-V2はDeepSeek 67Bと比較して大幅に性能が向上し、トレーニングコストは42.5%削減された。
論文参考訳（メタデータ） (2024-05-07T15:56:43Z)
Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation [16.957139277317005]
Af-DCD(Augmentation-free Dense Contrastive Knowledge Distillation)は、新しいコントラスト蒸留学習パラダイムである。 Af-DCDはセマンティックセグメンテーションのためのコンパクトで正確なディープニューラルネットワークを訓練する。
論文参考訳（メタデータ） (2023-12-07T09:37:28Z)
Towards Better Data Exploitation in Self-Supervised Monocular Depth Estimation [14.262669370264994]
本稿では、データセットのトレーニングの可能性を完全に活用するために、Resizing-CroppingとSplitting-Permutingという2つのデータ拡張手法を用いる。具体的には、原画像と生成した2つの拡張イメージを同時にトレーニングパイプラインに供給し、自己蒸留を行う。実験により,KITTIベンチマークを用いて,地中真理と地中真理の両面から,最先端の性能を実証した。
論文参考訳（メタデータ） (2023-09-11T06:18:05Z)
For SALE: State-Action Representation Learning for Deep Reinforcement Learning [60.42044715596703]
SALEは、状態と行動の間のニュアンスな相互作用をモデル化する埋め込みを学ぶための新しいアプローチである。我々は、SALEとRLのチェックポイントをTD3に統合し、TD7アルゴリズムを構成する。 OpenAIのジムのベンチマークタスクでは、TD7は平均276.7%、TD3よりも50.7%、それぞれ300k、500Mのタイムステップでパフォーマンスが向上している。
論文参考訳（メタデータ） (2023-06-04T19:47:46Z)
Geometry Uncertainty Projection Network for Monocular 3D Object Detection [138.24798140338095]
本稿では,予測および学習段階の誤り増幅問題に対処するために,幾何不確実性予測ネットワーク(GUP Net)を提案する。具体的には, GUPモジュールを提案し, 推定深さの幾何誘導不確かさを求める。トレーニング段階では,エラー増幅による不安定性を低減するための階層型タスク学習戦略を提案する。
論文参考訳（メタデータ） (2021-07-29T06:59:07Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。