Fugu-MT paper translation (summary): DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

Paper summary: DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

arxiv url: http://arxiv.org/abs/2602.16742v1
Date: Wed, 18 Feb 2026 01:51:21 GMT
Status: Translation complete
Updated in system: 2026-02-20 15:21:28.262592
Title: DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
Authors: Haoxiang Sun, Lizhen Xu, Bing Zhao, Wotao Yin, Wei Wang, Boyu Yang, Rui Wang, Hu Wei,
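Abstract summary: RLVR (Reinforcement Learning with Verifiable Rewards) has been shown to be effective in improving the visual reflection and reasoning capabilities of LMMs (Large Multimodal Models). DeepVision-103K is a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements. Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks and generalize effectively to general multimodal reasoning tasks.
Reference score (site-computed attention score): 21.055712962530716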
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior resources, which limits data diversity and coverage, thereby constraining further gains in model performance. To this end, we introduce DeepVision-103K, a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, and rich visual elements. Models trained on DeepVision achieve strong performance on multimodal mathematical benchmarks, and generalize effectively to general multimodal reasoning tasks. Further analysis reveals enhanced visual perception, reflection and reasoning capabilities in trained models, validating DeepVision's effectiveness for advancing multimodal reasoning. Data: https://huggingface.co/datasets/skylenage/DeepVision-103K
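In RLVR, the "verifiable" reward is one a program can check automatically against a known answer, which is why a dataset of math problems with ground-truth answers suits it. The abstract does not say which reward function the authors use, so the following is only a minimal sketch under stated assumptions: the model writes its final answer inside `\boxed{...}`, and the reward is binary (1.0 for a match, 0.0 otherwise). The function names `extract_boxed_answer`, `normalize`, and `verifiable_reward` are hypothetical and not taken from the paper or the dataset.

```python
import re
from fractions import Fraction
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the contents of the LAST \\boxed{...} in the response, or None.

    Assumption (hypothetical convention): the model puts its final answer
    in \\boxed{...}. Nested braces such as \\frac{1}{2} are handled by
    tracking brace depth.
    """
    start = response.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in response[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Light, illustrative normalization before comparison."""
    s = re.sub(r"\s+", "", answer)  # drop all whitespace
    s = s.strip("$")                # drop inline-math delimiters
    # Rewrite \frac{a}{b}, \dfrac{a}{b}, \tfrac{a}{b} as a/b (simple cases only).
    s = re.sub(r"\\[dt]?frac\{([^{}]+)\}\{([^{}]+)\}", r"\1/\2", s)
    return s


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the ground truth."""
    pred = extract_boxed_answer(response)
    if pred is None:
        return 0.0
    p, g = normalize(pred), normalize(ground_truth)
    try:
        # Numeric answers: compare exactly as rationals, so "1/2" == "0.5".
        return 1.0 if Fraction(p) == Fraction(g) else 0.0
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers: fall back to normalized string equality.
        return 1.0 if p == g else 0.0
```

For example, `verifiable_reward("So the answer is \\boxed{\\frac{1}{2}}.", "0.5")` returns `1.0`. Exact rational comparison avoids floating-point tolerance issues for simple numeric answers; a production verifier would likely need symbolic equivalence checking for expressions, which this sketch does not attempt.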

