Fugu-MT Paper Translation (Abstract): GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper summary: GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

arxiv url: http://arxiv.org/abs/2507.01006v3
Date: Wed, 13 Aug 2025 15:10:17 GMT
Status: translation complete
Last updated in system: 2025-08-14 16:17:42.668258
Title: GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Title (reference translation): GLM-4.1V-Thinking and GLM-4.5V: Toward Multimodal Reasoning with Scalable Reinforcement Learning
Authors: GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiale Zhu, Jiali Chen, Jing Chen, Jinhao Chen, Jinghao Lin, Jinjiang Wang, Junjie Chen, Leqi Lei, Letian Gong, Leyi Pan, Mingdao Liu, Mingzhi Zhang, Qinkai Zheng, Sheng Yang, Shi Zhong, Shiyu Huang, Shuyuan Zhao, Siyan Xue, Shangqin Tu, Shengbiao Meng, Tianshu Zhang, Tianwei Luo, Tianxiang Hao, Wenkai Li, Wei Jia, Xiao Liu, Xiaohan Zhang, Xin Lyu, Xuancheng Huang, Yanling Wang, Yadong Xue, Yanfeng Wang, Yanzi Wang, Yifan An, Yifan Du, Yiming Shi, Yiheng Huang, Yilin Niu, Yuan Wang, Yuanchang Yue, Yuchen Li, Yutao Zhang, Yuting Wang, Yu Wang, Yuxuan Zhang, Zhanxiao Du, Zhenyu Hou, Zhao Xue, Zhengxiao Du, Zihan Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang
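Abstract summary: We present GLM-4.1V-Thinking and GLM-4.5V, a family of vision-language models (VLMs). GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size. The smaller GLM-4.1V-9B-Thinking achieves superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks.
Reference score (attention score computed by this site): 117.3814584338105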
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present GLM-4.1V-Thinking and GLM-4.5V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. Code, models and more information are released at https://github.com/zai-org/GLM-V.
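Abstract (reference translation): GLM-4.1V-Thinking and GLM-4.5V are a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. This report presents the key findings from developing the reasoning-centric training framework. First, a capable vision foundation model with significant potential is built through large-scale pre-training, which arguably sets the upper bound on final performance. Next, Reinforcement Learning with Curriculum Sampling (RLCS) is proposed to unlock the model's full potential, bringing comprehensive capability gains across diverse tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and shows competitive or even superior results compared with closed-source models such as Gemini-2.5-Flash on challenging tasks including coding and GUI agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, outperforming the much larger Qwen2.5-VL-72B on 29 benchmarks. Both GLM-4.1V-9B-Thinking and GLM-4.5V are released as open source. Code, models, and further information are available at https://github.com/zai-org/GLM-V.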

Related papers