Fugu-MT Paper Translation (Abstract): From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding

Paper summary: From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding

arxiv url: http://arxiv.org/abs/2605.15951v1
Date: Fri, 15 May 2026 13:41:41 GMT
Status: Translation complete
Last updated in system: 2026-05-18 21:22:26.294914
Title: From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding
Authors: Yuyuan Liu, Yiping Ji, Anjie Le, Jiayuan Zhu, Jiazhen Pan, Can Peng, Jiajun Deng, Fengbei Liu, Junde Wu,
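Abstract summary: This paper proposes a group-revision optimisation paradigm that enhances learning on hard cases. Inspired by reward shaping, it introduces a consolidation process that quantifies each candidate's improvement over the initial attempt. The method achieves consistent gains across referring and reasoning segmentation, REC, and counting benchmarks compared with prior GRPO-based models.
Reference score (independently computed attention score): 35.008790179255136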
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Finetuning Large Vision-Language Models with reinforcement learning has emerged as a promising approach to enhance their capability in object-level grounding. However, existing methods, mainly based on GRPO, assign rewards at the response level. Such sparse reward, often criterion-induced, leads to minimal learning signals when all candidate responses fail in challenging scenarios. In this work, we propose a group-revision optimisation paradigm that enhances learning on hard cases. It begins with a sampled initial response and generates a set of revised candidates to explore improved grounding outcomes. Inspired by reward shaping, we introduce a consolidation process that quantifies each candidate's improvement over the initial attempt and converts it into informative shaping signals. These signals are used to both refine the reward and modulate the advantage, amplifying the influence of high-quality revisions. Our method achieves consistent gains across referring and reasoning segmentation, REC, and counting benchmarks compared with prior GRPO-based models. Our code is available at https://github.com/yyliu01/GroupRevision.
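The abstract's core argument can be illustrated with a small sketch: when every candidate in a group receives the same sparse reward, a group-relative advantage collapses to zero, whereas a shaping term based on improvement over the initial attempt restores a learning signal. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the IoU-style `threshold`, the additive shaping weight `beta`, and the multiplicative advantage scaling are all hypothetical choices made for this example; the abstract does not specify the actual formulas.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (reward - group mean) / (group std + eps).

    Population std is used here; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def revision_shaped_advantages(initial_score, revised_scores,
                               threshold=0.5, beta=1.0, eps=1e-6):
    """Hypothetical shaping scheme (an assumption, not the paper's formula).

    initial_score  : quality of the sampled initial response (e.g. an IoU)
    revised_scores : quality of each revised candidate
    """
    # Sparse, criterion-induced reward: 1 only if the score passes the threshold.
    base = [1.0 if s >= threshold else 0.0 for s in revised_scores]
    # Improvement of each revision over the initial attempt.
    gains = [s - initial_score for s in revised_scores]
    # Refine the reward with the improvement signal.
    shaped = [b + beta * g for b, g in zip(base, gains)]
    adv = grpo_advantages(shaped, eps)
    # Modulate the advantage: scale up candidates that actually improved.
    return [a * (1.0 + max(g, 0.0)) for a, g in zip(adv, gains)]


# Hard case: every candidate misses the threshold, so the sparse rewards tie.
sparse_adv = grpo_advantages([0.0, 0.0, 0.0, 0.0])
shaped_adv = revision_shaped_advantages(0.1, [0.1, 0.3, 0.45, 0.2])
print(sparse_adv)
print(shaped_adv)
```

In this toy hard case all four candidates fall below the threshold, so the sparse-reward advantages are all zero, while the shaped variant ranks the revisions by how much they improved on the initial attempt.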
