Fugu-MT Paper Translation (Summary): VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Paper summary: VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

arxiv url: http://arxiv.org/abs/2412.02172v1
Date: Tue, 03 Dec 2024 05:04:49 GMT
Status: Translation complete
Last updated in system: 2024-12-04 21:11:22.88593
Title: VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Authors: Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng
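Abstract summary: We propose VISCO, the first benchmark to extensively analyze the fine-grained critique and correction capabilities of LVLMs. VISCO features dense and fine-grained critique, requiring LVLMs to evaluate the correctness of each step. LookBack improves critique and correction performance by up to 13.5%.
Reference score (independently computed attention level): 112.35483894933904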
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability of large vision-language models (LVLMs) to critique and correct their reasoning is an essential building block towards their self-improvement. However, a systematic analysis of such capabilities in LVLMs is still lacking. We propose VISCO, the first benchmark to extensively analyze the fine-grained critique and correction capabilities of LVLMs. Compared to existing work that uses a single scalar value to critique the entire reasoning [4], VISCO features dense and fine-grained critique, requiring LVLMs to evaluate the correctness of each step in the chain-of-thought and provide natural language explanations to support their judgments. Extensive evaluation of 24 LVLMs demonstrates that human-written critiques significantly enhance the performance after correction, showcasing the potential of the self-improvement strategy. However, the model-generated critiques are less helpful and sometimes detrimental to the performance, suggesting that critique is the crucial bottleneck. We identify three common patterns in critique failures: failure to critique visual perception, reluctance to "say no", and exaggerated assumption of error propagation. To address these issues, we propose an effective LookBack strategy that revisits the image to verify each piece of information in the initial reasoning. LookBack significantly improves critique and correction performance by up to 13.5%.
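The contrast the abstract draws between a single scalar critique and a dense, step-level critique can be illustrated with a small data structure. The sketch below is only an illustration: the `StepCritique` schema, the field names, the helper functions, and the example steps are all hypothetical and are not taken from the VISCO benchmark itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepCritique:
    """Critique of one chain-of-thought step (hypothetical schema)."""
    step: str          # the reasoning step being judged
    correct: bool      # per-step correctness judgment
    explanation: str   # natural-language justification for the judgment


def first_incorrect_step(critiques: List[StepCritique]) -> int:
    """Return the index of the first step judged incorrect, or -1 if none."""
    for i, c in enumerate(critiques):
        if not c.correct:
            return i
    return -1


def scalar_critique(critiques: List[StepCritique]) -> float:
    """Collapse a dense critique into one number (fraction of steps judged correct).

    This is one assumed way to form a scalar; it discards which step failed and why.
    """
    if not critiques:
        return 1.0
    return sum(c.correct for c in critiques) / len(critiques)


# Made-up example: three reasoning steps about a bar chart.
critiques = [
    StepCritique("The chart shows three bars.", True, "Matches the image."),
    StepCritique("The tallest bar is blue.", False, "The tallest bar is red."),
    StepCritique("So the answer is blue.", False, "Relies on the previous wrong step."),
]

print(first_incorrect_step(critiques))  # index of the step to revisit
print(scalar_critique(critiques))       # single number for the whole reasoning
```

The step-level form keeps the location and explanation of each error, which is the information a correction stage would need; the scalar form keeps neither.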


Related papers