Fugu-MT Paper Translation (Abstract): R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

Paper overview: R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

arxiv url: http://arxiv.org/abs/2604.20696v1
Date: Wed, 22 Apr 2026 15:41:33 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:11.203539
Title: R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs
Title (reference translation): R-CoV: Region-Aware Chain-of-Verification for Mitigating Object Hallucinations in LVLMs
Authors: Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
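Abstract summary: Region-Aware Chain-of-Verification (R-CoV) is a visual chain-of-verification method that alleviates object hallucinations in large vision-language models. R-CoV consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation.
Reference score (independently computed attention score): 88.62912181680413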
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verification (R-CoV), a visual chain-of-verification method to alleviate object hallucinations in LVLMs in a post-hoc manner. Motivated by how humans comprehend intricate visual information -- often focusing on specific image regions or details within a given sample -- we elicit such region-level processing from LVLMs themselves and use it as a chaining cue to detect and alleviate their own object hallucinations. Specifically, our R-CoV consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation. As a simple yet effective method, R-CoV can be seamlessly integrated into various LVLMs in a training-free manner and without relying on external detection models. Extensive experiments on several widely used hallucination benchmarks across multiple LVLMs demonstrate that R-CoV can significantly alleviate object hallucinations in LVLMs. Project page: https://github.com/Jiahao000/R-CoV.
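Abstract (reference translation): Large vision-language models (LVLMs) have shown impressive performance on a variety of multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., claims about objects that do not exist in the visual input. To address this challenge, we propose Region-Aware Chain-of-Verification (R-CoV), a visual chain-of-verification method for LVLMs. Motivated by how humans comprehend complex visual information, often by focusing on specific image regions or details within a sample, we elicit such region-level processing from the LVLMs themselves and use it as a chaining cue to detect and alleviate their own object hallucinations. Specifically, R-CoV consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation. As a simple yet effective method, R-CoV can be seamlessly integrated into various LVLMs in a training-free manner, without relying on external detection models. Extensive experiments on widely used hallucination benchmarks across multiple LVLMs show that R-CoV can significantly alleviate object hallucinations in LVLMs. Project page: https://github.com/Jiahao000/R-CoV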
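
The six steps named in the abstract can be sketched as a simple control flow. This is only an illustration of how the steps might chain together, not the authors' implementation: the abstract does not give the prompts, the coordinate format, or how a region is passed back to the model. The `ask` callable, the `(x1, y1, x2, y2)` box format, the prompt wording, and the yes/no parsing below are all assumptions made for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical box format (x1, y1, x2, y2); the abstract does not specify one.
Box = Tuple[float, float, float, float]
# Hypothetical LVLM interface: (image, prompt, optional region) -> text.
AskFn = Callable[[object, str, Optional[Box]], str]


@dataclass
class Check:
    entity: str
    box: Optional[Box]
    description: str
    verified: bool


def parse_box(text: str) -> Optional[Box]:
    """Read the first four numbers in the model's reply as a box, if present."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    if len(nums) < 4:
        return None
    x1, y1, x2, y2 = (float(n) for n in nums[:4])
    return (x1, y1, x2, y2)


def r_cov(image: object, question: str, ask: AskFn) -> Tuple[str, List[Check]]:
    # Step 1: initial response generation.
    initial = ask(image, question, None)

    # Step 2: entity extraction from the initial response.
    raw = ask(image, f"List the objects mentioned in this text, comma-separated: {initial}", None)
    entities = [e.strip() for e in raw.split(",") if e.strip()]

    checks: List[Check] = []
    for entity in entities:
        # Step 3: coordinate generation by the same model (no external detector).
        reply = ask(image, f"Give the bounding box of the {entity} as x1,y1,x2,y2.", None)
        box = parse_box(reply)

        # Step 4: region description, conditioned on the predicted region.
        description = ask(image, "Describe this region.", box) if box else ""

        # Step 5: verification execution; a missing box counts as unverified here.
        if box:
            prompt = (f"Region description: {description}\n"
                      f"Is there a {entity} in this region? Answer yes or no.")
            answer = ask(image, prompt, box)
        else:
            answer = "no"
        verified = answer.strip().lower().startswith("yes")
        checks.append(Check(entity, box, description, verified))

    # Step 6: final response generation using the verification results.
    evidence = "; ".join(
        f"{c.entity}: {'present' if c.verified else 'not found'}" for c in checks
    )
    final = ask(
        image,
        f"Question: {question}\nDraft answer: {initial}\nVerification: {evidence}\n"
        "Rewrite the draft, removing objects that were not found.",
        None,
    )
    return final, checks
```

Because every step calls the same `ask` function, the sketch needs no training and no separate detection model, which matches the training-free, detector-free framing in the abstract. How faithfully it reflects the actual method can only be judged against the paper and the project page.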

Related papers