Fugu-MT Paper Translation (Summary): Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models

Paper summary: Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models

arxiv url: http://arxiv.org/abs/2509.16805v1
Date: Sat, 20 Sep 2025 20:45:47 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:15.982071
Title: Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models
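Authors: Md. Atabuzzaman, Ali Asgarov, Chris Thomas
Abstract summary: The paper investigates the presence and nature of selection bias in Large Vision-Language Models (LVLMs). It proposes an inference-time, logit-level debiasing method that estimates an ensemble bias vector from general and contextual prompts. The method mitigates bias without retraining and is compatible with frozen LVLMs.
Reference score (independently computed attention score): 2.393011821499345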
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Vision-Language Models (LVLMs) have achieved strong performance on vision-language tasks, particularly Visual Question Answering (VQA). While prior work has explored unimodal biases in VQA, the problem of selection bias in Multiple-Choice Question Answering (MCQA), where models may favor specific option tokens (e.g., "A") or positions, remains underexplored. In this paper, we investigate both the presence and nature of selection bias in LVLMs through fine-grained MCQA benchmarks spanning easy, medium, and hard difficulty levels, defined by the semantic similarity of the options. We further propose an inference-time logit-level debiasing method that estimates an ensemble bias vector from general and contextual prompts and applies confidence-adaptive corrections to the model's output. Our method mitigates bias without retraining and is compatible with frozen LVLMs. Extensive experiments across several state-of-the-art models reveal consistent selection biases that intensify with task difficulty, and show that our mitigation approach significantly reduces bias while improving accuracy in challenging settings. This work offers new insights into the limitations of LVLMs in MCQA and presents a practical approach to improve their robustness in fine-grained visual reasoning. Datasets and code are available at: https://github.com/Atabuzzaman/Selection-Bias-of-LVLMs
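The abstract describes the debiasing method only at a high level, so the following is a minimal sketch of what an inference-time, logit-level correction could look like, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the convex combination used to form the ensemble bias vector, and the rule that scales the correction by one minus the model's top probability. The logits are made-up numbers standing in for the scores a frozen LVLM assigns to the option tokens A to D.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the option-token logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ensemble_bias(general: Sequence[float], contextual: Sequence[float],
                  weight: float = 0.5) -> List[float]:
    """Blend two bias estimates into one vector of log-probabilities.

    ASSUMPTION: a simple convex combination; the paper's rule may differ.
    """
    g, c = log_softmax(general), log_softmax(contextual)
    return [weight * gi + (1.0 - weight) * ci for gi, ci in zip(g, c)]


def debias(logits: Sequence[float], bias: Sequence[float],
           max_strength: float = 1.0) -> List[float]:
    """Subtract the bias, with a weaker correction for confident predictions.

    ASSUMPTION: strength = max_strength * (1 - top probability).
    """
    log_p = log_softmax(logits)
    confidence = math.exp(max(log_p))
    strength = max_strength * (1.0 - confidence)
    return [lp - strength * b for lp, b in zip(log_p, bias)]


def predict(scores: Sequence[float], options: str = "ABCD") -> str:
    """Return the option letter with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return options[best]


# Hypothetical logits: both bias probes favor "A" regardless of content.
general_logits = [2.0, 0.5, 0.5, 0.5]
contextual_logits = [1.5, 0.5, 0.5, 0.5]
bias_vector = ensemble_bias(general_logits, contextual_logits)

# An uncertain question where "A" narrowly leads "B".
question_logits = [1.2, 1.0, 0.3, 0.2]
corrected = debias(question_logits, bias_vector)
print(predict(question_logits), "->", predict(corrected))
```

In this toy setup the narrow lead of "A" disappears once the estimated preference for "A" is subtracted, while a prediction the model is already confident about is left almost unchanged, because the correction strength shrinks as the top probability approaches 1. No weights are updated, which matches the abstract's claim that the method works with frozen LVLMs.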

Related papers