Fugu-MT Paper Translation (Abstract): Can MLLMs "Read" What is Missing?

Paper overview: Can MLLMs "Read" What is Missing?

arxiv url: http://arxiv.org/abs/2604.21277v2
Date: Sun, 26 Apr 2026 08:57:39 GMT
Status: Translation complete
Last updated in system: 2026-04-28 13:03:00.564091
Title: Can MLLMs "Read" What is Missing?
Title (reference translation): MLLMs "Read" What is Missing?
Authors: Jindi Guo, Chaozheng Huang, Xi Fang,
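Abstract summary: We introduce MMTR-Bench, a benchmark for evaluating the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts. MMTR-Bench comprises 2,771 test samples spanning multiple languages and varying target lengths.
Reference score (independently computed attention score): 2.7300368031373505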
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce MMTR-Bench, a benchmark designed to evaluate the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts, requiring models to recover masked text from single- or multi-page inputs across real-world domains such as documents and webpages. This design isolates the reconstruction task from instruction-following abilities, enabling a direct assessment of a model's layout understanding, visual grounding, and knowledge integration. MMTR-Bench comprises 2,771 test samples spanning multiple languages and varying target lengths. To account for this diversity, we propose a level-aware evaluation protocol. Experiments on representative MLLMs show that the benchmark poses a significant challenge, especially for sentence- and paragraph-level reconstruction. The homepage is available at https://mmtr-bench-dataset.github.io/MMTR-Bench/.
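Abstract (reference translation): MMTR-Bench is a benchmark for evaluating the intrinsic ability of Multimodal Large Language Models (MLLMs) to reconstruct masked text directly from visual context. Unlike conventional question-answering tasks, MMTR-Bench eliminates explicit prompts and requires models to recover masked text from single-page or multi-page inputs spanning real-world domains such as documents and webpages. This design separates the reconstruction task from instruction-following ability, enabling direct assessment of a model's layout understanding, visual grounding, and knowledge integration. MMTR-Bench consists of 2,771 test samples spanning multiple languages and varying target lengths. To account for this diversity, we propose a level-aware evaluation protocol. Experiments on representative MLLMs show that the benchmark poses a significant challenge, especially for sentence- and paragraph-level reconstruction. The homepage is available at https://mmtr-bench-dataset.github.io/MMTR-Bench/.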
