Fugu-MT Paper Translation (Abstract): MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents

Paper overview: MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents

arxiv url: http://arxiv.org/abs/2512.00756v1
Date: Sun, 30 Nov 2025 06:47:33 GMT
Status: Translation complete
System update date: 2025-12-02 19:46:34.401512
Title: MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
Title (reference translation): MPR-GUI: Benchmarking and Enhancing Multilingual Recognition and Reasoning in GUI Agents
Authors: Ruihan Chen, Qiming Li, Xiaocheng Feng, Xiaoliang Yang, Weihong Zhong, Yuxuan Gu, Zekun Zhou, Bing Qin
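Abstract summary: Large Vision-Language Models (LVLMs) exhibit impressive Perception and Reasoning (P&R) performance on Graphical User Interface (GUI) tasks. However, their performance in multilingual settings has received little attention, which limits their global applications. This paper proposes MPR-GUI-Bench, a multilingual fine-grained perception and reasoning GUI benchmark for evaluating GUI agents' P&R capabilities.
Reference score (independently computed attention score): 42.81572211701814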
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the advancement of computational resources, Large Vision-Language Models (LVLMs) exhibit impressive Perception and Reasoning (P&R) performance on Graphical User Interface (GUI) tasks. However, although they demonstrate strong P&R capabilities in English GUI scenarios, their performance in multilingual settings has received little attention, which limits their global applications. Moreover, existing studies on GUI tasks lack fine-grained analyses, including widget functions and elements' spatial relationships, which are fundamental for more targeted improvements. To tackle these issues, we propose MPR-GUI-Bench, a Multilingual fine-grained Perception and Reasoning GUI Benchmark to evaluate GUI agents' P&R capabilities. Evaluation results demonstrate that LVLMs exhibit significantly worse P&R performance in non-English languages than in English. To address these gaps, we propose GUI-XLI, a GUI Cross-Lingual Intervention method that applies interventions to the hidden states at P&R capability-related layers to mitigate the gaps between English and other languages, building on previous research showing that the hidden states of different language inputs exhibit significant differences in the latent space. Experimental results indicate that our method improves GUI agents' multilingual P&R capability by 6.5% on average.
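Abstract (reference translation): With the advancement of computational resources, Large Vision-Language Models (LVLMs) exhibit impressive Perception and Reasoning (P&R) performance on Graphical User Interface (GUI) tasks. However, although they show strong P&R capabilities in English GUI scenarios, their performance in multilingual settings has received little attention, which limits their global applications. In addition, existing studies on GUI tasks lack fine-grained analyses, such as widget functions and the spatial relationships between elements, that are the basis for more targeted improvements. To address these issues, we propose MPR-GUI-Bench, a multilingual fine-grained perception and reasoning GUI benchmark for evaluating GUI agents' P&R capabilities. The evaluation shows that LVLMs perform significantly worse at P&R in non-English languages than in English. To close these gaps, we propose GUI-XLI, a GUI Cross-Lingual Intervention method that intervenes on the hidden states at P&R capability-related layers to mitigate the gaps between English and other languages, building on prior research showing that the hidden states of inputs in different languages differ significantly in the latent space. Experiments indicate that the method improves GUI agents' multilingual P&R capability by 6.5% on average.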

Related papers