Fugu-MT Paper Translation (Abstract): PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Paper overview: PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

arxiv url: http://arxiv.org/abs/2602.06663v1
Date: Fri, 06 Feb 2026 12:47:16 GMT
Status: Translation complete
Last updated in system: 2026-02-09 22:18:26.39752
Title: PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
Title (reference translation): PlanViz: Evaluation of Planning-Oriented Image Generation and Editing for Computer-Use Tasks
Authors: Junxian Li, Kai Liu, Leyang Chen, Weida Wang, Zhixin Wang, Jiaqi Xu, Fan Li, Renjing Pei, Linghe Kong, Yulun Zhang
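Abstract summary: We propose PlanViz, a new benchmark for evaluating image generation and editing for computer-use tasks. Three new sub-tasks are designed: route planning, work diagramming, and web&UI displaying. To address the challenge of comprehensive and exact evaluation, we propose PlanScore, a task-adaptive score.
Reference score (independently computed attention level): 52.5195594960371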
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unified multimodal models (UMMs) have shown impressive capabilities in generating natural images and supporting multimodal reasoning. However, their potential for supporting computer-use planning tasks, which are closely related to our lives, remains underexplored. Image generation and editing in computer-use tasks require capabilities such as spatial reasoning and procedural understanding, and it is still unknown whether UMMs have the capabilities needed to complete these tasks. Therefore, we propose PlanViz, a new benchmark designed to evaluate image generation and editing for computer-use tasks. To achieve the goal of our evaluation, we focus on sub-tasks that frequently arise in daily life and require planning steps. Specifically, three new sub-tasks are designed: route planning, work diagramming, and web&UI displaying. We address the challenge of ensuring data quality by curating human-annotated questions and reference images and by applying a quality control process. To address the challenge of comprehensive and exact evaluation, a task-adaptive score, PlanScore, is proposed. The score helps in understanding the correctness, visual quality, and efficiency of generated images. Through experiments, we highlight key limitations and opportunities for future research on this topic.
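Abstract (reference translation): Unified multimodal models (UMMs) have demonstrated strong capabilities in generating natural images and supporting multimodal reasoning. However, their potential to support computer-use planning tasks, which are closely tied to our daily lives, is still unexplored. Image generation and editing in computer-use tasks require capabilities such as spatial reasoning and procedural understanding, and it is not yet known whether UMMs are able to complete these tasks. In this paper, we therefore propose PlanViz, a benchmark for evaluating image generation and editing for computer-use tasks. To achieve the goal of the evaluation, we focus on sub-tasks that frequently arise in daily life and require planning steps. Specifically, three new sub-tasks are designed: route planning, work diagramming, and web&UI displaying. We address the challenge of ensuring data quality by curating human-annotated questions and reference images and by applying a quality control process. To address the challenge of comprehensive and exact evaluation, we propose PlanScore, a task-adaptive score. This score helps in understanding the correctness, visual quality, and efficiency of the generated images. Through experiments, we highlight key limitations and opportunities for future research on this topic.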

Related papers