Fugu-MT Paper Translation (Abstract): ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly

Paper overview: ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly

arxiv url: http://arxiv.org/abs/2509.02949v1
Date: Wed, 03 Sep 2025 02:26:48 GMT
Status: Translation complete
Last updated in system: 2025-09-04 21:40:46.391152
Title: ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly
Authors: Kimihiro Hasegawa, Wiradee Imrattanatrai, Masaki Asada, Susan Holm, Yuran Wang, Vincent Zhou, Ken Fukuda, Teruko Mitamura
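Abstract summary: We propose a new multimodal QA dataset on assembly activities. Our dataset, ProMQA-Assembly, consists of 391 QA pairs that require multimodal understanding of human-activity recordings and their instruction manuals.
Reference score (attention score computed independently by the site): 13.040491675077687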
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Assistants on assembly tasks have a large potential to benefit humans from everyday tasks to industrial settings. However, no testbeds support application-oriented system evaluation in a practical setting, especially in assembly. To foster the development, we propose a new multimodal QA dataset on assembly activities. Our dataset, ProMQA-Assembly, consists of 391 QA pairs that require the multimodal understanding of human-activity recordings and their instruction manuals in an online-style manner. In the development, we adopt a semi-automated QA annotation approach, where LLMs generate candidates and humans verify them, as a cost-effective method, and further improve it by integrating fine-grained action labels to diversify question types. Furthermore, we create instruction task graphs for the target tasks of assembling toy vehicles. These newly created task graphs are used in our benchmarking experiment, as well as to facilitate the human verification process in the QA annotation. Utilizing our dataset, we benchmark models, including competitive proprietary multimodal models. Our results suggest great room for improvement for the current models. We believe our new evaluation dataset can contribute to the further development of procedural-activity assistants.

Related papers