Fugu-MT Paper Translation (Abstract): ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

Paper overview: ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

arxiv url: http://arxiv.org/abs/2407.12358v1
Date: Wed, 17 Jul 2024 07:29:59 GMT
Status: Translation complete
Last updated in system: 2024-07-18 18:07:45.430830
Title: ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Title (reference translation): ProcTag: Process Tagging for Evaluating the Effectiveness of Document Instruction Data
Authors: Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao
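Abstract summary: ProcTag is a data-oriented method that assesses the efficacy of document instruction data. Experiments show that sampling existing open-source and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data.
Reference score (independently computed attention score): 28.553840579302484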
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) tasks, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA. However, most existing evaluation methods for instruction data are limited to the textual content of the instructions themselves, thereby hindering the effective assessment of document instruction datasets and constraining their construction. In this paper, we propose ProcTag, a data-oriented method that assesses the efficacy of document instruction data. ProcTag innovatively performs tagging on the execution process of instructions rather than the instruction text itself. By leveraging the diversity and complexity of these tags to assess the efficacy of the given dataset, ProcTag enables selective sampling or filtering of document instructions. Furthermore, DocLayPrompt, a novel semi-structured layout-aware document prompting strategy, is proposed for effectively representing documents. Experiments demonstrate that sampling existing open-source and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data. Impressively, with ProcTag-based sampling in the generated document datasets, only 30.5% of the document instructions are required to achieve 100% efficacy compared to the complete dataset. The code is publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/ProcTag .
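Abstract (reference translation): Recently, large language models (LLMs) and multimodal large language models (MLLMs) have shown promising results on document visual question answering (VQA) tasks. An effective evaluation method for document instruction data is essential for constructing highly effective instruction data, which facilitates the training of LLMs and MLLMs for document VQA. However, existing evaluation methods for instruction data are limited to the textual content of the instructions themselves, which hinders effective assessment of document instruction datasets and constrains their construction. This paper proposes ProcTag, a data-oriented method that assesses the efficacy of document instruction data. ProcTag innovatively tags the execution process of instructions rather than the instruction text itself. By using the diversity and complexity of these tags to assess the efficacy of a given dataset, ProcTag enables selective sampling or filtering of document instructions. In addition, DocLayPrompt, a semi-structured layout-aware document prompting strategy, is proposed for representing documents effectively. Experiments show that sampling existing open-source and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data. Notably, with ProcTag-based sampling on the generated document datasets, only 30.5% of the document instructions are needed to achieve 100% of the efficacy of the complete dataset. The code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/ProcTag .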
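
Illustrative sketch (not the authors' implementation): the abstract says only that ProcTag uses the diversity and complexity of process tags to sample or filter instructions, and does not spell out the selection rule. The Python below is one possible reading under stated assumptions: complexity is approximated by the number of tags on an instruction, diversity by whether an instruction contributes tags not yet covered in the current pass. The function name `sample_by_tags`, the instruction ids, and the tag names are all hypothetical.

```python
from typing import Dict, List, Set


def sample_by_tags(tagged: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedy tag-based sampling (assumed rule, not taken from the paper).

    Instructions are visited from most to fewest tags (a proxy for
    complexity). Within one pass, an instruction is kept only if it adds
    at least one tag not yet covered (a proxy for diversity). Coverage is
    reset between passes until the budget is reached.
    """
    # Most tags first; ties are broken by id so the result is deterministic.
    remaining = sorted(tagged, key=lambda k: (-len(tagged[k]), k))
    selected: List[str] = []

    while remaining and len(selected) < budget:
        covered: Set[str] = set()
        leftover: List[str] = []
        picked_any = False
        for key in remaining:
            adds_new_tag = bool(tagged[key] - covered)
            if len(selected) < budget and adds_new_tag:
                selected.append(key)
                covered |= tagged[key]
                picked_any = True
            else:
                leftover.append(key)
        if not picked_any:
            # Only untagged instructions are left; stop instead of looping.
            break
        remaining = leftover

    return selected


# Hypothetical process tags for four document instructions.
example = {
    "q1": {"locate_table", "read_cell"},
    "q2": {"locate_table", "read_cell", "compare_values"},
    "q3": {"read_title"},
    "q4": {"locate_table"},
}
print(sample_by_tags(example, budget=2))
```

With a budget of 2, the sketch keeps `q2` (most tags) and `q3` (the only other instruction adding an uncovered tag), and skips `q1` and `q4` because their tags are already covered by `q2`.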
