Fugu-MT Paper Translation (Abstract): AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings

Paper summary: AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings

arxiv url: http://arxiv.org/abs/2508.13606v1
Date: Tue, 19 Aug 2025 08:12:45 GMT
Status: Translation complete
System update date: 2025-08-20 15:36:31.841217
Title: AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings
Title (reference translation): AdaDocVQA: An Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings
Authors: Haoxuan Li, Wei Song, Aofan Liu, Peiwu Qin
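Abstract summary: Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments. This paper presents AdaDocVQA, a unified adaptive framework that addresses these challenges through three core innovations. Experiments on Japanese document VQA benchmarks show substantial improvements, with 83.04% accuracy on Yes/No questions.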
Reference score (independently computed attention score): 8.22650587342049
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments due to context limitations and insufficient training data. This paper presents AdaDocVQA, a unified adaptive framework addressing these challenges through three core innovations: a hybrid text retrieval architecture for effective document segmentation, an intelligent data augmentation pipeline that automatically generates high-quality reasoning question-answer pairs with multi-level verification, and adaptive ensemble inference with dynamic configuration generation and early stopping mechanisms. Experiments on Japanese document VQA benchmarks demonstrate substantial improvements with 83.04% accuracy on Yes/No questions, 52.66% on factual questions, and 44.12% on numerical questions in JDocQA, and 59% accuracy on the LAVA dataset. Ablation studies confirm meaningful contributions from each component, and our framework establishes new state-of-the-art results for Japanese document VQA while providing a scalable foundation for other low-resource languages and specialized domains. Our code is available at: https://github.com/Haoxuanli-Thu/AdaDocVQA.
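Abstract (reference translation): Document Visual Question Answering (Document VQA) faces significant challenges when processing long documents in low-resource environments, owing to context limitations and insufficient training data. This paper introduces AdaDocVQA, a unified adaptive framework that addresses these challenges through three core innovations: a hybrid text retrieval architecture for effective document segmentation, an intelligent data augmentation pipeline that automatically generates high-quality reasoning question-answer pairs with multi-level verification, and adaptive ensemble inference with dynamic configuration generation and early stopping mechanisms. Experiments on Japanese document VQA benchmarks show substantial improvements: on JDocQA, 83.04% accuracy on Yes/No questions, 52.66% on factual questions, and 44.12% on numerical questions, and 59% accuracy on the LAVA dataset. Ablation studies confirm a meaningful contribution from each component, and the framework establishes new state-of-the-art results for Japanese document VQA while providing a scalable foundation for other low-resource languages and specialized domains. The code is available at https://github.com/Haoxuanli-Thu/AdaDocVQA.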
