Fugu-MT paper translation (abstract): Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments

Paper overview: Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments

arxiv url: http://arxiv.org/abs/2511.08298v1
Date: Wed, 12 Nov 2025 01:51:43 GMT
Status: translation complete
System update date: 2025-11-12 20:17:03.747808
Title: Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments
Title (reference translation): Hierarchical structure understanding in complex tables with VLLMs: benchmark and experiments
Authors: Luca Bindini, Simone Giovannini, Simone Marinai, Valeria Nardoni, Kimiya Noor Ali
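Abstract summary: This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. The experiments are based on the PubTables-1M dataset, a large-scale corpus of scientific tables. A series of prompt engineering strategies is adopted to probe the models' comprehension capabilities, experimenting with various prompt formats and writing styles. Human performance on the task is also measured on a small set of tables and compared with that of the evaluated VLLMs.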
Reference score (site-computed attention score): 1.226598527858578
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. Specifically, we explore whether VLLMs can infer the hierarchical structure of tables without additional processing. As a basis for our experiments we use the PubTables-1M dataset, a large-scale corpus of scientific tables. From this dataset, we extract a subset of tables that we introduce as Complex Hierarchical Tables (CHiTab): a benchmark collection of complex tables containing hierarchical headings. We adopt a series of prompt engineering strategies to probe the models' comprehension capabilities, experimenting with various prompt formats and writing styles. Multiple state-of-the-art open-weights VLLMs are evaluated on the benchmark first using their off-the-shelf versions and then fine-tuning some models on our task. We also measure the performance of humans to solve the task on a small set of tables comparing with performance of the evaluated VLLMs. The experiments support our intuition that generic VLLMs, not explicitly designed for understanding the structure of tables, can perform this task. This study provides insights into the potential and limitations of VLLMs to process complex tables and offers guidance for future work on integrating structured data understanding into general-purpose VLLMs.
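Abstract (reference translation): This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. Specifically, it examines whether VLLMs can infer the hierarchical structure of tables without additional processing. The experiments are based on the PubTables-1M dataset, a large-scale corpus of scientific tables. From this dataset, a subset of tables called Complex Hierarchical Tables (CHiTab) is extracted. A series of prompt engineering strategies is adopted to probe the models' comprehension capabilities, experimenting with various prompt formats and writing styles. Multiple state-of-the-art open-weights VLLMs are evaluated on the benchmark, first in their off-the-shelf versions and then after fine-tuning some of the models on the task. Human performance on the task is also measured on a small set of tables and compared with that of the evaluated VLLMs. The experiments support the intuition that generic VLLMs, not explicitly designed for understanding table structure, can perform this task. The study provides insights into the potential and limitations of VLLMs for processing complex tables and offers guidance for integrating structured data understanding into general-purpose VLLMs.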

Related papers