Fugu-MT Paper Translation (Abstract): DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

Paper overview: DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

arxiv url: http://arxiv.org/abs/2604.16099v1
Date: Fri, 17 Apr 2026 14:33:51 GMT
Status: Translation complete
System update date: 2026-04-20 22:00:19.952417
Title: DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates
Title (reference translation): DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates
Authors: Laziz Hamdi, Amine Tamasna, Thierry Paquet
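Abstract summary: DenTab is a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. We propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution.
Reference score (independently computed attention score): 2.7885016877286897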
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tables condense key transactional and administrative information into compact layouts, but practical extraction requires more than text recognition: systems must also recover structure (rows, columns, merged cells, headers) and interpret roles such as line items, subtotals, and totals under common capture artifacts. Many existing resources for table structure recognition and TableVQA are built from clean digital-born sources or rendered tables, and therefore only partially reflect noisy administrative conditions. We introduce DenTab, a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations, enabling evaluation of table recognition (TR) and table visual question answering (TableVQA) on the same inputs. DenTab includes 2,208 questions across eleven categories spanning retrieval, aggregation, and logic/consistency checks. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. Across models, strong structure recovery does not consistently translate into reliable performance on multi-step arithmetic and consistency questions, and these reasoning failures persist even when using ground-truth HTML table inputs. To improve arithmetic reliability without training, we propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution. The pipeline combines (i) a VLM that produces a baseline answer, a structured table representation, and a constrained table program with (ii) a rule-based executor that performs exact computation over the parsed table. The source code and dataset will be made publicly available at https://github.com/hamdilaziz/DenTab.
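Abstract (reference translation): Tables condense key transactional and administrative information into compact layouts, but practical extraction requires more than text recognition: systems must also recover structure (rows, columns, merged cells, headers) and interpret roles such as line items, subtotals, and totals under common capture artifacts. Many existing resources for table structure recognition and TableVQA are built from clean digital-born sources or rendered tables, and therefore only partially reflect noisy administrative conditions. We introduce DenTab, a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations, enabling evaluation of table recognition (TR) and table visual question answering (TableVQA) on the same inputs. DenTab includes 2,208 questions across eleven categories spanning retrieval, aggregation, and logic/consistency checks. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. Across models, strong structure recovery does not consistently translate into reliable performance on multi-step arithmetic and consistency questions, and these reasoning failures persist even when using ground-truth HTML table inputs. To improve arithmetic reliability without training, we propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution. The pipeline combines (i) a VLM that produces a baseline answer, a structured table representation, and a constrained table program with (ii) a rule-based executor that performs exact computation over the parsed table. The source code and dataset will be made publicly available at https://github.com/hamdilaziz/DenTab.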
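
The routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, the dictionary layout of the VLM output, the two supported operations, the fallback to the baseline answer when execution fails, and the sample figures are all assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical category names; the abstract does not list the eleven categories.
ARITHMETIC_CATEGORIES = {"sum", "max"}


def execute_program(table, program):
    """Rule-based executor: exact computation over parsed line-item rows."""
    values = [
        Decimal(row[program["column"]])
        for row in table
        if row["role"] == "line_item"
    ]
    if program["op"] == "sum":
        return sum(values, Decimal("0"))
    if program["op"] == "max":
        return max(values)
    raise ValueError(f"unsupported op: {program['op']}")


def route(category, vlm_output):
    """Send arithmetic questions to the executor, everything else to the VLM answer."""
    if category in ARITHMETIC_CATEGORIES:
        try:
            return str(execute_program(vlm_output["table"], vlm_output["program"]))
        except (KeyError, ValueError, ArithmeticError):
            pass  # assumed fallback: keep the baseline answer if execution fails
    return vlm_output["baseline_answer"]


# Made-up VLM output for one table image (illustrative values only).
vlm_output = {
    "baseline_answer": "125.00",
    "table": [
        {"role": "line_item", "amount": "80.50"},
        {"role": "line_item", "amount": "45.25"},
        {"role": "total", "amount": "125.75"},
    ],
    "program": {"op": "sum", "column": "amount"},
}

print(route("sum", vlm_output))        # executor result
print(route("retrieval", vlm_output))  # baseline answer
```

Using `Decimal` rather than floating point keeps currency sums exact, which matches the abstract's emphasis on exact computation over the parsed table.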

Related papers