ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
- URL: http://arxiv.org/abs/2601.07280v1
- Date: Mon, 12 Jan 2026 07:36:06 GMT
- Title: ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
- Authors: Changzai Pan, Jie Zhang, Kaiwen Wei, Chenshuo Pan, Yu Zhao, Jingwang Huang, Jian Yang, Zhenhe Wu, Haoyang Zeng, Xiaoyan Gu, Weichao Sun, Yanbo Zhai, Yujie Mao, Zhuoru Jiang, Jiang Zhong, Shuangyong Song, Yongxiang Li, Zhongjiang He
- Abstract summary: We present ReasonTabQA, a large-scale bilingual benchmark encompassing 1,932 tables across 30 industry domains such as energy and automotive. We also introduce TabCodeRL, a reinforcement learning method that leverages table-aware verifiable rewards to guide the generation of logical reasoning paths.
- Score: 42.9161992743627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have significantly catalyzed table-based question answering (TableQA). However, existing TableQA benchmarks often overlook the intricacies of industrial scenarios, which are characterized by multi-table structures, nested headers, and massive scales. These environments demand robust table reasoning through deep structured inference, presenting a significant challenge that remains inadequately addressed by current methodologies. To bridge this gap, we present ReasonTabQA, a large-scale bilingual benchmark encompassing 1,932 tables across 30 industry domains such as energy and automotive. ReasonTabQA provides high-quality annotations for both final answers and explicit reasoning chains, supporting both thinking and no-thinking paradigms. Furthermore, we introduce TabCodeRL, a reinforcement learning method that leverages table-aware verifiable rewards to guide the generation of logical reasoning paths. Extensive experiments on ReasonTabQA and 4 TableQA datasets demonstrate that while TabCodeRL yields substantial performance gains on open-source LLMs, the persistent performance gap on ReasonTabQA underscores the inherent complexity of real-world industrial TableQA.
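The table-aware verifiable reward at the core of TabCodeRL can be illustrated with a minimal sketch. The reward scheme, helper names, and partial-credit value below are our own assumptions for illustration, not the paper's implementation:

```python
# Minimal sketch of a table-aware verifiable reward of the kind TabCodeRL
# describes. The normalization rules and the 0.1 partial credit for code
# that merely executes are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Sample:
    predicted: str   # model's final answer extracted from its reasoning path
    gold: str        # annotated ground-truth answer
    code_ran: bool   # whether the generated table-manipulation code executed


def normalize(answer: str) -> str:
    """Light normalization so '1,932' and '1932' compare equal."""
    return answer.strip().lower().replace(",", "")


def verifiable_reward(sample: Sample) -> float:
    """Reward 1.0 for a verified correct answer, small partial credit
    when the generated code at least executed, else 0.0."""
    if normalize(sample.predicted) == normalize(sample.gold):
        return 1.0
    return 0.1 if sample.code_ran else 0.0
```

Because the reward is computed by checking the answer rather than by a learned critic, it is "verifiable" in the RLVR sense: the policy cannot inflate its score without actually producing correct table lookups.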
Related papers
- Enhancing TableQA through Verifiable Reasoning Trace Reward [38.96476258377461]
We introduce RE-Tab, a plug-and-play framework that architecturally enhances trajectory search via lightweight, training-free reward modeling. We demonstrate that providing explicit verifiable rewards during State Transition ("What is the best action?") and Simulative Reasoning ("Am I sure about the output?") is crucial for steering the agent's navigation through table states. A direct plug-and-play implementation of RE-Tab brings up to a 41.77% improvement in QA accuracy and a 33.33% reduction in the test-time inference samples needed for a consistent answer.
arXiv Detail & Related papers (2026-01-30T04:06:42Z) - CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning [14.419739466403172]
Existing datasets for multimodal table understanding, such as MMTab, primarily provide short factual answers without explicit multi-step reasoning supervision. We introduce CoReTab, a code-driven reasoning framework that produces scalable, interpretable, and automatically verifiable annotations. We evaluate the resulting model trained on CoReTab across 17 MMTab benchmarks spanning table question answering, fact verification, and table structure understanding.
arXiv Detail & Related papers (2026-01-27T04:49:30Z) - When TableQA Meets Noise: A Dual Denoising Framework for Complex Questions and Large-scale Tables [20.33076921920799]
We propose EnoTab, a dual denoising framework for complex questions and large-scale tables. We first perform Evidence-based Question Denoising by decomposing the question into minimal semantic units. Then, we propose Evidence Tree-guided Table Denoising, which constructs an explicit and transparent table pruning path.
arXiv Detail & Related papers (2025-09-22T12:25:57Z) - T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables [65.12524437711737]
We propose the table-to-report task and construct a bilingual benchmark named T2R-bench. The benchmark comprises 457 industrial tables, all derived from real-world scenarios and encompassing 19 industry domains. Experiments on 25 widely-used LLMs reveal that even state-of-the-art models like Deepseek-R1 achieve only a 62.71 overall score.
arXiv Detail & Related papers (2025-08-27T11:55:40Z) - TabularGSM: Understanding the Limitations of LLMs in Tabular Math Reasoning [26.230588166759706]
We propose AutoT2T, a neuro-symbolic framework that transforms math word problems into scalable and verified tabular reasoning tasks. We develop TabularGSM, a benchmark comprising three progressively complex subsets and a trap subset, with two complementary evaluation settings.
arXiv Detail & Related papers (2025-05-26T06:24:31Z) - RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking [63.253294691180635]
In real-world scenarios, beyond pure text, a substantial amount of knowledge is stored in tables. We first propose a table-corpora-aware RAG framework, named T-RAG, which consists of a hierarchical memory index, multi-stage retrieval, and graph-aware prompting.
arXiv Detail & Related papers (2025-04-02T04:24:41Z) - Towards Question Answering over Large Semi-structured Tables [29.384514074911955]
TaDRe is a model that incorporates both pre- and post-table-decomposition refinements to ensure table decomposition quality. TaDRe achieves state-of-the-art performance on large-table TableQA tasks.
arXiv Detail & Related papers (2025-02-19T04:45:05Z) - TableBench: A Comprehensive and Complex Benchmark for Table Question Answering [33.64465594140019]
This paper investigates the application of Large Language Models (LLMs) in industrial scenarios. We propose TableBench, a comprehensive and complex benchmark covering 18 fields within four major categories of table question answering (TableQA) capabilities. Extensive experiments conducted on TableBench indicate that both open-source and proprietary LLMs still have significant room for improvement to meet real-world demands.
arXiv Detail & Related papers (2024-08-17T11:40:10Z) - TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [81.76462101465354]
We present TabPedia, a novel large vision-language model equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
To better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA.
arXiv Detail & Related papers (2024-06-03T13:54:05Z) - Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [79.9461269253121]
We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts.
Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks.
arXiv Detail & Related papers (2024-01-09T07:46:26Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It consists of three distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often over multiple tables in a relational database or web page.
Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
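The sampling and packing steps that TAP4LLM (above) describes can be sketched roughly as follows. The function names and the keyword-overlap sampling heuristic are illustrative assumptions, not the paper's method:

```python
# Illustrative sketch of TAP4LLM-style table pre-processing: (1) sample a
# query-relevant sub-table, then (3) pack it into an LLM-friendly format.
# The keyword-overlap heuristic stands in for real query-semantics matching.
from typing import Dict, List

Table = Dict[str, List[str]]  # column name -> column values


def sample_rows(table: Table, query: str, k: int = 3) -> Table:
    """(1) Table sampling: keep at most k rows whose cells share a
    token with the query, yielding a manageable sub-table."""
    tokens = set(query.lower().split())
    n = len(next(iter(table.values())))
    keep = [i for i in range(n)
            if tokens & {table[c][i].lower() for c in table}][:k]
    return {c: [v[i] for i in keep] for c, v in table.items()}


def pack_markdown(table: Table) -> str:
    """(3) Table packing: serialize the sub-table as a markdown table,
    a format most LLMs parse reliably."""
    cols = list(table)
    lines = ["| " + " | ".join(cols) + " |",
             "| " + " | ".join("---" for _ in cols) + " |"]
    for i in range(len(table[cols[0]])):
        lines.append("| " + " | ".join(table[c][i] for c in cols) + " |")
    return "\n".join(lines)
```

Step (2), table augmentation, would sit between these two calls, attaching external knowledge to the sub-table before serialization.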
This list is automatically generated from the titles and abstracts of the papers on this site.