Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights
- URL: http://arxiv.org/abs/2505.19563v1
- Date: Mon, 26 May 2025 06:24:31 GMT
- Title: Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights
- Authors: Shi-Yu Tian, Zhi Zhou, Wei Dong, Ming Yang, Kun-Yang Yu, Zi-Jian Cheng, Lan-Zhe Guo, Yu-Feng Li,
- Abstract summary: We propose an automated generation pipeline AutoT2T that transforms mathematical word problems into table-based reasoning tasks.<n>The pipeline can generate multiple variants of a table for the same reasoning problem, including noisy versions to support robustness evaluation.<n> Experimental analyses through AutoT2T and TabularGSM reveal that the tight coupling between reasoning and retrieval or identification processes is a key factor underlying the failure of LLMs in complex Table QA tasks.
- Score: 45.5276155343796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning with tabular data holds increasing importance in modern applications, yet comprehensive evaluation methodologies for reasoning-intensive Table Question Answering (QA) tasks remain nascent. Existing research is constrained by two primary bottlenecks: 1) Reliance on costly manually annotated real-world data, which is difficult to cover complex reasoning scenarios; 2) The heterogeneity of table structures hinders systematic analysis of the intrinsic mechanisms behind the underperformance of LLMs, especially in reasoning-intensive tasks. To address these issues, we propose an automated generation pipeline AutoT2T that transforms mathematical word problems into table-based reasoning tasks, eliminating the need for manual annotation. The pipeline can generate multiple variants of a table for the same reasoning problem, including noisy versions to support robustness evaluation. Based on this, we construct a new benchmark TabularGSM, which systematically spans a range of table complexities and trap problems. Experimental analyses through AutoT2T and TabularGSM reveal that the tight coupling between reasoning and retrieval or identification processes is a key factor underlying the failure of LLMs in complex Table QA tasks. This highlights the necessity for models to develop synergistic reasoning capabilities in order to perform effectively in complex Table QA tasks.
Related papers
- LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations.<n>This structured approach allows Large Language Models (LLMs) to generate and executesql queries, enhancing generalization and mitigating biases.
arXiv Detail & Related papers (2025-06-06T05:14:04Z) - TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension [8.489816179329832]
We present TQA-Bench, a new multi-table QA benchmark designed to evaluate the capabilities of large language models (LLMs) in tackling complex QA tasks over relational data.<n>Our benchmark incorporates diverse relational database instances sourced from real-world public datasets.<n>We systematically evaluate a range of LLMs, both open-source and closed-source, spanning model scales from 7 billion to 70 billion parameters.
arXiv Detail & Related papers (2024-11-29T06:48:13Z) - TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.<n>TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.<n>Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z) - Seek and Solve Reasoning for Table Question Answering [49.006950918895306]
This paper reveals that the reasoning process during task simplification may be more valuable than the simplified tasks themselves.<n>We propose a Seek-and-solving pipeline that instructs the LLM to first seek relevant information and then answer questions.<n>We distill a single-step TQA-solving prompt from this pipeline, using demonstrations with SS-CoT paths to guide the LLM in solving complex TQA tasks.
arXiv Detail & Related papers (2024-09-09T02:41:00Z) - H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables [56.73919743039263]
This paper introduces a novel algorithm that integrates both symbolic and semantic (textual) approaches in a two-stage process to address limitations.<n>Our experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three question-answering (QA) and fact-verification datasets.
arXiv Detail & Related papers (2024-06-29T21:24:19Z) - Exploring the Robustness of Language Models for Tabular Question Answering via Attention Analysis [7.486549276995143]
Large Language Models (LLMs) have been shown to tackle table comprehension tasks without specific training.<n>We probe how in-context learning (ICL), model scale, instruction tuning, and domain bias affect Tabular QA (TQA)<n>We reveal a strong correlation between perturbation-induced shifts in attention dispersion and the drops in performance.
arXiv Detail & Related papers (2024-06-18T15:41:15Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
Self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.