Same Content, Different Representations: A Controlled Study for Table QA
- URL: http://arxiv.org/abs/2509.22983v1
- Date: Fri, 26 Sep 2025 22:33:19 GMT
- Title: Same Content, Different Representations: A Controlled Study for Table QA
- Authors: Yue Zhang, Seiji Maekawa, Nikita Bhutani,
- Abstract summary: Table Question Answering (Table QA) in real-world settings must operate over both structured databases and semi-structured tables containing textual fields.<n>Existing benchmarks are tied to fixed data formats and have not systematically examined how representation itself affects model performance.<n>We present the first controlled study that isolates the role of table representation by holding content constant while varying structure.
- Score: 15.896655757672441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table Question Answering (Table QA) in real-world settings must operate over both structured databases and semi-structured tables containing textual fields. However, existing benchmarks are tied to fixed data formats and have not systematically examined how representation itself affects model performance. We present the first controlled study that isolates the role of table representation by holding content constant while varying structure. Using a verbalization pipeline, we generate paired structured and semi-structured tables, enabling direct comparisons across modeling paradigms. To support detailed analysis, we introduce a diagnostic benchmark with splits along table size, join requirements, query complexity, and schema quality. Our experiments reveal consistent trade-offs: SQL-based methods achieve high accuracy on structured inputs but degrade on semi-structured data, LLMs exhibit flexibility but reduced precision, and hybrid approaches strike a balance, particularly under noisy schemas. These effects intensify with larger tables and more complex queries. Ultimately, no single method excels across all conditions, and we highlight the central role of representation in shaping Table QA performance. Our findings provide actionable insights for model selection and design, paving the way for more robust hybrid approaches suited for diverse real-world data formats.
Related papers
- ST-Raptor: An Agentic System for Semi-Structured Table QA [16.18235560779917]
We present ST-Raptor, an agentic system for semi-structured table question answering (QA)<n> ST-Raptor offers an interactive analysis environment that combines visual editing, tree-based structural modeling, and agent-driven query resolution to support accurate and user-friendly table understanding.
arXiv Detail & Related papers (2026-02-03T09:06:21Z) - TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding [52.59372043981724]
TableDART is a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models.<n>In addition, we propose a novel agent to cross-modal knowledge integration by analyzing outputs from text- and image-based models.
arXiv Detail & Related papers (2025-09-18T07:00:13Z) - Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance [8.83042313837811]
We propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths.<n>Given a natural language query, our method searches on graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies.<n>Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-06-04T20:21:52Z) - RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models [83.6013616017646]
RelDiff is a novel diffusion generative model that synthesizes complete relational databases by explicitly modeling their foreign key graph structure.<n>RelDiff consistently outperforms prior methods in producing realistic and coherent synthetic relational databases.
arXiv Detail & Related papers (2025-05-31T21:01:02Z) - Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering [16.790216473975146]
We conduct the first controlled study on the effectiveness of several combinations of table representations and models from two perspectives.<n>We find that the best combination of table representation and model varies across setups.<n>We propose FRES, a method selecting table representations dynamically, and observe a 10% average performance improvement.
arXiv Detail & Related papers (2025-05-20T09:36:17Z) - Better Think with Tables: Tabular Structures Enhance LLM Comprehension for Data-Analytics Requests [33.471112091886894]
Large Language Models (LLMs) often struggle with data-analytics requests related to information retrieval and data manipulation.<n>We introduce Thinking with Tables, where we inject tabular structures into LLMs for data-analytics requests.<n>We show that providing tables yields a 40.29 percent average performance gain along with better manipulation and token efficiency.
arXiv Detail & Related papers (2024-12-22T23:31:03Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - SEMv2: Table Separation Line Detection Based on Instance Segmentation [96.36188168694781]
We propose an accurate table structure recognizer, termed SEMv2 (SEM: Split, Embed and Merge)
We address the table separation line instance-level discrimination problem and introduce a table separation line detection strategy based on conditional convolution.
To comprehensively evaluate the SEMv2, we also present a more challenging dataset for table structure recognition, dubbed iFLYTAB.
arXiv Detail & Related papers (2023-03-08T05:15:01Z) - Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
Self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z) - Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?"
Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases.
We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases.
None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z) - GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-pairs over high-free tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.