Related papers: Towards Question Answering over Large Semi-structured Tables

Towards Question Answering over Large Semi-structured Tables

URL: http://arxiv.org/abs/2502.13422v2
Date: Mon, 04 Aug 2025 11:27:55 GMT
Title: Towards Question Answering over Large Semi-structured Tables
Authors: Yuxiang Wang, Junhao Gan, Jianzhong Qi,
Abstract summary: TaDRe is a model that incorporates both pre- and post-table decomposition refinements to ensure table decomposition quality.<n>TaDRe achieves state-of-the-art performance on large-table TableQA tasks.
Score: 29.384514074911955
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Table Question Answering (TableQA) attracts strong interests due to the prevalence of web information presented in the form of semi-structured tables. Despite many efforts, TableQA over large tables remains an open challenge. This is because large tables may overwhelm models that try to comprehend them in full to locate question answers. Recent studies reduce input table size by decomposing tables into smaller, question-relevant sub-tables via generating programs to parse the tables. However, such solutions are subject to program generation and execution errors and are difficult to ensure decomposition quality. To address this issue, we propose TaDRe, a TableQA model that incorporates both pre- and post-table decomposition refinements to ensure table decomposition quality, hence achieving highly accurate TableQA results. To evaluate TaDRe, we construct two new large-table TableQA benchmarks via LLM-driven table expansion and QA pair generation. Extensive experiments on both the new and public benchmarks show that TaDRe achieves state-of-the-art performance on large-table TableQA tasks.

Related papers

Piece of Table: A Divide-and-Conquer Approach for Selecting Subtables in Table Question Answering [20.926770550682964]
PieTa is a new framework for subtable-based question answering (QA)<n>It operates through an iterative process of dividing tables into smaller windows, using LMs to select relevant cells within each window, and merging these cells into a subtable.<n>It demonstrates improved performance over previous subtable-based QA approaches.
arXiv Detail & Related papers (2024-12-10T16:08:14Z)
TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding.<n>TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.<n>Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA [25.09488366689108]
Text-to- parsing and end-to-end question answering (E2E TQA) are two main approaches for Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. We identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets.
arXiv Detail & Related papers (2024-09-25T07:18:45Z)
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA. Every question requires the integration of information from both the table and the sub-graph to be answered. We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition [6.253771639590562]
Table reasoning is a challenging task that requires understanding both natural language questions and structured data. We propose Tabify, a novel method that leverages text-to-generation to decompose tables into smaller and relevant sub-tables. Our method performs remarkably well on the WikiTQ benchmark, achieving an accuracy of 64.7%.
arXiv Detail & Related papers (2024-04-15T21:42:20Z)
Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval [52.592071689901196]
We introduce a method that uncovers useful join relations for any query and database during table retrieval. Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.
arXiv Detail & Related papers (2024-04-15T15:55:01Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations [13.900589860309488]
RobuT builds upon existing Table QA datasets (WTQ, Wiki-Weak, and SQA) Our results indicate that both state-of-the-art Table QA models and large language models (e.g., GPT-3) with few-shot learning falter in these adversarial sets. We propose to address this problem by using large language models to generate adversarial examples to enhance training.
arXiv Detail & Related papers (2023-06-25T19:23:21Z)
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering [61.48881995121938]
Real-world queries are complex in nature, often over multiple tables in a relational database or web page. Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers.
arXiv Detail & Related papers (2023-05-22T08:25:15Z)
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering [106.73213656603453]
We develop a simple table-based QA model with minimal annotation effort. We propose an omnivorous pretraining approach that consumes both natural and synthetic data.
arXiv Detail & Related papers (2022-07-08T01:23:45Z)
Table Retrieval May Not Necessitate Table-specific Model Design [83.27735758203089]
We focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval?" Based on an analysis on a table-based portion of the Natural Questions dataset (NQ-table), we find that structure plays a negligible role in more than 70% of the cases. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. None of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
arXiv Detail & Related papers (2022-05-19T20:35:23Z)
TableFormer: Robust Transformer Modeling for Table-Text Encoding [18.00127368618485]
Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. In this work, we propose a robust and structurally aware table-text encoding architecture TableFormer.
arXiv Detail & Related papers (2022-03-01T07:23:06Z)
CLTR: An End-to-End, Transformer-Based System for Cell Level Table Retrieval and Table Question Answering [8.389189333083513]
We present the first end-to-end, transformer-based table question answering (QA) system. It takes natural language questions and massive table corpus as inputs to retrieve the most relevant tables and locate the correct table cells to answer the question. We introduce two new open-domain benchmarks, E2E_WTQ and E2E_GNQ, consisting of 2,005 natural language questions over 76,242 tables.
arXiv Detail & Related papers (2021-06-08T15:22:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.