Improving Table Retrieval with Question Generation from Partial Tables
- URL: http://arxiv.org/abs/2508.06168v1
- Date: Fri, 08 Aug 2025 09:35:56 GMT
- Title: Improving Table Retrieval with Question Generation from Partial Tables
- Authors: Hsing-Ping Liang, Che-Wei Chang, Yao-Chung Fan
- Abstract summary: We propose QGpT, a simple yet effective method that uses an LLM to generate synthetic questions based on small portions of a table. The generated questions are then jointly embedded with the partial table segments used for generation, enhancing semantic alignment with user queries.
- Score: 2.2169618382995764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in open-domain question answering over tables have widely adopted large language models (LLMs) under the Retriever-Reader architecture. Prior works have effectively leveraged LLMs to tackle the complex reasoning demands of the Reader component, such as text-to-text, text-to-SQL, and multi-hop reasoning. In contrast, work on the Retriever component has primarily focused on optimizing the query representation: training retrievers to retrieve relevant tables based on questions, or selecting keywords from questions for matching table segments. However, little attention has been given to enhancing how tables themselves are represented in embedding space to better align with questions. To address this, we propose QGpT (Question Generation from Partial Tables), a simple yet effective method that uses an LLM to generate synthetic questions based on small portions of a table. These questions are generated to simulate how a user might query the content of the table currently under consideration. The generated questions are then jointly embedded with the partial table segments used for generation, enhancing semantic alignment with user queries. Without the need to embed entire tables, our method significantly improves retrieval performance across multiple benchmarks for both dense and late-interaction retrievers.
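The pipeline described in the abstract (take a small portion of a table, generate questions about it, embed the questions together with that portion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, `generate_questions` is a template-based placeholder standing in for the LLM call, and the bag-of-words `embed` stands in for a real dense or late-interaction encoder.

```python
import math
import re
from collections import Counter


def partial_table(header, rows, k=2):
    """Serialize the header plus the first k rows as pipe-delimited text."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows[:k]]
    return "\n".join(lines)


def generate_questions(segment):
    """Placeholder for the LLM call (assumption): template questions built
    from the column names in the segment's header line."""
    header = segment.split("\n")[0].split(" | ")
    key = header[0]
    return [f"What is the {col} of each {key}?" for col in header[1:]]


def build_document(segment, questions):
    """Join the partial table and its questions into one text to embed."""
    return segment + "\n" + "\n".join(questions)


def embed(text):
    """Toy bag-of-words vector; a real system would use a trained encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical example table.
header = ["country", "capital", "population"]
rows = [
    ["France", "Paris", 68000000],
    ["Japan", "Tokyo", 125000000],
    ["Kenya", "Nairobi", 54000000],
    ["Peru", "Lima", 34000000],
]

segment = partial_table(header, rows, k=2)   # only a small portion of the table
questions = generate_questions(segment)      # synthetic questions about it
doc = build_document(segment, questions)     # jointly embedded text

query = "what is the capital of each country"
score_plain = cosine(embed(query), embed(segment))
score_joint = cosine(embed(query), embed(doc))
print(f"segment only: {score_plain:.3f}, segment + questions: {score_joint:.3f}")
```

In this toy setup the segment with questions scores higher against the query than the segment alone, simply because the questions share vocabulary with the query. That mirrors the intuition stated in the abstract, but it says nothing about the size of the effect with real encoders.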
Related papers
- CORE-T: COherent REtrieval of Tables for Text-to-SQL [91.76918495375384]
CORE-T is a scalable, training-free framework that enriches tables with purpose metadata and pre-computes a lightweight table-compatibility cache. Across Bird, Spider, and MMQA, CORE-T improves table-selection F1 by up to 22.7 points while retrieving up to 42% fewer tables.
arXiv Detail & Related papers (2026-01-19T14:51:23Z) - A Hybrid Search for Complex Table Question Answering in Securities Report [0.9430947207126281]
We propose a cell extraction method for Table Question Answering (TQA) without manual identification. Our approach estimates table headers by computing similarities between a given question and individual cells. We then select as the answer the cells at the intersection of the most relevant row and column.
arXiv Detail & Related papers (2025-11-12T10:19:27Z) - REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval [46.38349148493421]
REaR (Retrieve, Expand and Refine) is a three-stage framework for efficient, high-fidelity multi-table retrieval. REaR retrieves query-aligned tables, expands these with structurally joinable tables, and refines them by pruning noisy or weakly related candidates. REaR is retriever-agnostic and consistently improves dense/sparse retrievers on complex table QA datasets.
arXiv Detail & Related papers (2025-11-02T05:01:04Z) - Improving Table Understanding with LLMs and Entity-Oriented Search [24.3302301035859]
We introduce an entity-oriented search method to improve table understanding with large language models (LLMs). This approach effectively leverages the semantic similarities between questions and table data, as well as the implicit relationships between table cells. It focuses on table entities, ensuring that table cells are semantically tightly bound, thereby enhancing contextual clarity.
arXiv Detail & Related papers (2025-08-23T14:02:45Z) - Weaver: Interweaving SQL and LLM for Table Reasoning [63.09519234853953]
Weaver generates a flexible, step-by-step plan that combines SQL for structured data retrieval with LLMs for semantic processing. Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates.
arXiv Detail & Related papers (2025-05-25T03:27:37Z) - Bridging Queries and Tables through Entities in Table Retrieval [70.13748256886288]
Entities are well-studied in the context of text retrieval, but there is a noticeable lack of research on their applications in table retrieval. We propose an entity-enhanced training framework and design an interaction paradigm based on entity representations. Our proposed framework is plug-and-play and flexible, making it easy to integrate into existing table retriever training processes.
arXiv Detail & Related papers (2025-04-09T03:16:33Z) - RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking [63.253294691180635]
In real-world scenarios, beyond pure text, a substantial amount of knowledge is stored in tables. We first propose a table-corpora-aware RAG framework, named T-RAG, which consists of the hierarchical memory index, multi-stage retrieval, and graph-aware prompting.
arXiv Detail & Related papers (2025-04-02T04:24:41Z) - Piece of Table: A Divide-and-Conquer Approach for Selecting Subtables in Table Question Answering [20.926770550682964]
PieTa is a new framework for subtable-based question answering (QA). It operates through an iterative process of dividing tables into smaller windows, using LMs to select relevant cells within each window, and merging these cells into a subtable. It demonstrates improved performance over previous subtable-based QA approaches.
arXiv Detail & Related papers (2024-12-10T16:08:14Z) - Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval [52.592071689901196]
We introduce a method that uncovers useful join relations for any query and database during table retrieval. Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.
arXiv Detail & Related papers (2024-04-15T15:55:01Z) - Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion [57.53174887650989]
Table question answering is a popular task that assesses a model's ability to understand and interact with structured data.
Existing methods either convert both the table and external knowledge into text, which neglects the structured nature of the table.
We propose a simple yet effective method to integrate external information in a given table.
arXiv Detail & Related papers (2024-01-28T03:37:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.