Rethinking Table Pruning in TableQA: From Sequential Revisions to Gold Trajectory-Supervised Parallel Search
- URL: http://arxiv.org/abs/2601.03851v1
- Date: Wed, 07 Jan 2026 12:08:59 GMT
- Title: Rethinking Table Pruning in TableQA: From Sequential Revisions to Gold Trajectory-Supervised Parallel Search
- Authors: Yu Guo, Shenghao Ye, Shuangwu Chen, Zijian Wen, Tao Zhang, Qirui Bai, Dong Jin, Yunpeng Hou, Huasen He, Jian Yang, Xiaobin Tan,
- Abstract summary: Table Question Answering (TableQA) benefits significantly from table pruning. Existing table pruning methods rely on sequential revisions driven by unreliable critique signals. We propose TabTrim, a novel table pruning framework which transforms table pruning from sequential revisions to gold trajectory-supervised parallel search.
- Score: 22.58777921256103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table Question Answering (TableQA) benefits significantly from table pruning, which extracts compact sub-tables by eliminating redundant cells to streamline downstream reasoning. However, existing pruning methods typically rely on sequential revisions driven by unreliable critique signals, often failing to detect the loss of answer-critical data. To address this limitation, we propose TabTrim, a novel table pruning framework which transforms table pruning from sequential revisions to gold trajectory-supervised parallel search. TabTrim derives a gold pruning trajectory using the intermediate sub-tables in the execution process of gold SQL queries, and trains a pruner and a verifier to make the step-wise pruning result align with the gold pruning trajectory. During inference, TabTrim performs parallel search to explore multiple candidate pruning trajectories and identify the optimal sub-table. Extensive experiments demonstrate that TabTrim achieves state-of-the-art performance across diverse tabular reasoning tasks: TabTrim-8B reaches 73.5% average accuracy, outperforming the strongest baseline by 3.2%, including 79.4% on WikiTQ and 61.2% on TableBench.
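The abstract describes two mechanisms: a gold pruning trajectory built from the intermediate sub-tables produced while executing a gold SQL query, and a parallel search over candidate pruning trajectories at inference. The sketch below is a toy illustration under stated assumptions, not TabTrim's implementation. The gold query is hand-written as a list of hypothetical `filter_rows`/`keep_columns` steps (standing in for WHERE and SELECT), the search is a plain beam search, and `propose` and `score` are hypothetical stand-ins for the trained pruner and verifier. To keep the toy runnable, the verifier peeks at the gold final sub-table; a trained verifier would have no gold data at inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass(frozen=True)
class SubTable:
    """A (sub-)table: column names plus row tuples. Frozen so it is hashable for deduplication."""
    columns: Tuple[str, ...]
    rows: Tuple[Tuple[object, ...], ...]

    def size(self) -> int:
        return len(self.columns) * len(self.rows)

Step = Callable[[SubTable], SubTable]

def filter_rows(column: str, keep: Callable[[object], bool]) -> Step:
    """Hypothetical row-pruning step, playing the role of a WHERE clause."""
    def step(t: SubTable) -> SubTable:
        i = t.columns.index(column)
        return SubTable(t.columns, tuple(r for r in t.rows if keep(r[i])))
    return step

def keep_columns(names: Sequence[str]) -> Step:
    """Hypothetical column-pruning step, playing the role of a SELECT projection."""
    def step(t: SubTable) -> SubTable:
        idx = [t.columns.index(n) for n in names]
        return SubTable(tuple(names), tuple(tuple(r[i] for i in idx) for r in t.rows))
    return step

def gold_trajectory(table: SubTable, gold_steps: Sequence[Step]) -> List[SubTable]:
    """Replay the gold query step by step, recording every intermediate sub-table."""
    trajectory = [table]
    for step in gold_steps:
        trajectory.append(step(trajectory[-1]))
    return trajectory

def parallel_search(table: SubTable,
                    propose: Callable[[SubTable], List[SubTable]],
                    score: Callable[[SubTable], float],
                    num_steps: int, beam_width: int) -> SubTable:
    """Beam search over pruning trajectories; returns the best-scoring sub-table seen."""
    beams: List[List[SubTable]] = [[table]]
    best = table
    for _ in range(num_steps):
        # Extend every kept trajectory with each candidate the pruner proposes.
        expanded = [traj + [cand] for traj in beams for cand in propose(traj[-1])]
        if not expanded:
            break
        expanded.sort(key=lambda traj: score(traj[-1]), reverse=True)
        # Keep the top `beam_width` trajectories, skipping ones that reach a duplicate sub-table.
        seen, beams = set(), []
        for traj in expanded:
            if traj[-1] not in seen:
                seen.add(traj[-1])
                beams.append(traj)
            if len(beams) == beam_width:
                break
        best = max([best, beams[0][-1]], key=score)
    return best

# Toy example: "What was France's GDP in 2020?"
table = SubTable(("country", "year", "gdp"),
                 (("France", 2019, 2.7), ("France", 2020, 2.6), ("Spain", 2020, 1.3)))
gold = gold_trajectory(table, [filter_rows("country", lambda v: v == "France"),
                               filter_rows("year", lambda v: v == 2020),
                               keep_columns(["gdp"])])

def cells(t: SubTable) -> set:
    return {(c, r[i]) for r in t.rows for i, c in enumerate(t.columns)}

def propose(t: SubTable) -> List[SubTable]:
    """Hypothetical pruner: every way of dropping one column or one row."""
    out = []
    if len(t.columns) > 1:
        out += [keep_columns([n for n in t.columns if n != c])(t) for c in t.columns]
    if len(t.rows) > 1:
        out += [SubTable(t.columns, t.rows[:i] + t.rows[i + 1:]) for i in range(len(t.rows))]
    return out

def score(t: SubTable) -> float:
    """Hypothetical verifier: reject sub-tables missing a gold final cell, else prefer smaller ones."""
    return float(-t.size()) if cells(gold[-1]) <= cells(t) else float("-inf")

result = parallel_search(table, propose, score, num_steps=4, beam_width=2)
print(result)  # SubTable(columns=('gdp',), rows=((2.6,),))
```

With `beam_width=2`, the search keeps two partial trajectories per step and, in this toy, ends at the same single answer cell as the gold trajectory. A wider beam explores more alternatives at higher cost; TabTrim's actual search width and candidate scoring are not reproduced here.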
Related papers
- TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models [10.584052101655537]
TabTracer is an agentic framework that coordinates multi-step tool calls over intermediate table states.
It enforces step-level verification with typed operations and lightweight numeric and format checks.
It reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost.
Date: 2026-02-15T10:39:43Z
- TraceBack: Multi-Agent Decomposition for Fine-Grained Table Attribution [11.133753556671392]
TraceBack is a framework for scalable, cell-level attribution in single-table QA.
We release CITEBench, a benchmark with phrase-to-cell annotations drawn from ToTTo, FetaQA, and AITQA.
We also propose FairScore, a reference-less metric that compares atomic facts derived from predicted cells and answers to estimate attribution precision and recall without human cell labels.
Date: 2026-02-13T16:13:36Z
- TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction [14.270578219134997]
We propose TabSieve, a select-then-predict framework that makes evidence usage explicit and auditable.
Given a table and a query row, TabSieve first selects a small set of informative rows as evidence and then predicts the missing target conditioned on the selected evidence.
Experiments on a held-out benchmark of 75 classification and 52 regression tables show that TabSieve consistently improves performance across shot budgets.
Date: 2026-02-12T08:28:58Z
- Enhancing TableQA through Verifiable Reasoning Trace Reward [38.96476258377461]
We introduce RE-Tab, a plug-and-play framework that architecturally enhances trajectory search via lightweight, training-free reward modeling.
We demonstrate that providing explicit verifiable rewards during State Transition ("What is the best action?") and Simulative Reasoning ("Am I sure about the output?") is crucial to steer the agent's navigation in table states.
A direct plug-and-play implementation of RE-Tab brings up to a 41.77% improvement in QA accuracy and a 33.33% drop in the test-time inference samples needed for a consistent answer.
Date: 2026-01-30T04:06:42Z
- CORE-T: COherent REtrieval of Tables for Text-to-SQL [91.76918495375384]
CORE-T is a scalable, training-free framework that enriches tables with purpose metadata and pre-computes a lightweight table-compatibility cache.
Across Bird, Spider, and MMQA, CORE-T improves table-selection F1 by up to 22.7 points while retrieving up to 42% fewer tables.
Date: 2026-01-19T14:51:23Z
- TabReX: Tabular Referenceless eXplainable Evaluation [15.411207072791806]
TabReX is a reference-less, property-driven framework for evaluating tables generated by large language models.
It computes interpretable, rubric-aware scores that quantify structural and factual fidelity.
To assess robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types.
Date: 2025-12-17T19:20:20Z
- SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats [0.0]
SQuARE is a hybrid retrieval framework with sheet-level, complexity-aware routing.
It computes a continuous score based on header depth and merge density.
SQuARE consistently surpasses single-strategy baselines and ChatGPT-4o on both retrieval precision and end-to-end answer accuracy.
Date: 2025-12-03T22:11:45Z
- TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data [10.798423317852288]
TabDSR is a framework consisting of (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner.
We introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables.
Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and TabDSR, respectively.
Date: 2025-11-04T03:13:02Z
- REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval [46.38349148493421]
REaR (Retrieve, Expand and Refine) is a three-stage framework for efficient, high-fidelity multi-table retrieval.
REaR retrieves query-aligned tables, expands these with structurally joinable tables, and refines them by pruning noisy or weakly related candidates.
REaR is retriever-agnostic and consistently improves dense/sparse retrievers on complex table QA datasets.
Date: 2025-11-02T05:01:04Z
- RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking [63.253294691180635]
In real-world scenarios, beyond pure text, a substantial amount of knowledge is stored in tables.
We first propose a table-corpora-aware RAG framework, named T-RAG, which consists of the hierarchical memory index, multi-stage retrieval, and graph-aware prompting.
Date: 2025-04-02T04:24:41Z
- Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective [70.13748256886288]
Table retrieval is less explored compared to text retrieval.
Different table fields have varying matching preferences.
We introduce a Table-tailored HYbrid Matching rEtriever (THYME).
Date: 2025-03-04T03:57:10Z
- SEMv3: A Fast and Robust Approach to Table Separation Line Detection [48.75713662571455]
Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image.
The "split-and-merge" paradigm is a pivotal approach to parsing table structure, in which table separation line detection is crucial.
We propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines.
Date: 2024-05-20T08:13:46Z
- Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval [52.592071689901196]
We introduce a method that uncovers useful join relations for any query and database during table retrieval.
Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.
Date: 2024-04-15T15:55:01Z
- TRUST: An Accurate and End-to-End Table structure Recognizer Using Splitting-based Transformers [56.56591337457137]
We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, on which our method achieves new state-of-the-art results.
Date: 2022-08-31T08:33:36Z
- End-to-End Table Question Answering via Retrieval-Augmented Generation [19.89730342792824]
We introduce T-RAG, an end-to-end Table QA model, where a non-parametric dense vector index is fine-tuned jointly with BART, a parametric sequence-to-sequence model to generate answer tokens.
Given any natural language question, T-RAG utilizes a unified pipeline to automatically search through a table corpus to directly locate the correct answer from the table cells.
Date: 2022-03-30T23:30:16Z
This list is automatically generated from the titles and abstracts of the papers in this site.