TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models
- URL: http://arxiv.org/abs/2602.14089v1
- Date: Sun, 15 Feb 2026 10:39:43 GMT
- Title: TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models
- Authors: Zhizhao Luo, Zhaojing Luo, Meihui Zhang, Rui Mao
- Abstract summary: TabTracer is an agentic framework that coordinates multi-step tool calls over intermediate table states. It enforces step-level verification with typed operations and lightweight numeric and format checks. It reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost.
- Score: 10.584052101655537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have emerged as powerful tools for natural language table reasoning, where existing methods fall into two main categories. Prompt-based approaches rely on language-only inference or one-pass program generation without step-level verification. Agent-based approaches use tools in a closed loop, but verification is often local and backtracking is limited, allowing errors to propagate and increasing cost. Moreover, they rely on chain- or beam-style trajectories that are typically combinatorially redundant, leading to high token costs. In this paper, we propose TabTracer, an agentic framework that coordinates multi-step tool calls over intermediate table states, with explicit state tracking for verification and rollback. First, it enforces step-level verification with typed operations and lightweight numeric and format checks to provide reliable rewards and suppress hallucinations. Second, execution-feedback Monte Carlo Tree Search maintains a search tree of candidate table states and uses backpropagated reflection scores to guide UCB1 selection and rollback via versioned snapshots. Third, it reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost. Comprehensive evaluation on TabFact, WikiTQ, and CRT datasets shows that TabTracer outperforms state-of-the-art baselines by up to 6.7% in accuracy while reducing token consumption by 59-84%.
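The abstract's core loop — UCB1 selection over a tree of table states, backpropagated reflection scores, and state hashing for deduplication — can be sketched as follows. This is a minimal illustration of the general MCTS mechanics named in the abstract, not the TabTracer implementation; the names `TableState`, `ucb1_select`, and `backpropagate` are hypothetical.

```python
# Hypothetical sketch of UCB1 selection, reward backpropagation, and
# state-hash deduplication over a tree of intermediate table states.
import hashlib
import math
from dataclasses import dataclass, field

@dataclass
class TableState:
    """A node in the search tree, holding a versioned table snapshot."""
    rows: tuple                      # immutable snapshot of the table rows
    visits: int = 0
    total_reward: float = 0.0        # accumulated backpropagated scores
    children: list = field(default_factory=list)

    def state_hash(self) -> str:
        # Hash the snapshot so duplicate states can be detected and pruned.
        return hashlib.sha256(repr(self.rows).encode()).hexdigest()

def ucb1_select(parent: TableState, c: float = 1.41) -> TableState:
    """Pick the child maximizing mean reward plus an exploration bonus."""
    def score(child: TableState) -> float:
        if child.visits == 0:
            return float("inf")      # always expand unvisited states first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return exploit + explore
    return max(parent.children, key=score)

def backpropagate(path: list, reward: float) -> None:
    """Propagate a step-level verification reward up the selected path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

Because each node stores an immutable snapshot, rollback amounts to resuming the search from an earlier node, and identical snapshots reached by different operation sequences collapse to one hash.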
Related papers
- TabAgent: A Framework for Replacing Agentic Generative Components with Tabular-Textual Classifiers [5.792704492773729]
TabAgent is a framework for replacing generative decision components in closed-set selection tasks with a compact textual-tabular classifier trained on execution traces. On the long-horizon AppWorld benchmark, TabAgent maintains task-level success while eliminating shortlist-time LLM calls, reducing latency by approximately 95% and inference cost by 85-91%.
arXiv Detail & Related papers (2026-02-18T13:01:17Z) - Col-Bandit: Zero-Shot Query-Time Pruning for Late-Interaction Retrieval [2.159285655678094]
Col-Bandit is a query-time pruning algorithm that reduces the computational burden of reranking by casting it as a finite-population Top-K identification problem. Unlike coarse-grained approaches that prune entire documents or tokens offline, Col-Bandit sparsifies the interaction matrix on the fly. Experiments show that Col-Bandit preserves ranking fidelity while reducing MaxSim FLOPs by up to 5×.
arXiv Detail & Related papers (2026-02-02T21:27:01Z) - Reasoning by Commented Code for Table Question Answering [2.497926557563177]
Table Question Answering (TableQA) poses a significant challenge for large language models. Existing methods, which depend on end-to-end answer generation or single-line program queries, exhibit limited numerical accuracy and reduced interpretability. This work introduces a commented, step-by-step code-generation framework that incorporates explicit reasoning into the Python program-generation process.
arXiv Detail & Related papers (2026-01-31T06:16:35Z) - TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval. We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z) - Rethinking Table Pruning in TableQA: From Sequential Revisions to Gold Trajectory-Supervised Parallel Search [22.58777921256103]
Table Question Answering (TableQA) benefits significantly from table pruning. Existing table pruning methods rely on sequential revisions driven by unreliable critique signals. We propose TabTrim, a novel table pruning framework that transforms table pruning from sequential revisions into gold trajectory-supervised parallel search.
arXiv Detail & Related papers (2026-01-07T12:08:59Z) - TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z) - TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning [77.01182934427095]
TaTToo is a novel table-grounded PRM framework that integrates tool-based verification to provide precise reward supervision. We train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning to align our model with table-based verification.
arXiv Detail & Related papers (2025-10-07T17:59:41Z) - GraphRunner: A Multi-Stage Framework for Efficient and Accurate Graph-Based Retrieval [3.792463570467098]
GraphRunner is a novel graph-based retrieval framework that operates in three distinct stages: planning, verification, and execution. It significantly reduces reasoning errors and detects hallucinations before execution. Our evaluation using the GRBench dataset shows that GraphRunner consistently outperforms existing approaches.
arXiv Detail & Related papers (2025-07-11T18:10:01Z) - Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z) - From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval [22.35942074715463]
Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs). We propose State Machine Reasoning (SMR), a transition-based reasoning framework composed of discrete actions. Experiments on the BEIR and BRIGHT benchmarks show that SMR improves retrieval performance (nDCG@10) by 3.4% while reducing token usage by 74.4%.
arXiv Detail & Related papers (2025-05-29T04:04:25Z) - T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search [51.91311158085973]
Multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. We propose T2Agent, a novel misinformation detection agent that incorporates a toolkit with Monte Carlo Tree Search. Extensive experiments show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks.
arXiv Detail & Related papers (2025-05-26T09:50:55Z) - Multilingual Autoregressive Entity Linking [49.35994386221958]
mGENRE is a sequence-to-sequence system for the Multilingual Entity Linking problem.
For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token.
We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks.
arXiv Detail & Related papers (2021-03-23T13:25:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information and is not responsible for any consequences of its use.