TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning
- URL: http://arxiv.org/abs/2512.20312v2
- Date: Thu, 25 Dec 2025 12:35:54 GMT
- Title: TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning
- Authors: Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao,
- Abstract summary: TableGPT-R1 is a specialized model built on a systematic Reinforcement Learning (RL) framework. Our approach synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts, and achieves state-of-the-art performance on authoritative benchmarks.
- Score: 28.052232941379884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learning (RL) offers a promising avenue to enhance these capabilities, yet its application in the tabular domain faces three critical hurdles: the scarcity of high-quality agentic trajectories with closed-loop code execution and environment feedback on diverse table structures, the extreme heterogeneity of feedback signals ranging from rigid SQL execution to open-ended data interpretation, and the risk of catastrophic forgetting of general knowledge during vertical specialization. To overcome these challenges and unlock advanced reasoning on complex tables, we introduce TableGPT-R1, a specialized tabular model built on a systematic RL framework. Our approach integrates a comprehensive data engineering pipeline that synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts, a task-adaptive reward system that combines rule-based verification with a criteria-injected reward model and incorporates process-level step reward shaping with behavioral regularization, and a multi-stage training framework that progressively stabilizes reasoning before specializing in table-specific tasks. Extensive evaluations demonstrate that TableGPT-R1 achieves state-of-the-art performance on authoritative benchmarks, significantly outperforming baseline models while retaining robust general capabilities. Our model is available at https://huggingface.co/tablegpt/TableGPT-R1.
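The task-adaptive reward system described in the abstract can be sketched as a simple router: rule-based verification where answers are mechanically checkable (e.g. SQL execution results), a reward-model score for open-ended interpretation, plus process-level step shaping. The function names and the shaping weight below are illustrative assumptions, not the paper's actual implementation.

```python
def normalize(ans: str) -> str:
    """Canonicalize an answer string for exact-match comparison."""
    return " ".join(str(ans).strip().lower().split())

def rule_based_reward(pred: str, gold: str) -> float:
    """Binary reward for tasks with verifiable answers (e.g. SQL execution)."""
    return 1.0 if normalize(pred) == normalize(gold) else 0.0

def task_adaptive_reward(pred, gold, task_type, model_score=None,
                         step_rewards=(), step_weight=0.1):
    """Route verifiable tasks to a rule-based check; fall back to a
    criteria-injected reward-model score for open-ended tasks, and add
    process-level step reward shaping on top of the outcome reward."""
    if task_type == "verifiable":
        outcome = rule_based_reward(pred, gold)
    else:
        # Open-ended interpretation: score supplied by a reward model.
        outcome = model_score if model_score is not None else 0.0
    shaping = step_weight * sum(step_rewards)
    return outcome + shaping
```

The weight `step_weight` is a hypothetical knob balancing outcome correctness against intermediate-step quality.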
Related papers
- Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models [53.03670032402846]
We address the task of table image to code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables -- those with large sizes, deeply nested structures, and semantically rich or irregular cell content. We propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset.
arXiv Detail & Related papers (2025-09-22T11:13:48Z)
- Can GRPO Boost Complex Multimodal Table Understanding? [41.72642230279542]
Table-R1 is a three-stage reinforcement learning framework for multimodal table understanding. It markedly boosts the model's table reasoning performance on both held-in and held-out datasets.
arXiv Detail & Related papers (2025-09-21T02:51:15Z)
- TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning [10.267950603662776]
TableMind is a tool-integrated table reasoning agent that autonomously performs multi-turn tool invocation, writing and executing code in a secure sandbox environment for data analysis and precise numerical reasoning. To realize these capabilities, we adopt a two-stage fine-tuning paradigm built on top of a powerful pre-trained language model.
arXiv Detail & Related papers (2025-09-08T02:00:31Z)
- A Pre-training Framework for Relational Data with Information-theoretic Principles [57.93973948947743]
We introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs supervisory signals via set-based aggregation over relational graphs. TVE consistently outperforms traditional pre-training baselines. Our findings advocate for pre-training objectives that encode task heterogeneity and temporal structure as design principles for predictive modeling on relational databases.
arXiv Detail & Related papers (2025-07-14T00:17:21Z)
- Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models [52.94091440130039]
Table reasoning (TR) requires structured reasoning over semi-structured data. Small language models (SLMs) have limited capacity compared to large LMs (LLMs, e.g., GPT-4o). We propose program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR) by generating executable programs. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods.
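To illustrate the program-based (P-TR) idea, the snippet below hand-writes the kind of program a model might generate and executes it against a toy table; the table, question, and program are illustrative, and in a real system the generated code would run in a sandbox rather than via bare `exec`.

```python
# Toy table standing in for a benchmark instance (column -> values).
table = {"city": ["Oslo", "Lima"], "pop": [700_000, 9_700_000]}

# A program the model might emit for "which city has the larger
# population?" (hand-written here for illustration).
generated_program = (
    "idx = table['pop'].index(max(table['pop']))\n"
    "result = table['city'][idx]"
)

scope = {"table": table}
exec(generated_program, scope)  # run inside a sandbox in a real system
answer = scope["result"]        # "Lima"
```

Executing the program delegates the arithmetic and lookup to the interpreter, which is exactly what lets a small model sidestep the brittleness of reasoning over a text-serialized table.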
arXiv Detail & Related papers (2025-06-06T14:52:19Z)
- LLM-Symbolic Integration for Robust Temporal Tabular Reasoning [69.27153114778748]
We introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations. This structured approach allows Large Language Models (LLMs) to generate and execute SQL queries, enhancing generalization and mitigating biases.
arXiv Detail & Related papers (2025-06-06T05:14:04Z)
- Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z)
- Table-R1: Inference-Time Scaling for Table Reasoning [56.812846737424245]
We develop and evaluate two post-training strategies to enable inference-time scaling. For distillation, we introduce a large-scale dataset of reasoning traces generated by DeepSeek-R1. For RLVR, we propose task-specific verifiable reward functions and apply the GRPO algorithm to obtain the Table-R1-Zero model.
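GRPO's core step, turning a group of rollouts' verifiable rewards into group-relative advantages, can be sketched as follows. This is a minimal illustration of the general algorithm, not the Table-R1 implementation.

```python
import statistics

def grpo_advantages(rewards):
    """Normalize each rollout's reward by the mean and (population) std
    of its sampled group, as in Group Relative Policy Optimization."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```

With a verifiable 0/1 exact-match reward, a group like `[1.0, 0.0, 0.0, 1.0]` yields positive advantages for the correct rollouts and negative ones for the incorrect rollouts, without needing a learned value function.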
arXiv Detail & Related papers (2025-05-29T16:28:50Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
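A minimal sketch of the schema-and-cell retrieval idea follows; the keyword-overlap matching here is an illustrative stand-in, since TableRAG's actual retrievers combine query expansion with dedicated schema and cell retrieval rather than this toy scheme.

```python
def retrieve_context(query_terms, schema, columns):
    """Keyword-overlap stand-in for schema and cell retrieval: select
    column names, then cell values, that match the expanded query terms,
    so only relevant fragments (not the whole table) reach the LM."""
    q = {t.lower() for t in query_terms}
    hit_cols = [c for c in schema if c.lower() in q]
    hit_cells = [(c, v) for c in hit_cols
                 for v in columns.get(c, []) if str(v).lower() in q]
    return hit_cols, hit_cells
```

The point of retrieving fragments instead of serializing the full table is what lets the prompt stay small even when the table itself runs to millions of tokens.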
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients [20.941908494137806]
We propose a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Our work is aligned with the popular DSR framework, which focuses on learning a data-specific expression generator.
arXiv Detail & Related papers (2024-06-10T19:29:10Z)
- LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training [45.80561537971478]
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats.
We model TSR as a logical location regression problem and propose a new TSR framework called LORE.
Our proposed LORE is conceptually simpler, easier to train, and more accurate than other paradigms of TSR.
arXiv Detail & Related papers (2024-01-03T03:14:55Z)
- Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework [5.351873055148804]
Self-training framework generates diverse synthetic data with complex logic.
We optimize the procedure using a "Table-Text Manipulator" to handle joint table-text reasoning scenarios.
UCTRST achieves above 90% of the supervised model performance on different tasks and domains.
arXiv Detail & Related papers (2022-12-20T09:15:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.