Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models
- URL: http://arxiv.org/abs/2505.23667v2
- Date: Sat, 31 May 2025 07:32:09 GMT
- Title: Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models
- Authors: Lang Cao, Jingxian Xu, Hanbing Liu, Jinyu Wang, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang
- Abstract summary: We propose Formula Tuning, a reinforcement learning framework that trains LMs to generate executable spreadsheet formulas. Formula Tuning reduces the reliance on supervised formula annotations by using binary answer correctness as a reward signal. It substantially enhances LM performance, particularly on multi-step numerical and symbolic reasoning tasks.
- Score: 44.340292033316715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tables are a fundamental structure for organizing and analyzing data, making effective table understanding a critical capability for intelligent systems. While large language models (LMs) demonstrate strong general reasoning abilities, they continue to struggle with accurate numerical or symbolic reasoning over tabular data, especially in complex scenarios. Spreadsheet formulas provide a powerful and expressive medium for representing executable symbolic operations, encoding rich reasoning patterns that remain largely underutilized. In this paper, we propose Formula Tuning (Fortune), a reinforcement learning (RL) framework that trains LMs to generate executable spreadsheet formulas for question answering over general tabular data. Formula Tuning reduces the reliance on supervised formula annotations by using binary answer correctness as a reward signal, guiding the model to learn formula derivation through reasoning. We provide a theoretical analysis of its advantages and demonstrate its effectiveness through extensive experiments on seven table reasoning benchmarks. Formula Tuning substantially enhances LM performance, particularly on multi-step numerical and symbolic reasoning tasks, enabling a 7B model to outperform OpenAI o1 on table understanding. This highlights the potential of formula-driven RL to advance symbolic table reasoning in LMs.
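The reward signal described in the abstract is simple enough to sketch. Below is a minimal Python illustration of binary answer correctness as a reward: a sampled formula is executed over the table by a caller-supplied spreadsheet evaluator (left as a placeholder, since the paper's actual execution engine is not specified here), and the reward is 1.0 only when the result matches the gold answer.

```python
from typing import Any, Callable

def normalize(value: Any) -> str:
    """Canonicalize an answer for string comparison (toy normalization)."""
    return str(value).strip().lower()

def binary_formula_reward(
    formula: str,
    table: list[list[str]],
    gold_answer: str,
    evaluate: Callable[[str, list[list[str]]], Any],
) -> float:
    """Return 1.0 if executing the formula over the table reproduces the gold answer, else 0.0."""
    try:
        predicted = evaluate(formula, table)  # caller supplies a spreadsheet execution engine
    except Exception:
        return 0.0  # malformed or unexecutable formulas receive zero reward
    return 1.0 if normalize(predicted) == normalize(gold_answer) else 0.0
```

In a policy-gradient training loop this scalar would be attached to each sampled formula; the specific RL algorithm is not identified in the summary above, so that part is omitted.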
Related papers
- Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z) - Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning [24.624844234355734]
Reasoning-Table is the first application of reinforcement learning (RL) to table reasoning, achieving state-of-the-art performance. Reasoning-Table emerges as a robust table reasoning large language model, surpassing larger proprietary models like Claude-3.7-Sonnet by 4.0%.
arXiv Detail & Related papers (2025-06-02T14:18:09Z) - TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction [19.350413252699042]
Large language models (LLMs) have demonstrated powerful capabilities to generate human-like reasoning and explanations. We propose a new approach that leverages reasoning-based LLMs, trained using reinforcement learning, to perform more accurate and explainable predictions. Our method introduces custom reward functions that guide the model not only toward high prediction accuracy but also toward human-understandable reasons for its predictions.
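As a rough illustration of the kind of custom reward this summary describes, the sketch below mixes answer correctness with an explanation-quality score; the weighting and the explanation scorer (e.g., a judge model or rubric) are assumptions for illustration, not details from the paper.

```python
from typing import Callable

def composite_reward(
    prediction: str,
    gold_label: str,
    rationale: str,
    score_explanation: Callable[[str], float],  # returns a quality score in [0, 1]
    accuracy_weight: float = 0.8,
) -> float:
    """Weighted mix of prediction accuracy and explanation quality (illustrative weights)."""
    accuracy = 1.0 if prediction.strip() == gold_label.strip() else 0.0
    explanation_quality = score_explanation(rationale)  # e.g., a judge model or human rubric
    return accuracy_weight * accuracy + (1.0 - accuracy_weight) * explanation_quality
```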
arXiv Detail & Related papers (2025-05-27T22:23:11Z) - TableLoRA: Low-rank Adaptation on Table Structure Understanding for Large Language Models [57.005158277893194]
TableLoRA is a module designed to improve LLMs' understanding of table structure during PEFT. It incorporates special tokens for serializing tables with a special token encoder and uses 2D LoRA to encode low-rank information on cell positions.
arXiv Detail & Related papers (2025-03-06T12:50:14Z) - TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
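A minimal sketch of the retrieval flow this summary describes: the question is expanded into multiple queries, schema (column) and cell retrievers are queried, and only the top hits are packed into the prompt. The retriever callables and the prompt layout are placeholders, not TableRAG's actual implementation.

```python
from typing import Callable

def build_table_prompt(
    question: str,
    expand_query: Callable[[str], list[str]],
    retrieve_schema: Callable[[str], list[str]],
    retrieve_cells: Callable[[str], list[str]],
    top_k: int = 5,
) -> str:
    """Assemble a compact prompt from retrieved schema entries and cell values."""
    queries = [question] + expand_query(question)            # query expansion
    schema_hits, cell_hits = [], []
    for q in queries:
        schema_hits.extend(retrieve_schema(q))               # column-level retrieval
        cell_hits.extend(retrieve_cells(q))                  # cell-level retrieval
    # de-duplicate while preserving order, then truncate to keep the prompt small
    schema_hits = list(dict.fromkeys(schema_hits))[:top_k]
    cell_hits = list(dict.fromkeys(cell_hits))[:top_k]
    return (
        f"Question: {question}\n"
        f"Relevant columns: {', '.join(schema_hits)}\n"
        f"Relevant cells: {', '.join(cell_hits)}"
    )
```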
arXiv Detail & Related papers (2024-10-07T04:15:02Z) - H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables [56.73919743039263]
This paper introduces a novel algorithm that integrates both symbolic and semantic (textual) approaches in a two-stage process to address the limitations of existing methods. Our experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three question-answering (QA) and fact-verification datasets.
arXiv Detail & Related papers (2024-06-29T21:24:19Z) - NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization [6.253771639590562]
We introduce NormTab, a framework aimed at enhancing the symbolic reasoning performance of Large Language Models (LLMs) by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestions and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance.
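Read as preprocessing, the idea lends itself to a short sketch: each column is handed to an LLM-backed normalizer once, before any reasoning happens. The column-wise strategy and the normalizer callable are assumptions for illustration; NormTab's actual prompts and normalization rules are not given in this summary.

```python
from typing import Callable

def normalize_table(
    table: list[list[str]],
    normalize_column: Callable[[str, list[str]], list[str]],
) -> list[list[str]]:
    """One-time preprocessing: rewrite each column into a clean, consistently typed form."""
    header, *rows = table
    columns = [list(col) for col in zip(*rows)]          # column-wise view of the cells
    cleaned = [normalize_column(name, col)               # e.g. "1,234 km" -> "1234"
               for name, col in zip(header, columns)]
    return [header] + [list(row) for row in zip(*cleaned)]
```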
arXiv Detail & Related papers (2024-06-25T22:40:03Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
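The three components map naturally onto a small preprocessing pipeline; the sketch below is an illustrative composition only, with the sampler and augmenter left as caller-supplied callables and Markdown chosen arbitrarily as the serialization format.

```python
from typing import Callable

def prepare_table_for_llm(
    table: list[list[str]],
    query: str,
    sample_rows: Callable[[list[list[str]], str], list[list[str]]],
    augment: Callable[[list[list[str]]], list[list[str]]],
    max_rows: int = 20,
) -> str:
    """Sample query-relevant rows, augment them, then serialize as a Markdown table."""
    header, *body = table
    body = sample_rows(body, query)[:max_rows]            # (1) query-aware row sampling
    body = augment(body)                                   # (2) enrich with external knowledge
    lines = ["| " + " | ".join(header) + " |",             # (3) pack into Markdown for the LLM
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```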
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
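The summary's idea of plugging an LLM into boosting can be sketched as an AdaBoost-style loop in which each weak hypothesis is produced by a prompted model. The llm_fit_rule callable stands in for the prompting step; everything else is textbook AdaBoost rather than the paper's exact procedure.

```python
import math
from typing import Callable, Sequence

def boost_with_llm(
    examples: Sequence[str],
    labels: Sequence[int],                        # labels in {-1, +1}
    llm_fit_rule: Callable[[Sequence[str], Sequence[float]], Callable[[str], int]],
    rounds: int = 5,
) -> list[tuple[float, Callable[[str], int]]]:
    """AdaBoost-style loop where each weak hypothesis comes from a prompted LLM."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble: list[tuple[float, Callable[[str], int]]] = []
    for _ in range(rounds):
        # prompt the LLM with (weighted) examples; it returns a callable weak classifier
        hypothesis = llm_fit_rule(examples, weights)
        err = sum(w for w, x, y in zip(weights, examples, labels) if hypothesis(x) != y)
        err = min(max(err, 1e-10), 1.0 - 1e-10)
        if err >= 0.5:                            # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, hypothesis))
        # up-weight the examples the current hypothesis got wrong
        weights = [w * math.exp(-alpha * y * hypothesis(x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble
```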
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples [15.212332890570869]
We develop ReasTAP to show that high-level table reasoning skills can be injected into models during pre-training without a complex table-specific architecture design.
ReasTAP achieves new state-of-the-art performance on all benchmarks and delivers a significant improvement in low-resource settings.
arXiv Detail & Related papers (2022-10-22T07:04:02Z) - FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining [23.747119682226675]
FORTAP is the first method for numerical-reasoning-aware table pretraining, leveraging a large corpus of spreadsheet formulae.
FORTAP achieves strong results on two representative downstream tasks: cell type classification and formula prediction.
arXiv Detail & Related papers (2021-09-15T14:31:17Z)