Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
- URL: http://arxiv.org/abs/2506.00577v1
- Date: Sat, 31 May 2025 14:22:40 GMT
- Title: Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
- Authors: Yufa Zhou, Shaobo Wang, Xingyu Dong, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang,
- Abstract summary: This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively $\textit{generalize}$ to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality.
- Score: 25.067282214293904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements. This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively $\textit{generalize}$ to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis. We introduce $\textbf{Recon}$ ($\textbf{R}$easoning like an $\textbf{ECON}$omist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality. These results underscore the promise of domain-aligned post-training for enhancing reasoning and agent alignment, shedding light on the roles of SFT and RL in shaping model behavior. Code is available at https://github.com/MasterZhou1/Recon .
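As a concrete, purely illustrative sketch of what "verifiable rewards" can mean in this setting, the Python snippet below shows a minimal reward function for a numerically answered economic problem: the final answer is parsed from a \boxed{...} span and compared against a known ground truth. The answer format, tolerance, and the example problem are assumptions made for this sketch; they are not taken from the Recon paper or repository.

```python
import re

def verifiable_reward(completion: str, ground_truth: float, tol: float = 1e-4) -> float:
    """Binary reward: 1.0 if the final \\boxed{...} answer matches the ground truth.

    The \\boxed{} convention and the numeric tolerance are assumptions made for
    this sketch; they are not taken from the Recon paper or repository.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0                      # no parsable final answer
    try:
        predicted = float(match.group(1).replace(",", "").strip())
    except ValueError:
        return 0.0                      # final answer is not a plain number
    return 1.0 if abs(predicted - ground_truth) <= tol else 0.0


# Toy check: a monopolist facing demand Q = 10 - P with zero cost maximizes
# revenue P * (10 - P) at P = 5, so the verifier expects 5.
print(verifiable_reward(r"...; hence profit is maximized at \boxed{5}.", 5.0))  # 1.0
print(verifiable_reward(r"...; hence the optimal price is \boxed{4}.", 5.0))    # 0.0
```

In an RLVR loop, a binary signal of this kind is all the trainer sees, which is why domains with programmatically checkable answers, such as the economics problems used here, are attractive testbeds.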
Related papers
- From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models [1.8953148404648696]
The application of Reinforcement Learning to economic modeling reveals a fundamental conflict between the assumptions of equilibrium theory and the emergent behavior of learning agents. This paper first demonstrates this discrepancy within a search-and-matching model with concave production, showing that a standard RL agent learns a non-equilibrium, monopsonistic policy. We propose a calibrated Mean-Field Reinforcement Learning framework that embeds a representative agent in a fixed macroeconomic field and adjusts the cost function to reflect economic opportunity costs.
arXiv Detail & Related papers (2025-07-24T09:21:02Z)
- From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium [52.28048367430481]
Multi-agent frameworks can boost the reasoning power of large language models (LLMs), but they typically incur heavy computational costs and lack convergence guarantees. We recast multi-LLM coordination as an incomplete-information game and seek a Bayesian Nash equilibrium (BNE). We introduce Efficient Coordination via Nash Equilibrium (ECON), a hierarchical reinforcement-learning paradigm that marries distributed reasoning with centralized final output.
arXiv Detail & Related papers (2025-06-09T23:49:14Z)
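The ECON entry above hinges on the notion of a Bayesian Nash equilibrium. As a generic textbook illustration of that concept (not of ECON's actual coordination mechanism), the sketch below numerically checks the classic result that in a two-bidder first-price auction with values drawn uniformly from [0, 1], bidding half one's value is a best response when the opponent does the same.

```python
import numpy as np

# First-price sealed-bid auction with two bidders whose private values are
# drawn independently from Uniform[0, 1]. The textbook Bayesian Nash
# equilibrium is to bid half one's value, b(v) = v / 2. Fixing bidder 1's
# value, we check numerically that no bid on a grid beats b = v / 2 when the
# opponent sticks to the equilibrium strategy.

rng = np.random.default_rng(0)
opponent_values = rng.uniform(0.0, 1.0, 500_000)
opponent_bids = opponent_values / 2.0            # opponent plays the BNE strategy

def expected_payoff(bid: float, value: float) -> float:
    """Monte Carlo estimate of the expected payoff of `bid` at valuation `value`."""
    wins = bid > opponent_bids                   # ties have probability zero
    return float(np.mean(wins * (value - bid)))

value = 0.8
candidate_bids = np.linspace(0.0, value, 41)     # grid of possible deviations
payoffs = [expected_payoff(b, value) for b in candidate_bids]
best_bid = candidate_bids[int(np.argmax(payoffs))]
print(f"best response at v = {value}: bid ~ {best_bid:.2f} (equilibrium bid: {value / 2})")
```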
- Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks. Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data. In this work, we explore an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
arXiv Detail & Related papers (2025-05-25T11:03:45Z)
- General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. We train a series of models and evaluate them on a wide range of datasets covering diverse domains such as physics, chemistry, finance, and electronics.
arXiv Detail & Related papers (2025-05-20T17:41:33Z)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding. It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions. Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT). Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames [3.5083201638203154]
We implement a role-based multi-agent strategic interaction framework tailored to sophisticated reasoners. We use one-shot, 2-player beauty contests to evaluate the reasoning capabilities of the latest LLMs. Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
arXiv Detail & Related papers (2025-02-11T10:37:20Z)
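The beauty-contest entry above is a standard probe of strategic depth, and its arithmetic is easy to reproduce. The sketch below illustrates the game itself via the usual level-k account (it is not the paper's hypergame framework): a level-0 player anchors at 50, and each higher level best-responds by multiplying by p = 2/3, so guesses shrink toward the all-zero Nash equilibrium.

```python
# Level-k reasoning in the p-beauty contest: every player names a number in
# [0, 100] and the winner is whoever is closest to p times the average guess.
# The usual level-k account starts from a naive level-0 guess of 50 and lets
# each higher level best-respond to the level below it, i.e. guess p * 50,
# p^2 * 50, and so on. The Nash equilibrium is for everyone to guess 0.

def level_k_guesses(p: float = 2.0 / 3.0, anchor: float = 50.0, levels: int = 10):
    """Return the sequence of level-0 through level-`levels` guesses."""
    guesses = [anchor]
    for _ in range(levels):
        guesses.append(p * guesses[-1])   # best response to the level below
    return guesses

for k, g in enumerate(level_k_guesses()):
    print(f"level {k}: guess {g:.2f}")
# level 0: 50.00, level 1: 33.33, level 2: 22.22, ... -> shrinks toward 0,
# the Nash equilibrium; empirically, human play tends to look like levels 1-3.
```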
- SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent [45.41401816514924]
We propose an innovative framework, SRAP-Agent, which integrates Large Language Models (LLMs) into economic simulations.
We conduct extensive policy simulation experiments to verify the feasibility and effectiveness of the SRAP-Agent.
arXiv Detail & Related papers (2024-10-18T03:43:42Z)
- Simulating Financial Market via Large Language Model based Agents [22.36549613587476]
Most economic theories assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets.
We propose the $\textbf{A}$gent-based $\textbf{S}$imulated $\textbf{F}$inancial $\textbf{M}$arket (ASFM), which first constructs a simulated stock market with a real order matching system.
arXiv Detail & Related papers (2024-06-28T14:54:12Z)
- Simulating the Economic Impact of Rationality through Reinforcement Learning and Agent-Based Modelling [1.7546137756031712]
We leverage multi-agent reinforcement learning (RL) to expand the capabilities of agent-based models (ABMs).
We show that RL agents spontaneously learn three distinct strategies for maximising profits, with the optimal strategy depending on the level of market competition and rationality.
We also find that RL agents with independent policies, and without the ability to communicate with each other, spontaneously learn to segregate into different strategic groups, thus increasing market power and overall profits.
arXiv Detail & Related papers (2024-05-03T15:08:25Z)
- Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning [9.039809980024852]
We propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program-synthesis-by-sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play, market-trend-aware program sketch to determine the market trend. Extensive evaluations on two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.
arXiv Detail & Related papers (2023-10-09T09:20:13Z)
- Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning [72.23843557783533]
We show that deep reinforcement learning can discover stable solutions that are epsilon-Nash equilibria for a meta-game over agent types.
Our approach is more flexible and does not need unrealistic assumptions, e.g., market clearing.
We demonstrate our approach in real-business-cycle models, a representative family of DGE models, with 100 worker-consumers, 10 firms, and a government that taxes and redistributes.
arXiv Detail & Related papers (2022-01-03T17:00:17Z)