MARO: Learning Stronger Reasoning from Social Interaction
- URL: http://arxiv.org/abs/2601.12323v2
- Date: Sat, 24 Jan 2026 14:16:46 GMT
- Title: MARO: Learning Stronger Reasoning from Social Interaction
- Authors: Yin Cai, Zhouhong Gu, Juntao Zhang, Ping Chen,
- Abstract summary: Multi-Agent Reward Optimization (MARO) is a method that enables large language models to acquire stronger reasoning abilities.<n> Experimental results demonstrate that MARO achieves significant improvements in social reasoning capabilities.
- Score: 7.77506109184819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans face countless scenarios that require reasoning and judgment in daily life. However, existing large language model training methods primarily allow models to learn from existing textual content or solve predetermined problems, lacking experience in real scenarios involving interaction, negotiation, and competition with others. To address this, this paper proposes Multi-Agent Reward Optimization (MARO), a method that enables large language models (LLMs) to acquire stronger reasoning abilities by learning and practicing in multi-agent social environments. Specifically, MARO first addresses the sparse learning signal problem by decomposing final success or failure outcomes into each specific behavior during the interaction process; second, it handles the uneven role distribution problem by balancing the training sample weights of different roles; finally, it addresses environmental instability issues by directly evaluating the utility of each behavior. Experimental results demonstrate that MARO not only achieves significant improvements in social reasoning capabilities, but also that the abilities acquired through social simulation learning can effectively transfer to other tasks such as mathematical reasoning and instruction following. This reveals the tremendous potential of multi-agent social learning in enhancing the general reasoning capabilities of LLMs.
Related papers
- One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence [25.89075578734277]
This paper introduces OMAR: One Model, All Roles, a reinforcement learning framework for AI.<n>OMAR allows a single model to role-play all participants in a conversation simultaneously, learning to achieve long-term goals and complex social norms.<n>We show that trained models develop fine-grained, emergent social intelligence, such as empathy, persuasion, and compromise seeking.
arXiv Detail & Related papers (2026-02-03T05:09:49Z) - How Far Can LLMs Emulate Human Behavior?: A Strategic Analysis via the Buy-and-Sell Negotiation Game [0.8353024005684598]
This work proposes a methodology to quantitatively evaluate the human emotional and behavioral imitation and strategic decision-making capabilities of Large Language Models (LLMs)<n>Specifically, we assign different personas to multiple LLMs and conduct negotiations between a Buyer and a Seller, comprehensively analyzing outcomes such as win rates, transaction prices, and SHAP values.<n>Our experimental results show that models with higher existing benchmark scores tend to achieve better negotiation performance overall.
arXiv Detail & Related papers (2025-11-22T09:07:29Z) - Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making [0.030586855806896043]
Large language models (LLMs) are increasingly used in social science simulations.<n>We propose a process-oriented evaluation framework to examine how LLM agents adapt under different levels of external guidance and human-derived noise.<n>We find that LLMs, by default, converge on stable and conservative strategies that diverge from observed human behaviors.
arXiv Detail & Related papers (2025-08-21T18:55:53Z) - SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users [70.02370111025617]
We introduce SocioVerse, an agent-driven world model for social simulation.<n>Our framework features four powerful alignment components and a user pool of 10 million real individuals.<n>Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness.
arXiv Detail & Related papers (2025-04-14T12:12:52Z) - An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI)<n>This paper explores potential areas where statisticians can make important contributions to the development of LLMs.<n>We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z) - PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, a framework for better data construction and model tuning.<n>For insufficient data usage, we incorporate strategies such as Chain-of-Thought prompting and anti-induction.<n>For rigid behavior patterns, we design the tuning process and introduce automated DPO to enhance the specificity and dynamism of the models' personalities.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks.
This paper exploits the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks.
We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
arXiv Detail & Related papers (2024-07-12T14:13:06Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs)
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset [50.36095192314595]
Large Language Models (LLMs) function as conscious agents with generalizable reasoning capabilities.<n>This ability remains underexplored due to the complexity of modeling infinite possible changes in an event.<n>We introduce the first-ever benchmark, MARS, comprising three tasks corresponding to each step.
arXiv Detail & Related papers (2024-06-04T08:35:04Z) - Training Socially Aligned Language Models on Simulated Social
Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.