Negotiated Reasoning: On Provably Addressing Relative
Over-Generalization
- URL: http://arxiv.org/abs/2306.05353v1
- Date: Thu, 8 Jun 2023 16:57:12 GMT
- Authors: Junjie Sheng, Wenhao Li, Bo Jin, Hongyuan Zha, Jun Wang, Xiangfeng Wang
- Abstract summary: Over-generalization is a thorny issue in cognitive science, where people may become overly cautious due to past experiences.
Agents in multi-agent reinforcement learning (MARL) have also been found to suffer from relative over-generalization (RO), as people do, and become stuck in sub-optimal cooperation.
Recent methods have shown that assigning reasoning ability to agents can mitigate RO algorithmically and empirically, but there has been a lack of theoretical understanding of RO.
- Score: 49.5896371203566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over-generalization is a thorny issue in cognitive science, where people may
become overly cautious due to past experiences. Agents in multi-agent
reinforcement learning (MARL) have also been found to suffer from relative
over-generalization (RO), as people do, and become stuck in sub-optimal cooperation.
Recent methods have shown that assigning reasoning ability to agents can
mitigate RO algorithmically and empirically, but there has been a lack of
theoretical understanding of RO, let alone designing provably RO-free methods.
This paper first proves that RO can be avoided when the MARL method satisfies a
consistent reasoning requirement under certain conditions. Then we introduce a
novel reasoning framework, called negotiated reasoning, that first builds the
connection between reasoning and RO with theoretical justifications. After
that, we propose an instantiated algorithm, Stein variational negotiated
reasoning (SVNR), which uses Stein variational gradient descent to derive a
negotiation policy that provably avoids RO in MARL under maximum entropy policy
iteration. The method is further parameterized with neural networks for
amortized learning, making computation efficient. Numerical experiments on many
RO-challenged environments demonstrate the superiority and efficiency of SVNR
compared to state-of-the-art methods in addressing RO.
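To make relative over-generalization concrete, here is a minimal sketch in plain Python. It is not the paper's SVNR algorithm. The payoff matrix is the "climbing game" commonly used in the MARL literature to illustrate RO and is an assumption brought in for illustration, as are all function names. Each agent scores its own actions by averaging over a uniformly random partner, which is a simplification of independent learning, and the result is compared with the best joint action.

```python
# Illustrative assumption: the classic "climbing game" payoff matrix.
# Rows are agent 1's actions, columns are agent 2's actions; both agents
# receive the same (shared) reward.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def marginal_values(payoff, agent):
    """Average payoff of each action when the partner acts uniformly at random."""
    n = len(payoff)
    if agent == 0:  # row agent averages over columns
        return [sum(payoff[a][b] for b in range(n)) / n for a in range(n)]
    # column agent averages over rows
    return [sum(payoff[a][b] for a in range(n)) / n for b in range(n)]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def independent_choice(payoff):
    """Joint action picked when each agent greedily follows its own marginal values."""
    return argmax(marginal_values(payoff, 0)), argmax(marginal_values(payoff, 1))


def joint_optimum(payoff):
    """Joint action with the highest shared payoff."""
    n = len(payoff)
    pairs = [(a, b) for a in range(n) for b in range(n)]
    return max(pairs, key=lambda ab: payoff[ab[0]][ab[1]])


if __name__ == "__main__":
    chosen = independent_choice(PAYOFF)
    best = joint_optimum(PAYOFF)
    print("independent choice:", chosen, "payoff", PAYOFF[chosen[0]][chosen[1]])
    print("joint optimum:     ", best, "payoff", PAYOFF[best[0]][best[1]])
```

In this toy game the heavy penalties for miscoordination pull each agent's averaged estimate toward the "safe" third action, so the pair settles on a lower payoff than the jointly optimal one. That is the failure mode the abstract calls RO, and it is why the abstract ties avoiding RO to agents reasoning consistently about each other.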
Related papers
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
- ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [25.329712997545794]
We propose Retrieval-Augmented Reasoning through Trustworthy Process Rewarding (ReARTeR).
ReARTeR enhances RAG systems' reasoning capabilities through post-training and test-time scaling.
Experimental results on multi-step reasoning benchmarks demonstrate significant improvements.
arXiv Detail & Related papers (2025-01-14T05:56:26Z)
- Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA).
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z)
- Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer.
Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z)
- LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs).
This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z)
- On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective [5.8010446129208155]
This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is prevalently employed for gauging the performance of feature importance estimates.
The insights gleaned from our theoretical foundation and empirical investigations reveal that attributions containing lesser information about the decision function may yield superior results in ROAR benchmarks.
arXiv Detail & Related papers (2023-04-26T21:43:42Z)
- Cross-sentence Neural Language Models for Conversational Speech Recognition [17.317583079824423]
We propose an effective cross-sentence neural LM approach that reranks the ASR N-best hypotheses of an upcoming sentence.
We also explore extracting task-specific global topical information from the cross-sentence history.
arXiv Detail & Related papers (2021-06-13T05:30:16Z)
- Pairwise Relations Discriminator for Unsupervised Raven's Progressive Matrices [7.769102711230249]
We introduce a pairwise relations discriminator (PRD) to develop unsupervised models with sufficient reasoning abilities to tackle a Raven's Progressive Matrices (RPM) problem.
PRD reframes the RPM problem into a relation comparison task, which we can solve without requiring the labelling of the RPM problem.
Our approach, the PRD, establishes a new state-of-the-art unsupervised learning benchmark with an accuracy of 55.9% on the I-RAVEN.
arXiv Detail & Related papers (2020-11-02T20:49:46Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.