Negotiated Reasoning: On Provably Addressing Relative
Over-Generalization
- URL: http://arxiv.org/abs/2306.05353v1
- Date: Thu, 8 Jun 2023 16:57:12 GMT
- Title: Negotiated Reasoning: On Provably Addressing Relative
Over-Generalization
- Authors: Junjie Sheng, Wenhao Li, Bo Jin, Hongyuan Zha, Jun Wang, Xiangfeng
Wang
- Abstract summary: Over-generalization is a thorny issue in cognitive science, where people may become overly cautious due to past experiences.
Agents in multi-agent reinforcement learning (MARL) have also been found to suffer from relative over-generalization (RO), as people do, and to become stuck in sub-optimal cooperation.
Recent methods have shown that assigning reasoning ability to agents can mitigate RO algorithmically and empirically, but there has been a lack of theoretical understanding of RO.
- Score: 49.5896371203566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over-generalization is a thorny issue in cognitive science, where people may
become overly cautious due to past experiences. Agents in multi-agent
reinforcement learning (MARL) have also been found to suffer from relative
over-generalization (RO), as people do, and to become stuck in sub-optimal cooperation.
Recent methods have shown that assigning reasoning ability to agents can
mitigate RO algorithmically and empirically, but there has been a lack of
theoretical understanding of RO, let alone designing provably RO-free methods.
This paper first proves that RO can be avoided when the MARL method satisfies a
consistent reasoning requirement under certain conditions. Then we introduce a
novel reasoning framework, called negotiated reasoning, that first builds the
connection between reasoning and RO with theoretical justifications. After
that, we propose an instantiated algorithm, Stein variational negotiated
reasoning (SVNR), which uses Stein variational gradient descent to derive a
negotiation policy that provably avoids RO in MARL under maximum entropy policy
iteration. The method is further parameterized with neural networks for
amortized learning, making computation efficient. Numerical experiments on many
RO-challenged environments demonstrate the superiority and efficiency of SVNR
compared to state-of-the-art methods in addressing RO.
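As a rough illustration of the machinery the abstract references: under maximum-entropy policy iteration the target policy is proportional to exp(Q/alpha), and Stein variational gradient descent (SVGD) can transport a set of action particles toward that distribution. The minimal sketch below shows a generic SVGD update with an RBF kernel applied to a toy one-dimensional soft Q-function; the Q-function, temperature, particle count, and step size are illustrative assumptions and not the paper's actual negotiation objective.

```python
import numpy as np

def rbf_kernel(X, h=None):
    # Pairwise squared distances between particles.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if h is None:
        # Median heuristic for the kernel bandwidth.
        med = np.median(sq)
        h = np.sqrt(0.5 * med / np.log(X.shape[0] + 1)) + 1e-8
    K = np.exp(-sq / (2 * h ** 2))
    # grad_K[j, i] = gradient of k(x_j, x_i) with respect to x_j.
    grad_K = -(X[:, None, :] - X[None, :, :]) / (h ** 2) * K[:, :, None]
    return K, grad_K

def svgd_step(particles, grad_log_p, step_size=0.1):
    """One SVGD update: the kernel-weighted score term pulls particles
    toward high-density regions; the kernel-gradient term repels them,
    keeping the approximation spread out (entropy)."""
    K, grad_K = rbf_kernel(particles)
    n = particles.shape[0]
    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K @ grad_log_p + grad_K.sum(axis=0)) / n
    return particles + step_size * phi

# Hypothetical soft Q-function over a 1-D continuous action; the target
# max-entropy policy is p(a) proportional to exp(Q(a) / alpha).
def grad_log_soft_policy(actions, alpha=0.5):
    # Toy example: Q(a) = -(a - 1)^2, so grad log p(a) = -2 (a - 1) / alpha.
    return -2.0 * (actions - 1.0) / alpha

actions = np.random.randn(32, 1)  # particles approximating pi(a|s)
for _ in range(200):
    actions = svgd_step(actions, grad_log_soft_policy(actions))
# The particles now approximate samples from the soft-optimal policy.
```

The repulsive kernel-gradient term is what keeps the particle set diverse, which is, roughly, the property a maximum-entropy formulation relies on to avoid collapsing onto a single (possibly RO-afflicted) joint action; how SVNR combines the agents' Q-values into a negotiation objective is specific to the paper and not reproduced here.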
Related papers
- Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA)
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z) - Large Language Models as an Indirect Reasoner: Contrapositive and
Contradiction for Automated Reasoning [79.37150041259066]
This paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof.
The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%.
arXiv Detail & Related papers (2024-02-06T03:41:12Z) - LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs).
This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z) - DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy [76.58614128865652]
We propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy.
First, we categorize known conditions into two types: determinate and indeterminate premises. This provides an overall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights.
We automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps.
arXiv Detail & Related papers (2023-10-28T10:05:51Z) - On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality
Perspective [5.8010446129208155]
This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is prevalently employed for gauging the performance of feature importance estimates.
The insights gleaned from our theoretical foundation and empirical investigations reveal that attributions containing lesser information about the decision function may yield superior results in ROAR benchmarks.
arXiv Detail & Related papers (2023-04-26T21:43:42Z) - Occupancy Information Ratio: Infinite-Horizon, Information-Directed,
Parameterized Policy Search [21.850348833971722]
We propose an information-directed objective for infinite-horizon reinforcement learning (RL) called the occupancy information ratio (OIR).
The OIR enjoys rich underlying structure and presents an objective to which scalable, model-free policy search methods naturally apply.
We show, by leveraging connections between quasiconcave optimization and linear programming theory for Markov decision processes, that the OIR problem can be transformed and solved via concave programming methods when the underlying model is known.
arXiv Detail & Related papers (2022-01-21T18:40:03Z) - Cross-sentence Neural Language Models for Conversational Speech
Recognition [17.317583079824423]
We propose an effective cross-sentence neural LM approach that reranks the ASR N-best hypotheses of an upcoming sentence.
We also explore extracting task-specific global topical information from the cross-sentence history.
arXiv Detail & Related papers (2021-06-13T05:30:16Z) - Pairwise Relations Discriminator for Unsupervised Raven's Progressive
Matrices [7.769102711230249]
We introduce a pairwise relations discriminator (PRD) to develop unsupervised models with sufficient reasoning abilities to tackle the Raven's Progressive Matrices (RPM) problem.
PRD reframes the RPM problem into a relation comparison task, which we can solve without requiring the labelling of the RPM problem.
Our approach, the PRD, establishes a new state-of-the-art unsupervised learning benchmark with an accuracy of 55.9% on the I-RAVEN.
arXiv Detail & Related papers (2020-11-02T20:49:46Z) - An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)