A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies
- URL: http://arxiv.org/abs/2409.18335v1
- Date: Thu, 26 Sep 2024 23:16:47 GMT
- Title: A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies
- Authors: Ryan Shea, Zhou Yu
- Abstract summary: We propose a negotiation framework that incorporates fairness into the reward design and search to learn human-compatible negotiation strategies.
Our results show that our method is able to achieve more egalitarian negotiation outcomes and improve negotiation quality.
- Score: 19.595627721072812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advancements in AI and NLP, negotiation remains a difficult domain for AI agents. Traditional game theoretic approaches that have worked well for two-player zero-sum games struggle in the context of negotiation due to their inability to learn human-compatible strategies. On the other hand, approaches that only use human data tend to be domain-specific and lack the theoretical guarantees provided by strategies grounded in game theory. Motivated by the notion of fairness as a criterion for optimality in general sum games, we propose a negotiation framework called FDHC which incorporates fairness into both the reward design and search to learn human-compatible negotiation strategies. Our method includes a novel, RL+search technique called LGM-Zero which leverages a pre-trained language model to retrieve human-compatible offers from large action spaces. Our results show that our method is able to achieve more egalitarian negotiation outcomes and improve negotiation quality.
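The abstract does not spell out FDHC's reward, but the core idea it names, folding an egalitarian fairness criterion into the reward design, can be illustrated with a minimal sketch. Everything below (the function name, the lambda_fair weighting, the example utilities) is a hypothetical stand-in, not the paper's implementation:

```python
from typing import Dict

def fairness_shaped_reward(utilities: Dict[str, float],
                           agent: str,
                           lambda_fair: float = 0.5) -> float:
    """Blend an agent's own utility with egalitarian welfare.

    The agent earns its own utility from the agreed deal, mixed with
    the worst-off party's utility (an egalitarian/Rawlsian term).
    Raising lambda_fair pushes learned policies toward equal splits.
    NOTE: illustrative stand-in, not FDHC's actual reward.
    """
    own = utilities[agent]
    worst_off = min(utilities.values())
    return (1.0 - lambda_fair) * own + lambda_fair * worst_off

# Splitting 10 points of surplus: a lopsided deal vs. an even one.
lopsided = {"alice": 9.0, "bob": 1.0}
even = {"alice": 5.0, "bob": 5.0}
print(fairness_shaped_reward(lopsided, "alice", 0.6))  # 4.2
print(fairness_shaped_reward(even, "alice", 0.6))      # 5.0 -> even split preferred
```

Per the abstract, LGM-Zero additionally uses a pre-trained language model to retrieve human-compatible offers from large action spaces; in a sketch like this, such a retriever would supply the candidate deals whose shaped rewards the search then compares.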
Related papers
- LLMs with Personalities in Multi-issue Negotiation Games [2.186901738997927]
We measure the ability of large language models (LLMs) to negotiate within a game-theoretical framework.
We find high openness, conscientiousness, and neuroticism are associated with fair tendencies.
Low agreeableness and low openness are associated with rational tendencies.
arXiv Detail & Related papers (2024-05-08T17:51:53Z)
- Policy Learning with a Language Bottleneck [65.99843627646018]
Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule generation step guided by language models and an update step where agents learn new policies guided by the rules.
In a two-player communication game, a maze-solving task, and two image reconstruction tasks, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that, on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model-based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
- Aligning Language Models with Human Preferences via a Bayesian Approach [11.984246334043673]
In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial.
This paper proposes a novel approach that employs a Bayesian framework to account for the distribution of disagreements among human preferences when training a preference model.
Our method consistently outperforms previous SOTA models in both automatic and human evaluations.
arXiv Detail & Related papers (2023-10-09T15:15:05Z)
- Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination [36.33334853998621]
We introduce the Cooperative Open-ended LEarning (COLE) framework to solve cooperative incompatibility in learning.
COLE formulates open-ended objectives in two-player cooperative games from a graph-theoretic perspective to evaluate and pinpoint the cooperative capacity of each strategy.
We show through theoretical and empirical analysis that COLE effectively overcomes cooperative incompatibility.
arXiv Detail & Related papers (2023-06-05T16:51:38Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy (a minimal sketch of this KL-regularization idea appears after this list).
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- A Data-Driven Method for Recognizing Automated Negotiation Strategies [13.606307819976161]
We propose a novel data-driven approach for recognizing an opponent's negotiation strategy.
Our approach includes a data generation method for an agent to generate domain-independent sequences.
We perform extensive experiments, spanning four problem scenarios, to demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-07-03T20:43:47Z)
- An Autonomous Negotiating Agent Framework with Reinforcement Learning Based Strategies and Adaptive Strategy Switching Mechanism [3.4376560669160394]
This work focuses on solving the problem of expert selection and adapting to the opponent's behaviour with our Autonomous Negotiating Agent Framework.
Our framework has a reviewer component that enables self-enhancement by periodically deciding whether to include new strategies or replace old ones with better-performing strategies.
arXiv Detail & Related papers (2021-02-06T14:38:03Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning (a schematic sketch of this two-step recipe appears after this list).
arXiv Detail & Related papers (2020-04-07T11:03:17Z)
- Emergence of Pragmatics from Referential Game between Theory of Mind Agents [64.25696237463397]
We propose an algorithm with which agents can spontaneously learn the ability to "read between the lines" without any explicit hand-designed rules.
We integrate the theory of mind (ToM) in a cooperative multi-agent pedagogical situation and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol.
arXiv Detail & Related papers (2020-01-21T19:37:33Z)
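Two of the entries above describe mechanisms concrete enough to sketch. First, the DiL-piKL summary describes regularizing a reward-maximizing policy toward a human imitation-learned anchor policy. For a single decision point, that KL-regularized objective, argmax over pi of <pi, Q> - lambda * KL(pi || tau), has the well-known closed form pi(a) proportional to tau(a) * exp(Q(a) / lambda). A minimal sketch under that reading (names illustrative; not the paper's code):

```python
import numpy as np

def kl_regularized_policy(q_values: np.ndarray,
                          anchor: np.ndarray,
                          lam: float = 1.0) -> np.ndarray:
    """One-step KL-regularized best response (piKL-style).

    Solves  argmax_pi  <pi, q> - lam * KL(pi || anchor),
    whose closed form is  pi(a) ~ anchor(a) * exp(q(a) / lam).
    Large lam stays close to the human-imitation anchor policy;
    small lam approaches pure reward maximization.
    """
    logits = np.log(anchor) + q_values / lam
    logits -= logits.max()          # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

q = np.array([1.0, 0.0, -1.0])             # action values from planning
tau = np.array([0.2, 0.7, 0.1])            # human imitation policy
print(kl_regularized_policy(q, tau, lam=10.0))  # close to tau (human-like)
print(kl_regularized_policy(q, tau, lam=0.1))   # close to argmax q (reward-seeking)
```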
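Second, the Guided Dialog Policy Learning entry describes a two-step recipe: train the discriminator offline against an auxiliary generator, then freeze it and use its score as the reward inside an ordinary RL loop. A schematic stand-in, with a stub reward model and a one-state softmax-bandit REINFORCE update in place of a real dialogue policy:

```python
import math
import random

# Step 1 (offline): train the discriminator; here a stub reward model
# that scores dialogue acts, standing in for a network fit on human
# vs. generator turns. It is frozen before RL begins.
def make_reward_model(human_like):
    return lambda act: 1.0 if act in human_like else 0.0

# Step 2: ordinary policy-gradient RL against the frozen reward model,
# i.e., no adversarial discriminator updates inside the loop. A
# one-state softmax bandit stands in for the dialogue policy.
def reinforce(actions, reward_fn, steps=2000, lr=0.1):
    prefs = {a: 0.0 for a in actions}           # softmax preferences
    for _ in range(steps):
        z = sum(math.exp(p) for p in prefs.values())
        probs = {a: math.exp(p) / z for a, p in prefs.items()}
        act = random.choices(actions, weights=[probs[a] for a in actions])[0]
        r = reward_fn(act)   # reward from the frozen discriminator
        for a in actions:    # REINFORCE: grad of log pi(act) wrt prefs[a]
            grad = (1.0 if a == act else 0.0) - probs[a]
            prefs[a] += lr * r * grad
    return prefs

reward = make_reward_model({"greet", "clarify"})
print(reinforce(["greet", "clarify", "ignore"], reward))
```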
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.