Related papers: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

URL: http://arxiv.org/abs/2305.10142v1
Date: Wed, 17 May 2023 11:55:32 GMT
Title: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
Authors: Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata
Abstract summary: We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. Only a subset of the language models we consider can self-play and improve the deal price from AI feedback.
Score: 97.54519989641388
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We are interested in this question because if LLMs were able to improve each other, it would imply the possibility of creating strong AI agents with minimal human intervention. We ask two LLMs to negotiate with each other, playing the roles of a buyer and a seller, respectively. They aim to reach a deal with the buyer targeting a lower price and the seller a higher one. A third language model, playing the critic, provides feedback to a player to improve the player's negotiation strategies. We let the two agents play multiple rounds, using previous negotiation history and AI feedback as in-context demonstrations to improve the model's negotiation strategy iteratively. We use different LLMs (GPT and Claude) for different roles and use the deal price as the evaluation metric. Our experiments reveal multiple intriguing findings: (1) Only a subset of the language models we consider can self-play and improve the deal price from AI feedback, weaker models either do not understand the game's rules or cannot incorporate AI feedback for further improvement. (2) Models' abilities to learn from the feedback differ when playing different roles. For example, it is harder for Claude-instant to improve as the buyer than as the seller. (3) When unrolling the game to multiple rounds, stronger agents can consistently improve their performance by meaningfully using previous experiences and iterative AI feedback, yet have a higher risk of breaking the deal. We hope our work provides insightful initial explorations of having models autonomously improve each other with game playing and AI feedback.

Related papers

Playpen: An Environment for Exploring Learning Through Conversational Interaction [81.67330926729015]
We investigate whether Dialogue Games can also serve as a source of feedback signals for learning.<n>We introduce Playpen, an environment for off- and online learning through Dialogue Game self-play.<n>We find that imitation learning through SFT improves performance on unseen instances, but negatively impacts other skills.
arXiv Detail & Related papers (2025-04-11T14:49:33Z)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
Indirect Dynamic Negotiation in the Nash Demand Game [0.0]
We proposed a decision model that helps agents to successfully bargain by performing indirect negotiation and learning the opponent's model. We illustrate our approach by applying our model to the Nash demand game, which is an abstract model of bargaining.
arXiv Detail & Related papers (2024-09-10T14:58:00Z)
LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback [33.14770105185958]
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players.
arXiv Detail & Related papers (2024-08-25T18:47:55Z)
Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method [17.388837360641276]
This paper describes the Bargaining task as an asymmetric incomplete information game. It allows us to quantitatively assess an agent's performance in the Bargain task. We propose a novel approach called OG-Narrator that integrates a deterministic Offer Generator to control the price range of Buyer's offers.
arXiv Detail & Related papers (2024-02-24T13:36:58Z)
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis [50.15061156253347]
Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
arXiv Detail & Related papers (2024-02-08T17:51:48Z)
Evaluating Language Model Agency through Negotiations [39.87262815823634]
Negotiation games enable us to study multi-turn, and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings.
arXiv Detail & Related papers (2024-01-09T13:19:37Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games [14.063311955315077]
Large language models (LLMs) are effective at answering questions that are clearly asked. When faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively.
arXiv Detail & Related papers (2023-10-02T16:55:37Z)
All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time. We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation [52.930183136111864]
We propose using scorable negotiation to evaluate Large Language Models (LLMs) To reach an agreement, agents must have strong arithmetic, inference, exploration, and planning capabilities. We provide procedures to create new games and increase games' difficulty to have an evolving benchmark.
arXiv Detail & Related papers (2023-09-29T13:33:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.