Are LLMs Effective Negotiators? Systematic Evaluation of the
Multifaceted Capabilities of LLMs in Negotiation Dialogues
- URL: http://arxiv.org/abs/2402.13550v1
- Date: Wed, 21 Feb 2024 06:11:03 GMT
- Title: Are LLMs Effective Negotiators? Systematic Evaluation of the
Multifaceted Capabilities of LLMs in Negotiation Dialogues
- Authors: Deuksin Kwon, Emily Weiss, Tara Kulshrestha, Kushal Chawla, Gale M.
Lucas, Jonathan Gratch
- Abstract summary: LLMs can advance different aspects of negotiation research, ranging from designing dialogue systems to providing pedagogical feedback and scaling up data collection practices.
Our analysis adds to the increasing evidence for the superiority of GPT-4 across various tasks.
However, specific tasks remain difficult for LLMs: for instance, the models correlate poorly with human players when making subjective assessments about the negotiation dialogues.
- Score: 5.021504231639885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A successful negotiation demands a deep comprehension of the conversation
context, Theory-of-Mind (ToM) skills to infer the partner's motives, as well as
strategic reasoning and effective communication, making it challenging for
automated systems. Given the remarkable performance of LLMs across a variety of
NLP tasks, in this work, we aim to understand how LLMs can advance different
aspects of negotiation research, ranging from designing dialogue systems to
providing pedagogical feedback and scaling up data collection practices. To
this end, we devise a methodology to analyze the multifaceted capabilities of
LLMs across diverse dialogue scenarios covering all the time stages of a
typical negotiation interaction. Our analysis adds to the increasing evidence
for the superiority of GPT-4 across various tasks while also providing insights
into specific tasks that remain difficult for LLMs. For instance, the models
correlate poorly with human players when making subjective assessments about
the negotiation dialogues and often struggle to generate responses that are
contextually appropriate as well as strategically advantageous.
Related papers
- DivTOD: Unleashing the Power of LLMs for Diversifying Task-Oriented Dialogue Representations [21.814490079113323]
Language models pre-trained on general text have achieved impressive results in diverse fields.
Yet, the distinct linguistic characteristics of task-oriented dialogues (TOD) compared to general text limit the practical utility of existing language models.
We propose a novel dialogue pre-training model called DivTOD, which collaborates with LLMs to learn diverse task-oriented dialogue representations.
arXiv Detail & Related papers (2024-03-31T04:36:57Z)
- MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues [58.33076950775072]
MT-Bench-101 is designed to evaluate the fine-grained abilities of Large Language Models (LLMs) in multi-turn dialogues.
We construct a three-tier hierarchical ability taxonomy comprising 4208 turns across 1388 multi-turn dialogues in 13 distinct tasks.
We then evaluate 21 popular LLMs based on MT-Bench-101, conducting comprehensive analyses from both ability and task perspectives.
arXiv Detail & Related papers (2024-02-22T18:21:59Z)
- How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis [50.15061156253347]
Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources.
With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate.
We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
arXiv Detail & Related papers (2024-02-08T17:51:48Z)
- Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z)
- Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents [121.46051697742608]
We introduce a new dialogue policy planning paradigm to strategize dialogue problems with a tunable language model plug-in named PPDPP.
Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data.
PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications.
arXiv Detail & Related papers (2023-11-01T03:20:16Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration [72.04629217161656]
This work focuses on three aspects of proactive dialogue systems: clarification, target-guided, and non-collaborative dialogues.
To trigger the proactivity of LLMs, we propose the Proactive Chain-of-Thought prompting scheme.
arXiv Detail & Related papers (2023-05-23T02:49:35Z)