Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
- URL: http://arxiv.org/abs/2502.12436v1
- Date: Tue, 18 Feb 2025 02:11:41 GMT
- Title: Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
- Authors: Wichayaporn Wongkamjan, Yanze Wang, Feng Gu, Denis Peskoff, Jonathan K. Kummerfeld, Jonathan May, Jordan Lee Boyd-Graber,
- Abstract summary: We analyze how humans strategically deceive each other in Diplomacy, a board game that requires both natural language communication and strategic reasoning.
Our method detects human deception with high precision compared to a Large Language Model approach.
Future human-AI interaction tools can build on our methods for deception detection by triggering friction to give users a chance to interrogate suspicious proposals.
- Score: 30.6942857922867
- License:
- Abstract: An increasingly prevalent socio-technical problem is people being taken in by offers that sound "too good to be true", where persuasion and trust shape decision-making. This paper investigates how AI can help detect these deceptive scenarios. We analyze how humans strategically deceive each other in Diplomacy, a board game that requires both natural language communication and strategic reasoning. This requires extracting logical forms of proposed agreements in player communications and computing the relative rewards of the proposal using agents' value functions. Combined with text-based features, this can improve our deception detection. Our method detects human deception with high precision compared to a Large Language Model approach that flags many true messages as deceptive. Future human-AI interaction tools can build on our methods for deception detection by triggering friction to give users a chance to interrogate suspicious proposals.
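The pipeline the abstract describes lends itself to a small sketch: extract the logical form of a proposed agreement, use an agent's value function to compare each player's expected reward with and without the deal, and feed the resulting counterfactual features, together with text-based features, into a classifier. The code below is a minimal illustration under assumed interfaces, not the paper's implementation; Proposal, value_fn, apply_proposal, and the linear scorer are hypothetical stand-ins.

```python
# Minimal sketch (not the authors' code) of the counterfactual value signal described
# in the abstract: compare each player's value with and without a proposed agreement,
# then combine the resulting features with text-based features in a simple scorer.
from dataclasses import dataclass
from typing import Callable, Dict, List
import math

@dataclass
class Proposal:
    sender: str             # player proposing the deal
    receiver: str           # player being asked to accept
    orders: Dict[str, str]  # logical form of the agreement: unit -> proposed order

def counterfactual_features(
    state,                                                 # current game state (opaque here)
    proposal: Proposal,
    value_fn: Callable[[object, str], float],              # V(state, player), e.g. from an RL agent
    apply_proposal: Callable[[object, Proposal], object],  # hypothetical simulator of the deal
) -> List[float]:
    """Relative-reward features: how the proposal shifts each side's estimated value."""
    proposed_state = apply_proposal(state, proposal)
    d_sender = value_fn(proposed_state, proposal.sender) - value_fn(state, proposal.sender)
    d_receiver = value_fn(proposed_state, proposal.receiver) - value_fn(state, proposal.receiver)
    # A proposal that helps the sender while hurting the receiver is suspicious.
    return [d_sender, d_receiver, d_sender - d_receiver]

def deception_score(cf_feats: List[float], text_feats: List[float],
                    weights: List[float], bias: float = 0.0) -> float:
    """Toy linear scorer over counterfactual + text features (weights would be learned)."""
    z = bias + sum(w * f for w, f in zip(weights, cf_feats + text_feats))
    return 1.0 / (1.0 + math.exp(-z))  # probability that the message is deceptive
```

In this framing, a large positive value shift for the sender paired with a negative shift for the receiver is exactly the "too good to be true" signal a classifier can learn to flag.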
Related papers
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) could cause grave mistakes.
We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process.
We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z) - Verbalized Bayesian Persuasion [54.55974023595722]
Information design (ID) explores how a sender can influence the optimal behavior of receivers to achieve specific objectives.
This work proposes a verbalized framework in Bayesian persuasion (BP), which extends classic BP to real-world games involving human dialogues for the first time.
Numerical experiments in dialogue scenarios, such as recommendation letters, courtroom interactions, and law enforcement, validate that our framework can both reproduce theoretical results in classic BP and discover effective persuasion strategies.
arXiv Detail & Related papers (2025-02-03T18:20:10Z) - Peering Behind the Shield: Guardrail Identification in Large Language Models [22.78318541483925]
In this work, we propose a novel method, AP-Test, which identifies the presence of a candidate guardrail by leveraging guardrail-specific adversarial prompts to query the AI agent.
Extensive experiments on four candidate guardrails under diverse scenarios showcase the effectiveness of our method.
arXiv Detail & Related papers (2025-02-03T11:02:30Z) - 'Quis custodiet ipsos custodes?' Who will watch the watchmen? On Detecting AI-generated peer-reviews [20.030884734361358]
There is a growing concern that AI-generated texts could compromise scientific publishing, including peer-review.
We introduce the Term Frequency (TF) model, which posits that AI often repeats tokens, and the Review Regeneration (RR) model, which is based on the idea that ChatGPT generates similar outputs upon re-prompting.
Our findings suggest that both proposed methods perform better than other AI text detectors; a rough sketch of the token-repetition idea behind the TF model appears after this list.
arXiv Detail & Related papers (2024-10-13T08:06:08Z) - Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication [32.655335061150566]
We introduce a shared-control game, where two players collectively control a token in alternating turns to achieve a common objective under incomplete information.
We formulate a policy synthesis problem for an autonomous agent in this game with a human as the other player.
We propose a communication-based approach comprising a language module and a planning module.
arXiv Detail & Related papers (2024-05-23T04:58:42Z) - Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models [45.41542983671774]
How well can language models represent inherent uncertainty in conversations?
We propose FortUne Dial, an expansion of the long-standing "conversation forecasting" task.
We study two ways in which language models potentially represent outcome uncertainty.
arXiv Detail & Related papers (2024-02-05T18:39:47Z) - User Strategization and Trustworthy Algorithms [81.82279667028423]
We show that user strategization can actually help platforms in the short term.
We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions.
arXiv Detail & Related papers (2023-12-29T16:09:42Z) - AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform hand-crafted replies and perform on par with carefully fine-tuned large supervised models; a sketch of this probe-reply scoring idea also appears after this list.
arXiv Detail & Related papers (2022-11-22T22:31:34Z) - Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules [38.15523098189754]
We propose a zero-shot commonsense reasoning system for conversational agents.
Our reasoner uncovers unstated presumptions satisfying a general template of if-(state), then-(action), because-(goal).
We evaluate the model in a user study with human users, achieving a 35% higher success rate compared to SOTA.
arXiv Detail & Related papers (2021-09-17T13:40:07Z) - Few-shot Language Coordination by Modeling Theory of Mind [95.54446989205117]
We study the task of few-shot language coordination.
We require the lead agent to coordinate with a population of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z)
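As referenced in the AI-generated peer-review entry above, the sketch below illustrates the token-repetition intuition attributed to the Term Frequency (TF) model: AI-generated reviews tend to reuse the same tokens more than human-written ones. It is a toy version under stated assumptions (whitespace tokenization, an uncalibrated threshold), not the paper's method.

```python
# Illustrative sketch only (assumed details, not the paper's implementation) of the
# token-repetition intuition behind the Term Frequency (TF) model mentioned above.
from collections import Counter

def repetition_score(text: str) -> float:
    """Fraction of token occurrences that repeat an earlier token in the same text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(tokens)

def flag_as_ai_generated(text: str, threshold: float = 0.35) -> bool:
    """Thresholded decision; a real system would calibrate this on labeled reviews."""
    return repetition_score(text) >= threshold
```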
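Similarly, for the AutoReply entry above, the sketch below shows one way to score a drafted message by how likely a dialogue model finds "discriminative" confused replies. reply_logprob, the probe replies, and the threshold are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative sketch only (assumed interfaces) of scoring a drafted message by the
# likelihood of replies that would indicate the message is nonsense. `reply_logprob`
# is a hypothetical stand-in for a dialogue model's log P(reply | context).
from typing import Callable, List

PROBE_REPLIES: List[str] = [
    "I don't understand what you mean.",
    "That move isn't possible from there.",
    "This doesn't make any sense.",
]

def nonsense_score(history: str, candidate_message: str,
                   reply_logprob: Callable[[str, str], float]) -> float:
    """Average log-likelihood of confused probe replies; higher means more suspect."""
    context = history + "\n" + candidate_message
    scores = [reply_logprob(context, reply) for reply in PROBE_REPLIES]
    return sum(scores) / len(scores)

def should_regenerate(history: str, candidate_message: str,
                      reply_logprob: Callable[[str, str], float],
                      threshold: float = -2.0) -> bool:
    """Flag the draft if confused replies look too likely.
    The threshold is arbitrary; in practice one would length-normalize the
    log-likelihoods and calibrate on held-out dialogues."""
    return nonsense_score(history, candidate_message, reply_logprob) > threshold
```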
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.