Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
- URL: http://arxiv.org/abs/2502.12436v2
- Date: Fri, 21 Feb 2025 19:52:58 GMT
- Title: Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
- Authors: Wichayaporn Wongkamjan, Yanze Wang, Feng Gu, Denis Peskoff, Jonathan K. Kummerfeld, Jonathan May, Jordan Lee Boyd-Graber,
- Abstract summary: We analyze how humans strategically deceive each other in \textit{Diplomacy}, a board game that requires both natural language communication and strategic reasoning. Our method detects human deception with high precision when compared to a Large Language Model approach. Future human-\abr{ai} interaction tools can build on our methods for deception detection by triggering \textit{friction} to give users a chance to interrogate suspicious proposals.
- Score: 30.6942857922867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasingly prevalent socio-technical problem is people being taken in by offers that sound ``too good to be true'', where persuasion and trust shape decision-making. This paper investigates how \abr{ai} can help detect these deceptive scenarios. We analyze how humans strategically deceive each other in \textit{Diplomacy}, a board game that requires both natural language communication and strategic reasoning. This requires extracting logical forms of proposed agreements in player communications and computing the relative rewards of the proposal using agents' value functions. Combined with text-based features, this can improve our deception detection. Our method detects human deception with high precision when compared to a Large Language Model approach that flags many true messages as deceptive. Future human-\abr{ai} interaction tools can build on our methods for deception detection by triggering \textit{friction} to give users a chance to interrogate suspicious proposals.
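The abstract describes a pipeline: parse a proposed agreement into a logical form, score it against the agents' value functions, and combine that counterfactual reward signal with text-based features for a classifier. The sketch below illustrates one way such a detector could be wired together; the helper names (`value_fn`, `text_features`, the `Proposal` fields) are hypothetical placeholders under stated assumptions, not the authors' implementation.

```python
# Illustrative sketch only: helper names and signatures (value_fn, text_features)
# are assumptions, not the paper's released code.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Proposal:
    sender: str             # player making the offer
    receiver: str           # player being persuaded
    orders: Dict[str, str]  # proposed moves parsed from the message (its logical form)


def counterfactual_value_gap(state, proposal: Proposal, value_fn: Callable) -> float:
    """How much better (or worse) off is the receiver if the deal is honored,
    relative to their baseline value with no agreement?"""
    baseline = value_fn(state, player=proposal.receiver)
    with_deal = value_fn(state, player=proposal.receiver,
                         assumed_orders=proposal.orders)
    return with_deal - baseline  # a strongly negative gap suggests a lopsided offer


def deception_features(state, message: str, proposal: Proposal,
                       value_fn: Callable,
                       text_features: Callable[[str], List[float]]) -> List[float]:
    """Combine the counterfactual reward signal with text-based features;
    the resulting vector would feed a standard binary classifier."""
    gap = counterfactual_value_gap(state, proposal, value_fn)
    return [gap] + text_features(message)
```

A classifier trained on such features trades precision for recall via its decision threshold; per the abstract, the combined signal yields higher precision than an LLM baseline that over-flags truthful messages.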
Related papers
- Among Us: A Sandbox for Agentic Deception [1.1893676124374688]
Among Us is a text-based social-deduction game environment.
LLM agents naturally exhibit human-style deception while they think, speak, and act with other agents or humans.
We evaluate the effectiveness of AI safety techniques for detecting lying and deception in Among Us.
arXiv Detail & Related papers (2025-04-05T06:09:32Z) - ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) could cause grave mistakes. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z) - Verbalized Bayesian Persuasion [54.55974023595722]
Information design (ID) explores how a sender influences the optimal behavior of receivers to achieve specific objectives. This work proposes a verbalized framework for Bayesian persuasion (BP), which extends classic BP to real-world games involving human dialogues for the first time. Numerical experiments in dialogue scenarios, such as recommendation letters, courtroom interactions, and law enforcement, validate that our framework can both reproduce theoretical results in classic BP and discover effective persuasion strategies.
arXiv Detail & Related papers (2025-02-03T18:20:10Z) - Peering Behind the Shield: Guardrail Identification in Large Language Models [22.78318541483925]
In this work, we propose a novel method, AP-Test, which identifies the presence of a candidate guardrail by leveraging guardrail-specific adversarial prompts to query the AI agent. Extensive experiments on four candidate guardrails under diverse scenarios showcase the effectiveness of our method.
arXiv Detail & Related papers (2025-02-03T11:02:30Z) - Improving Cooperation in Language Games with Bayesian Inference and the Cognitive Hierarchy [0.8149787238021642]
In language games, failure may be due to disagreement in the understanding of either the semantics or the pragmatics of an utterance. We model coarse uncertainty in semantics using a prior distribution over language models and uncertainty in pragmatics using the cognitive hierarchy. To handle all forms of uncertainty, we construct agents that learn the behavior of their partner using Bayesian inference.
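One way to read the Bayesian-inference step is as a posterior update over candidate partner language models after each observed utterance. The sketch below is a minimal illustration under that reading; the `likelihood(model, utterance)` helper is an assumption, not the paper's implementation.

```python
from typing import Callable, Dict, Hashable


def posterior_over_partners(prior: Dict[Hashable, float], utterance: str,
                            likelihood: Callable[[Hashable, str], float]) -> Dict[Hashable, float]:
    """Bayes update over candidate partner language models given one utterance.
    `prior` maps each candidate model to its probability; `likelihood` is an
    assumed helper returning P(utterance | model)."""
    unnormalized = {m: p * likelihood(m, utterance) for m, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        return prior  # no evidence either way; keep the prior
    return {m: w / total for m, w in unnormalized.items()}
```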
arXiv Detail & Related papers (2024-12-16T23:24:12Z) - Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - 'Quis custodiet ipsos custodes?' Who will watch the watchmen? On Detecting AI-generated peer-reviews [20.030884734361358]
There is a growing concern that AI-generated texts could compromise scientific publishing, including peer-review.
We introduce the Term Frequency (TF) model, which posits that AI often repeats tokens, and the Review Regeneration (RR) model, which is based on the idea that ChatGPT generates similar outputs upon re-prompting.
Our findings suggest that both proposed methods perform better than other AI text detectors.
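As a toy illustration of the term-frequency intuition above (AI-generated reviews tend to reuse tokens), a detector could score a review by how much of its text consists of repeated tokens. The tokenization and threshold below are arbitrary placeholders, not the paper's calibrated TF model.

```python
# Toy term-frequency style check; not the authors' TF model.
from collections import Counter


def repetition_score(review: str) -> float:
    tokens = review.lower().split()  # naive whitespace tokenization (placeholder)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    repeated = sum(c for c in counts.values() if c > 1)  # mass of repeated tokens
    return repeated / len(tokens)


def looks_ai_generated(review: str, threshold: float = 0.5) -> bool:
    # the threshold is an arbitrary placeholder, not a value calibrated in the paper
    return repetition_score(review) > threshold
```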
arXiv Detail & Related papers (2024-10-13T08:06:08Z) - Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication [32.655335061150566]
We introduce a shared-control game, where two players collectively control a token in alternating turns to achieve a common objective under incomplete information.
We formulate a policy synthesis problem for an autonomous agent in this game with a human as the other player.
We propose a communication-based approach comprising a language module and a planning module.
arXiv Detail & Related papers (2024-05-23T04:58:42Z) - Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models [45.41542983671774]
How well can language models represent inherent uncertainty in conversations?
We propose FortUne Dial, an expansion of the long-standing "conversation forecasting" task.
We study two ways in which language models potentially represent outcome uncertainty.
arXiv Detail & Related papers (2024-02-05T18:39:47Z) - User Strategization and Trustworthy Algorithms [81.82279667028423]
We show that user strategization can actually help platforms in the short term.
We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions.
arXiv Detail & Related papers (2023-12-29T16:09:42Z) - AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
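The introspective idea can be sketched as scoring a message by how likely the dialogue model itself thinks a confusion-signaling reply would follow it. The probe replies and the `reply_log_likelihood` helper below are assumptions (e.g., summed token log-probabilities from the same model), not the AutoReply prompts or released code.

```python
# Minimal sketch of introspective nonsense detection: a message is suspicious if
# the dialogue model assigns high likelihood to replies that signal confusion.
from typing import Callable, List

PROBE_REPLIES = [
    "Wait, that doesn't make any sense.",
    "I'm confused, what exactly are you proposing?",
]


def nonsense_score(dialogue_history: List[str], candidate_message: str,
                   reply_log_likelihood: Callable[[List[str], str], float]) -> float:
    """Average log-likelihood of confusion-indicating replies given the message."""
    context = dialogue_history + [candidate_message]
    scores = [reply_log_likelihood(context, reply) for reply in PROBE_REPLIES]
    return sum(scores) / len(scores)
```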
arXiv Detail & Related papers (2022-11-22T22:31:34Z) - Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls [28.903534969338015]
We study how to perform robust learning in such an environment.
We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language.
We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level.
arXiv Detail & Related papers (2022-08-05T17:33:33Z) - Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules [38.15523098189754]
We propose a zero-shot commonsense reasoning system for conversational agents.
Our reasoner uncovers unstated presumptions satisfying a general template of if-(state), then-(action), because-(goal).
We evaluate the model in a user study with human users, achieving a 35% higher success rate than SOTA.
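The if-(state), then-(action), because-(goal) template could be represented with a simple structure like the one below; the field names and example strings are illustrative placeholders, not the paper's schema.

```python
# Illustrative representation of the if-(state), then-(action), because-(goal)
# template; field names and examples are placeholders, not the paper's schema.
from dataclasses import dataclass


@dataclass
class Presumption:
    state: str   # "if" clause, e.g. "the user is going outside and it is raining"
    action: str  # "then" clause, e.g. "suggest taking an umbrella"
    goal: str    # "because" clause, e.g. "the user wants to stay dry"

    def as_rule(self) -> str:
        return f"if {self.state}, then {self.action}, because {self.goal}"
```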
arXiv Detail & Related papers (2021-09-17T13:40:07Z) - Few-shot Language Coordination by Modeling Theory of Mind [95.54446989205117]
We study the task of few-shot \textit{language coordination}.
We require the lead agent to coordinate with a \textit{population} of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z) - Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news.
We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms.
For a deeper understanding of Explainable AI systems, we discuss interactions between user engagement, mental model, trust, and performance measures in the process of explaining.
arXiv Detail & Related papers (2020-07-24T05:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.