Related papers: Bayesian Social Deduction with Graph-Informed Language Models

Bayesian Social Deduction with Graph-Informed Language Models

URL: http://arxiv.org/abs/2506.17788v1
Date: Sat, 21 Jun 2025 18:45:28 GMT
Title: Bayesian Social Deduction with Graph-Informed Language Models
Authors: Shahab Rahimirad, Guven Gergerli, Lucia Romero, Angela Qian, Matthew Lyle Olson, Simon Stepputtis, Joseph Campbell,
Abstract summary: Social reasoning remains a challenging task for large language models.<n>We introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model.<n>Our approach achieves competitive performance with much larger models in Agent-Agent play.
Score: 3.7540464038118633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Social reasoning - inferring unobservable beliefs and intentions from partial observations of other agents - remains a challenging task for large language models (LLMs). We evaluate the limits of current reasoning language models in the social deduction game Avalon and find that while the largest models demonstrate strong performance, they require extensive test-time inference and degrade sharply when distilled to smaller, real-time-capable variants. To address this, we introduce a hybrid reasoning framework that externalizes belief inference to a structured probabilistic model, while using an LLM for language understanding and interaction. Our approach achieves competitive performance with much larger models in Agent-Agent play and, notably, is the first language agent to defeat human players in a controlled study - achieving a 67% win rate and receiving higher qualitative ratings than both reasoning baselines and human teammates. We release code, models, and a dataset to support future work on social reasoning in LLM agents, which can be found at https://camp-lab-purdue.github.io/bayesian-social-deduction/

Related papers

Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning [71.4175109189942]
We present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR)<n>This approach designates the model's primary language as the pivot language.<n>It establishes a cross-lingual self-feedback mechanism without relying on external correct answers or reward models.
arXiv Detail & Related papers (2026-01-25T03:20:00Z)
Tandem Training for Language Models [27.21963074798485]
We develop methods that encourage models to produce solutions that remain intelligible to weaker collaborators.<n>We introduce tandem training for language models, in which rollout tokens are intermittently and randomly sampled from a frozen weak model.<n>In the GSM8K math reasoning task, tandem training reliably teaches models to abandon jargon and adapt their language to weaker partners.
arXiv Detail & Related papers (2025-10-15T13:48:16Z)
Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies [54.08697738311866]
Social deduction games like Werewolf combine language, reasoning, and strategy.<n>We curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants.<n>We propose a novel strategy-alignment evaluation that leverages the winning faction's strategies as ground truth in two stages.
arXiv Detail & Related papers (2025-10-13T13:33:30Z)
Evaluating Language Models' Evaluations of Games [65.49017696754825]
We advocate for a new paradigm that assesses AI systems' evaluation of games.<n>We leverage a large-scale dataset of over $100$ novel board games and over 450 human judgments.<n>Our results show that reasoning models are generally more aligned to people in their evaluations of games than non-reasoning language models.
arXiv Detail & Related papers (2025-10-13T02:45:37Z)
Among Them: A game-based framework for assessing persuasion capabilities of LLMs [0.8763629723457529]
Large language models (LLMs) and autonomous AI agents have raised concerns about their potential for automated persuasion and social influence.<n>We present an Among Us-inspired game framework for assessing LLM deception skills in a controlled environment.
arXiv Detail & Related papers (2025-02-27T12:26:21Z)
ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning.<n>We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics.<n>Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback [33.14770105185958]
Large Language Models (LLMs) excel at generating human-like dialogues and comprehending text. We propose a bootstrapping framework that leverages self-generated feedback to enhance LLM reasoning capabilities for lie detection. We investigate the application of the proposed framework for detecting betrayal and deception in Diplomacy games, and compare it with feedback from professional human players.
arXiv Detail & Related papers (2024-08-25T18:47:55Z)
A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood. We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations' In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games [14.063311955315077]
Large language models (LLMs) are effective at answering questions that are clearly asked. When faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively.
arXiv Detail & Related papers (2023-10-02T16:55:37Z)
A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds? [2.7342737448775534]
Large Language Models (LLMs) have been linked to claims about human-like linguistic performance. We analyze the contribution of LLMs as theoretically informative representations of a target cognitive system. We evaluate the models' ability to see the bigger picture, through top-down feedback from higher levels of processing.
arXiv Detail & Related papers (2023-07-26T18:58:53Z)
The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs [50.32802502923367]
We study the process of language driving and influencing social reasoning in a probabilistic goal inference domain. We propose a neuro-symbolic model that carries out goal inference from linguistic inputs of agent scenarios. Our model closely matches human response patterns and better predicts human judgements than using an LLM alone.
arXiv Detail & Related papers (2023-06-25T19:38:01Z)
Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity. We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model. By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z)
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks [39.39138995087475]
We ask how much of human-like thinking can be captured by learning statistical patterns in language alone. Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization. We find that humans are far more robust than LLMs on this benchmark.
arXiv Detail & Related papers (2022-05-11T18:14:33Z)
Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.