CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference
- URL: http://arxiv.org/abs/2511.06175v1
- Date: Sun, 09 Nov 2025 01:20:18 GMT
- Authors: Kaijie Xu, Fandi Meng, Clark Verbrugge, Simon Lucas
- Abstract summary: In Social Deduction Games (SDGs) such as Avalon, Mafia, and Werewolf, players conceal their identities and deliberately mislead others. We introduce CSP4SDG, a probabilistic, constraint-satisfaction framework that analyses gameplay objectively. Our study validates that principled probabilistic reasoning with information theory is a scalable alternative, or complement, to heavy-weight neural models for SDGs.
- Score: 3.1263038719799745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Social Deduction Games (SDGs) such as Avalon, Mafia, and Werewolf, players conceal their identities and deliberately mislead others, making hidden-role inference a central and demanding task. Accurate role identification, which forms the basis of an agent's belief state, is therefore the keystone for both human and AI performance. We introduce CSP4SDG, a probabilistic, constraint-satisfaction framework that analyses gameplay objectively. Game events and dialogue are mapped to four linguistically agnostic constraint classes: evidence, phenomena, assertions, and hypotheses. Hard constraints prune impossible role assignments, while weighted soft constraints score the remainder; information-gain weighting links each hypothesis to its expected value under entropy reduction, and a simple closed-form scoring rule guarantees that truthful assertions converge to classical hard logic with minimum error. The resulting posterior over roles is fully interpretable and updates in real time. Experiments on three public datasets show that CSP4SDG (i) outperforms LLM-based baselines in every inference scenario, and (ii) boosts LLMs when supplied as an auxiliary "reasoning tool." Our study validates that principled probabilistic reasoning with information theory is a scalable alternative, or complement, to heavy-weight neural models for SDGs.
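The prune-then-score idea in the abstract can be illustrated with a toy sketch. This is not the CSP4SDG implementation: the four-player game, the single hard constraint, the multiplicative soft factor of 2.0, and all function names below are assumptions made up for illustration. The sketch enumerates role assignments, drops those that violate a hard constraint, reweights the survivors with a soft constraint, and reports the resulting posterior and its entropy.

```python
from itertools import permutations
from math import log2

# Hypothetical toy game: four players, exactly one hidden "evil" role.
players = ["A", "B", "C", "D"]
roles = ["good", "good", "good", "evil"]

def enumerate_assignments(players, roles):
    """All distinct ways to deal the role multiset to the players."""
    return [dict(zip(players, p)) for p in set(permutations(roles))]

# Hard constraint (assumed evidence): A is verified good.
hard = [lambda a: a["A"] == "good"]
# Soft constraint (assumed assertion): an accusation makes "C is evil"
# twice as plausible. The factor 2.0 is an arbitrary illustrative weight.
soft = [(lambda a: a["C"] == "evil", 2.0)]

def posterior(assignments, hard, soft):
    """Prune with hard constraints, then normalise soft-constraint scores."""
    survivors, scores = [], []
    for a in assignments:
        if not all(check(a) for check in hard):
            continue  # impossible assignment, pruned
        score = 1.0
        for check, factor in soft:
            if check(a):
                score *= factor
        survivors.append(a)
        scores.append(score)
    total = sum(scores)
    return [(a, s / total) for a, s in zip(survivors, scores)]

def role_marginals(post, players):
    """Per-player probability of each role under the posterior."""
    marg = {p: {} for p in players}
    for a, pr in post:
        for p in players:
            marg[p][a[p]] = marg[p].get(a[p], 0.0) + pr
    return marg

def entropy_bits(post):
    """Shannon entropy (in bits) of the distribution over assignments."""
    return -sum(pr * log2(pr) for _, pr in post if pr > 0)

post = posterior(enumerate_assignments(players, roles), hard, soft)
marginals = role_marginals(post, players)
print(marginals["C"]["evil"])  # 0.5
print(entropy_bits(post))      # 1.5
```

In this toy setting the uniform prior over four assignments carries 2 bits of entropy, so the two constraints together reduce uncertainty by 0.5 bits. Brute-force enumeration is only practical for small games; it is used here purely to keep the sketch short.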
Related papers
- Wild Guesses and Mild Guesses in Active Concept Learning [0.0]
We study a trade-off in a neuro-symbolic Bayesian learner whose hypotheses are proposed by a large language model (LLM). We compare a Rational Active Learner that selects queries to maximize approximate expected information gain (EIG) and the human-like Positive Test Strategy (PTS). Our results suggest that "confirmation bias" may not be a cognitive error, but rather a rational adaptation for maintaining tractable inference in the sparse, open-ended hypothesis spaces characteristic of human thought.
arXiv Detail & Related papers (2026-02-06T16:04:44Z)
- Two-Faced Social Agents: Context Collapse in Role-Conditioned Large Language Models [0.0]
GPT-5 exhibited complete mathematics contextual collapse and adopted a singular identity towards optimal responses. Claude Sonnet 4.5 retained limited but measurable role-specific variation on the SAT items. All models exhibited distinct role-conditioned affective preference, indicating that socio-affective variation can reemerge when cognitive constraints are relaxed.
arXiv Detail & Related papers (2025-11-19T16:04:49Z)
- Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment. Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z)
- Evaluating LLM Alignment on Personality Inference from Real-World Interview Data [7.061237517845673]
Large Language Models (LLMs) are increasingly deployed in roles requiring nuanced psychological understanding. Their ability to interpret human personality traits, a critical aspect of such applications, remains unexplored. We introduce a novel benchmark comprising semi-structured interview transcripts paired with validated continuous Big Five trait scores.
arXiv Detail & Related papers (2025-09-16T16:54:35Z)
- The Knowledge-Reasoning Dissociation: Fundamental Limitations of LLMs in Clinical Natural Language Inference [13.59675117792588]
Large language models are often assumed to acquire increasingly structured, generalizable internal representations simply by scaling data and parameters. We interrogate this assumption by introducing a Clinical Trial Natural Language Inference benchmark comprising four reasoning families. Each item is paired with a targeted Ground Knowledge and Meta-Level Reasoning Verification probe, allowing us to dissociate failures of factual access from failures of inference.
arXiv Detail & Related papers (2025-08-14T16:01:10Z)
- Incentivizing Truthful Language Models via Peer Elicitation Games [13.262897952363147]
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models.
arXiv Detail & Related papers (2025-05-19T18:16:58Z)
- Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline (individual alignment, parameter-space merging, and domain-specific reinforcement learning) boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
- CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models [15.560280546809457]
Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks. We propose CoT-RAG, a novel reasoning framework with three key designs. We show significant accuracy gains, ranging from 4.0% to 44.3%, over state-of-the-art methods.
arXiv Detail & Related papers (2025-04-18T07:55:09Z)
- Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models [77.96693360763925]
Video SimpleQA is the first comprehensive benchmark tailored for factuality evaluation in video contexts. Our work differs from existing video benchmarks through the following key features: Knowledge required: demanding integration of external knowledge beyond the video's explicit narrative. Short-form definitive answer: Answers are crafted as unambiguous and definitively correct in a short format with minimal scoring variance.
arXiv Detail & Related papers (2025-03-24T17:46:09Z)
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z)
- Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning [88.68573198200698]
We introduce ExploreToM, the first framework to allow large-scale generation of diverse and challenging theory of mind data. Our approach leverages an A* search over a custom domain-specific language to produce complex story structures and novel, diverse, yet plausible scenarios. Our evaluation reveals that state-of-the-art LLMs, such as Llama-3.1-70B and GPT-4o, show accuracies as low as 0% and 9% on ExploreToM-generated data.
arXiv Detail & Related papers (2024-12-12T21:29:00Z)
- Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Can large language models explore in-context? [87.49311128190143]
We deploy Large Language Models as agents in simple multi-armed bandit environments.
We find that the models do not robustly engage in exploration without substantial interventions.
arXiv Detail & Related papers (2024-03-22T17:50:43Z)