Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation
- URL: http://arxiv.org/abs/2502.11649v3
- Date: Sun, 31 Aug 2025 02:04:05 GMT
- Title: Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation
- Authors: Amin Qasmi, Usman Naseem, Mehwish Nasim
- Abstract summary: We introduce a novel non-cooperative game to analyse opinion formation and resistance. Our simulation features Large Language Model (LLM) agents competing to influence a population. This framework integrates resource optimisation into the agents' decision-making process.
- Score: 13.484737301041427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel non-cooperative game to analyse opinion formation and resistance, incorporating principles from social psychology such as confirmation bias, resource constraints, and influence penalties. Our simulation features Large Language Model (LLM) agents competing to influence a population, with penalties imposed for generating messages that propagate or counter misinformation. This framework integrates resource optimisation into the agents' decision-making process. Our findings demonstrate that while higher confirmation bias strengthens opinion alignment within groups, it also exacerbates overall polarisation. Conversely, lower confirmation bias leads to fragmented opinions and limited shifts in individual beliefs. Investing heavily in a high-resource debunking strategy can initially align the population with the debunking agent, but risks rapid resource depletion and diminished long-term influence.
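As a rough illustration of the dynamics described in the abstract, the sketch below implements a bounded-confidence opinion update with a confirmation-bias parameter and a resource budget for the influencing agent. The update rule, the mapping from bias to an acceptance window, and the cost model are illustrative assumptions, not the paper's actual implementation.

```python
import random

def update_opinion(opinion, message, bias, rate=0.1):
    """Bounded-confidence style update: the receiver moves toward a message
    only when it falls inside the acceptance window implied by the
    confirmation-bias parameter (higher bias -> narrower window).
    The bias-to-window mapping here is an illustrative assumption."""
    window = 1.0 - bias
    if abs(message - opinion) <= window:
        opinion += rate * (message - opinion)
    return max(-1.0, min(1.0, opinion))

def simulate(n_agents=100, steps=50, bias=0.7, budget=100.0, cost=1.0, seed=0):
    """One influencing agent broadcasts a fixed stance (+1.0) until its
    resource budget is exhausted; returns the mean population opinion."""
    rng = random.Random(seed)
    opinions = [rng.uniform(-1, 1) for _ in range(n_agents)]
    for _ in range(steps):
        if budget < cost:
            break  # resource depletion ends the influence campaign
        budget -= cost
        opinions = [update_opinion(o, 1.0, bias) for o in opinions]
    return sum(opinions) / n_agents
```

With a wider acceptance window (lower bias), more of the population accepts the broadcast message, so the mean opinion shifts further toward the agent's stance; cutting the budget to zero leaves the initial distribution untouched.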
Related papers
- Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies [0.7944997500468641]
This study investigates the effectiveness of Anchoring Agents--pre-programmed altruistic entities--in fostering cooperation within a Public Goods Game (PGG). While Anchoring Agents successfully boosted local cooperation rates, cognitive decomposition and transfer tests revealed that this effect was driven by strategic compliance and cognitive offloading rather than genuine norm internalization. These findings highlight a critical gap between behavioral modification and authentic value alignment in artificial societies.
arXiv Detail & Related papers (2026-02-01T17:07:10Z) - Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z) - The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents [0.6117371161379209]
We propose a formal probabilistic framework that provides agents with a non-altruistic motive for bidirectional knowledge exchange. We show how these accumulated belief states serve as verifiable reward signals for Reinforcement Learning from Human Feedback (RLHF) and high-quality data filters for Supervised Fine-Tuning (SFT). Simulation results validate that this uncertainty-driven strategy significantly outperforms random baselines in heterogeneous environments.
arXiv Detail & Related papers (2025-12-24T02:02:25Z) - The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration [72.33801123508145]
Large language models (LLMs) are integral to multi-agent systems. Privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations. In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information.
arXiv Detail & Related papers (2025-09-16T16:57:25Z) - Bias Mitigation Agent: Optimizing Source Selection for Fair and Balanced Knowledge Retrieval [0.0]
Large Language Models (LLMs) have transformed the field of artificial intelligence by unlocking the era of generative applications. Built on top of generative AI capabilities, Agentic AI represents a major shift toward autonomous, goal-driven systems that can reason, retrieve, and act.
arXiv Detail & Related papers (2025-08-26T06:44:04Z) - LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions [35.71511502901056]
Large language models (LLMs) are increasingly deployed in multi-agent systems as components of collaborative intelligence. We examine how LLMs form trust from previous impressions, resist misinformation, and integrate peer input during interaction. We present KAIROS, a benchmark simulating quiz contests with peer agents of varying reliability.
arXiv Detail & Related papers (2025-08-24T09:58:10Z) - Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge [37.06831535578792]
We conduct a systematic analysis of four diverse bias types: position bias, verbosity bias, chain-of-thought bias, and bandwagon bias. We evaluate these biases across two widely adopted multi-agent LLM-as-Judge frameworks: Multi-Agent-Debate and LLM-as-Meta-Judge.
arXiv Detail & Related papers (2025-05-26T03:56:41Z) - On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization [52.76330545825083]
Reinforcement learning (RL) has become popular in enhancing the reasoning capabilities of large language models (LLMs). We identify a previously unrecognized phenomenon we term Lazy Likelihood Displacement (LLD), wherein the likelihood of correct responses marginally increases or even decreases during training. We develop a method called NTHR, which downweights penalties on tokens contributing to the LLD. Unlike prior DPO-based approaches, NTHR takes advantage of GRPO's group-based structure, using correct responses as anchors to identify influential tokens.
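The downweighting idea can be illustrated with a toy group-relative policy-gradient surrogate in which penalty (negative-advantage) terms are scaled down. This is a generic sketch of softening negative gradients under my own simplifying assumptions; it is not NTHR itself, which selects specific tokens using correct responses as anchors.

```python
def group_relative_loss(logprobs, rewards, neg_weight=0.5):
    """Toy group-relative objective: advantages are rewards centred on the
    group mean; negative-advantage (penalty) terms are scaled by
    neg_weight, illustrating the general idea of downweighting negative
    gradients. Per-response (not per-token) for simplicity."""
    mean_r = sum(rewards) / len(rewards)
    loss = 0.0
    for lp, r in zip(logprobs, rewards):
        adv = r - mean_r
        w = 1.0 if adv >= 0 else neg_weight  # soften only the penalties
        loss -= w * adv * lp  # policy-gradient surrogate term
    return loss / len(rewards)
```

Setting neg_weight below 1.0 reduces the pressure that pushes down the likelihood of below-average responses, which is the kind of effect the abstract attributes to mitigating LLD.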
arXiv Detail & Related papers (2025-05-24T18:58:51Z) - Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
We propose a novel jailbreak attack framework, inspired by cognitive decomposition and biases in human cognition. We employ cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize prompts. We also introduce a ranking-based harmfulness evaluation metric that surpasses the traditional binary success-or-failure paradigm.
arXiv Detail & Related papers (2025-05-03T05:28:11Z) - Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking [7.946359845249688]
Large Language Models (LLMs) have shown strong performance across fact-checking tasks.
This paper investigates whether generative agents can meaningfully contribute to fact-checking tasks traditionally reserved for human crowds.
Agent crowds outperform human crowds in truthfulness classification, exhibit higher internal consistency, and show reduced susceptibility to social and cognitive biases.
arXiv Detail & Related papers (2025-04-24T18:49:55Z) - Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values [13.798198972161657]
A number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. This paper examines whether large language models (LLMs) adhere to fundamental fairness concepts and investigates their alignment with human preferences.
arXiv Detail & Related papers (2025-02-01T04:24:47Z) - Emergence of human-like polarization among large language model agents [61.622596148368906]
We simulate a networked system involving thousands of large language model agents, discovering that their social interactions result in human-like polarization. Similarities between humans and LLM agents raise concerns about their capacity to amplify societal polarization, but also hold the potential to serve as a valuable testbed for identifying plausible strategies to mitigate it.
arXiv Detail & Related papers (2025-01-09T11:45:05Z) - Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z) - Metacognitive Myopia in Large Language Models [0.0]
Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups.
We propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases.
Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs.
arXiv Detail & Related papers (2024-08-10T14:43:57Z) - Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias [16.85625861663094]
Motivated by social psychology principles, we propose a novel strategy named PeT that inspires LLMs to integrate diverse human perspectives and self-regulate their responses.
Rigorous evaluations and ablation studies are conducted on two commercial LLMs and three open-source LLMs, revealing PeT's superiority in producing less harmful responses.
arXiv Detail & Related papers (2024-07-22T04:25:01Z) - Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions [0.46873264197900916]
This paper examines the role of cognitive biases in the decision-making processes of large language models (LLMs).
We show that certain cognitive biases, when properly balanced, can enhance decision-making efficiency through rational deviations and shortcuts.
arXiv Detail & Related papers (2024-06-16T16:25:22Z) - Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
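As a toy illustration of convergence to a Nash equilibrium in a two-player competitive game (a far simpler setting than the paper's iterative RL between adversarial and defensive agents), fictitious play on matching pennies drives both players' empirical strategies toward the equilibrium mix (0.5, 0.5). The game and dynamics here are my own illustrative choice.

```python
def fictitious_play(A, iters=20000):
    """Fictitious play on a 2x2 zero-sum game. A is the row player's
    payoff matrix; the column player receives -A. Each round, both
    players best-respond to the opponent's empirical action frequencies.
    For zero-sum games the empirical frequencies converge to a Nash
    equilibrium (Robinson's theorem)."""
    row_counts = [1, 0]  # both players start by playing action 0 once
    col_counts = [1, 0]
    for _ in range(iters):
        # Row maximizes expected payoff against the column's empirical mix.
        r = 0 if (A[0][0] * col_counts[0] + A[0][1] * col_counts[1]
                  >= A[1][0] * col_counts[0] + A[1][1] * col_counts[1]) else 1
        # Column minimizes the row player's payoff (zero-sum).
        c = 0 if (A[0][0] * row_counts[0] + A[1][0] * row_counts[1]
                  <= A[0][1] * row_counts[0] + A[1][1] * row_counts[1]) else 1
        row_counts[r] += 1
        col_counts[c] += 1
    n = iters + 1
    return [x / n for x in row_counts], [y / n for y in col_counts]
```

On matching pennies (payoff matrix [[1, -1], [-1, 1]]) both empirical strategies approach the unique equilibrium of mixing each action with probability 0.5.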
arXiv Detail & Related papers (2024-06-16T15:24:50Z) - Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs [6.090496490133132]
We propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach for bias mitigation replacing human feedback in traditional RLHF.
We utilize LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning.
arXiv Detail & Related papers (2024-04-15T22:18:50Z) - Prompting Fairness: Integrating Causality to Debias Large Language Models [19.76215433424235]
Large language models (LLMs) are susceptible to generating biased and discriminatory responses.
We propose a causality-guided debiasing framework to tackle social biases.
arXiv Detail & Related papers (2024-03-13T17:46:28Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization [61.39201891894024]
Group distributionally robust optimization (group DRO) can minimize the worst-case loss over pre-defined groups.
We reformulate the group DRO framework by proposing Q-Diversity.
Characterized by an interactive training mode, Q-Diversity relaxes the group identification from annotation into direct parameterization.
arXiv Detail & Related papers (2023-05-20T07:02:27Z) - Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.