Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
- URL: http://arxiv.org/abs/2502.14019v1
- Date: Wed, 19 Feb 2025 18:06:37 GMT
- Title: Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
- Authors: Myra Cheng, Su Lin Blodgett, Alicia DeVrio, Lisa Egede, Alexandra Olteanu
- Abstract summary: How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes remains understudied.
We compile an inventory of interventions grounded both in prior literature and a crowdsourced study where participants edited system outputs to make them less human-like.
- Score: 55.99010491370177
- Abstract: As text generation systems' outputs are increasingly anthropomorphic -- perceived as human-like -- scholars have also raised increasing concerns about how such outputs can lead to harmful outcomes, such as users over-relying or developing emotional dependence on these systems. How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes, however, remains understudied. With this work, we aim to provide empirical and theoretical grounding for developing such interventions. To do so, we compile an inventory of interventions grounded both in prior literature and a crowdsourced study where participants edited system outputs to make them less human-like. Drawing on this inventory, we also develop a conceptual framework to help characterize the landscape of possible interventions, articulate distinctions between different types of interventions, and provide a theoretical basis for evaluating the effectiveness of different interventions.
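The abstract describes the interventions only at a conceptual level. As a purely hypothetical illustration of what a surface-level, rule-based intervention on system outputs might look like, the Python sketch below rewrites first-person emotional phrasing and prepends a machine disclosure; the patterns, substitutions, and function names are assumptions for illustration, not the paper's crowdsourced inventory.
```python
import re

# Hypothetical, minimal rule-based interventions; the paper's actual
# inventory is far richer and partly crowdsourced.
SUBSTITUTIONS = [
    # Replace first-person claims of feeling/opinion with impersonal phrasing.
    (re.compile(r"\bI feel\b", re.IGNORECASE), "This output suggests"),
    (re.compile(r"\bI think\b", re.IGNORECASE), "One interpretation is"),
    (re.compile(r"\bI want\b", re.IGNORECASE), "The request implies"),
]

DISCLOSURE = "[Automated response from a text generation system] "

def de_anthropomorphize(output: str, disclose: bool = True) -> str:
    """Apply simple surface-level edits that make an output less human-like."""
    for pattern, replacement in SUBSTITUTIONS:
        output = pattern.sub(replacement, output)
    return (DISCLOSURE + output) if disclose else output

if __name__ == "__main__":
    print(de_anthropomorphize("I feel so happy to help! I think you'll love this."))
```
A realistic intervention set would likely operate at generation time and cover far more behaviors (e.g., expressions of sentience or relationship-building language) than these string substitutions.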
Related papers
- Investigating social alignment via mirroring in a system of interacting language models [16.304359423423648]
We study the effect of mirroring on alignment in multi-agent systems.
We simulate systems of interacting large language models in this framework.
We find that system behavior is strongly influenced by the range of communication of each agent.
arXiv Detail & Related papers (2024-12-07T02:19:57Z)
- Generative Intervention Models for Causal Perturbation Modeling [80.72074987374141]
In many applications, it is a priori unknown which mechanisms of a system are modified by an external perturbation.
We propose a generative intervention model (GIM) that learns to map these perturbation features to distributions over atomic interventions.
arXiv Detail & Related papers (2024-11-21T10:37:57Z)
- Estimating Causal Effects of Text Interventions Leveraging LLMs [7.2937547395453315]
This paper proposes a novel approach to estimate causal effects using text transformations facilitated by large language models (LLMs).
Unlike existing methods, our approach accommodates arbitrary textual interventions and leverages text-level classifiers with domain adaptation ability to produce robust effect estimates against domain shifts.
This flexibility in handling various text interventions is a key advancement in causal estimation for textual data, offering opportunities to better understand human behaviors and develop effective policies within social systems (a toy estimation sketch follows this list).
arXiv Detail & Related papers (2024-10-28T19:19:35Z)
- Composable Interventions for Language Models [60.32695044723103]
Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining.
But despite a flood of new methods, different types of interventions are largely developing independently.
We introduce composable interventions, a framework to study the effects of using multiple interventions on the same language models (a toy composition sketch follows this list).
arXiv Detail & Related papers (2024-07-09T01:17:44Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims, a quantity that is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Expanding the Role of Affective Phenomena in Multimodal Interaction Research [57.069159905961214]
We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing.
We identify 910 affect-related papers and present our analysis of the role of affective phenomena in these papers.
We find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states.
arXiv Detail & Related papers (2023-05-18T09:08:39Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects caused by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- When and How to Fool Explainable Models (and Humans) with Adversarial Examples [1.439518478021091]
We explore the possibilities and limits of adversarial attacks for explainable machine learning models.
First, we extend the notion of adversarial examples to fit explainable machine learning scenarios.
Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
arXiv Detail & Related papers (2021-07-05T11:20:55Z)
- Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities.
We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z)
- Understanding the Effect of Out-of-distribution Examples and Interactive Explanations on Human-AI Decision Making [19.157591744997355]
We argue that the typical experimental setup limits the potential of human-AI teams.
We develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance.
arXiv Detail & Related papers (2021-01-13T19:01:32Z)
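As flagged in the Composable Interventions entry above, a toy way to see why composition order matters is to treat each intervention as a function and apply permutations of them to the same output. This output-string framing is an assumption for illustration only; the cited framework composes model-level interventions such as knowledge editing, compression, and unlearning.
```python
from itertools import permutations
from typing import Callable

# Hypothetical toy: interventions as string-to-string functions over outputs.
Intervention = Callable[[str], str]

def add_disclosure(text: str) -> str:
    """Prepend a machine disclosure (assumed intervention)."""
    return "[AI-generated] " + text

def truncate(text: str, limit: int = 40) -> str:
    """Cap output length (assumed intervention)."""
    return text[:limit]

def compose(*interventions: Intervention) -> Intervention:
    """Left-to-right composition of interventions."""
    def composed(text: str) -> str:
        for intervene in interventions:
            text = intervene(text)
        return text
    return composed

if __name__ == "__main__":
    output = "I feel thrilled to help you with anything you need today!"
    for order in permutations((add_disclosure, truncate)):
        names = " -> ".join(f.__name__ for f in order)
        print(f"{names}: {compose(*order)(output)!r}")
```
Running the sketch shows the two orderings disagree: truncating first preserves the full disclosure, while disclosing first lets truncation cut into the original text, which is exactly the kind of interaction the framework is built to measure.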
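Similarly, the text-intervention entry above summarizes its estimation approach only at a high level. As a hypothetical illustration, one could approximate the setup by generating counterfactual texts with a transformation and contrasting an outcome model's predictions; the transformation and outcome model below are toy stand-ins, not the authors' method, which uses LLM-driven transformations and domain-adapted classifiers.
```python
from statistics import mean
from typing import Callable

def apply_treatment(text: str) -> str:
    """Toy 'intervention': rewrite the text to a softer register (assumed)."""
    return "Please note: " + text.lower()

def outcome_model(text: str) -> float:
    """Toy outcome score; imagine a classifier's probability of engagement."""
    return min(1.0, len(text) / 100.0)

def estimate_effect(texts: list[str],
                    treat: Callable[[str], str],
                    outcome: Callable[[str], float]) -> float:
    """Average treatment effect: mean outcome under treated vs. original text."""
    treated = [outcome(treat(t)) for t in texts]
    control = [outcome(t) for t in texts]
    return mean(treated) - mean(control)

if __name__ == "__main__":
    corpus = ["SEND THE REPORT NOW", "Where is my refund?"]
    print(f"Estimated effect: {estimate_effect(corpus, apply_treatment, outcome_model):+.3f}")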
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.