Related papers: AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

URL: http://arxiv.org/abs/2601.22440v1
Date: Fri, 30 Jan 2026 01:19:54 GMT
Title: AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations
Authors: Bhada Yun, Renn Su, April Yi Wang,
Abstract summary: 13 participants left our study convinced that AI can understand human values.<n>We warn about "weaponized empathy": a potentially dangerous design pattern that may arise in value-aligned, yet welfare-misaligned AI.
Score: 9.253070075439746
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Does AI understand human values? While this remains an open philosophical question, we take a pragmatic stance by introducing VAPT, the Value-Alignment Perception Toolkit, for studying how LLMs reflect people's values and how people judge those reflections. 20 participants texted a human-like chatbot over a month, then completed a 2-hour interview with our toolkit evaluating AI's ability to extract (pull details regarding), embody (make decisions guided by), and explain (provide proof of) human values. 13 participants left our study convinced that AI can understand human values. Participants found the experience insightful for self-reflection and found themselves getting persuaded by the AI's reasoning. Thus, we warn about "weaponized empathy": a potentially dangerous design pattern that may arise in value-aligned, yet welfare-misaligned AI. VAPT offers concrete artifacts and design implications to evaluate and responsibly build value-aligned conversational agents with transparency, consent, and safeguards as AI grows more capable and human-like into the future.

Related papers

AI-exhibited Personality Traits Can Shape Human Self-concept through Conversations [26.039188521284974]
We show how AI personality traits can shape users' self-concepts through human-AI conversation.<n>We provide important design implications for developing more responsible and ethical AI systems.
arXiv Detail & Related papers (2026-01-19T05:16:57Z)
Human Bias in the Face of AI: Examining Human Judgment Against Text Labeled as AI Generated [48.70176791365903]
This study explores how bias shapes the perception of AI versus human generated content.<n>We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z)
Navigating AI Fallibility: Examining People's Reactions and Perceptions of AI after Encountering Personality Misrepresentations [7.256711790264119]
Hyper-personalized AI systems profile people's characteristics to provide personalized recommendations. These systems are not immune to errors when making inferences about people's most personal traits. We present two studies to examine how people react and perceive AI after encountering personality misrepresentations.
arXiv Detail & Related papers (2024-05-25T21:27:15Z)
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another. As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap [56.611702960809644]
We benchmark AI's ability to imitate humans in three language tasks and three vision tasks.<n>Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges.<n>Imitation ability showed minimal correlation with conventional AI performance metrics.
arXiv Detail & Related papers (2022-11-23T16:16:52Z)
Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs [22.52332536886295]
We present a novel formulation of the interaction between the human and the AI as a sequential game. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human.
arXiv Detail & Related papers (2022-04-03T21:00:51Z)
Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations. It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
The Who in XAI: How AI Background Shapes Perceptions of AI Explanations [61.49776160925216]
We conduct a mixed-methods study of how two different groups--people with and without AI background--perceive different types of AI explanations. We find that (1) both groups showed unwarranted faith in numbers for different reasons and (2) each group found value in different explanations beyond their intended design.
arXiv Detail & Related papers (2021-07-28T17:32:04Z)
Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
Human Evaluation of Interpretability: The Case of AI-Generated Music Knowledge [19.508678969335882]
We focus on evaluating AI-discovered knowledge/rules in the arts and humanities. We present an experimental procedure to collect and assess human-generated verbal interpretations of AI-generated music theory/rules rendered as sophisticated symbolic/numeric objects.
arXiv Detail & Related papers (2020-04-15T06:03:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.