Related papers: Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

URL: http://arxiv.org/abs/2512.01892v1
Date: Mon, 01 Dec 2025 17:12:28 GMT
Title: Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
Authors: Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo,
Abstract summary: Despite efforts to implement guardrails, human perceptions of mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations.
Score: 5.323378627597619
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' 'aptitude' for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic contexts. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduced new metrics for training and evaluating mitigation strategies, along with insights for human-AI evaluation studies.

Related papers

Plausibility as Failure: How LLMs and Humans Co-Construct Epistemic Error [0.0]
This study examines how different forms of epistemic failure emerge, are masked, and are tolerated in human-AI interaction. Evaluators frequently conflated criteria such as correctness, relevance, bias, groundedness, and consistency, indicating that human judgment collapses analytical distinctions into intuitions shaped by form and fluency. The study provides implications for LLM assessment, digital literacy, and the design of trustworthy human-AI communication.
arXiv Detail & Related papers (2025-12-18T16:45:29Z)
Modeling Human Responses to Multimodal AI Content [10.65875439980452]
The MhAIM dataset contains 154,552 online posts (111,153 of them AI-generated). Our human study reveals that people are better at identifying AI content when posts include both text and visuals. We present T-Lens, an agent system designed to answer user queries by incorporating predicted human responses to multimodal information.
arXiv Detail & Related papers (2025-08-14T15:55:19Z)
Users Favor LLM-Generated Content -- Until They Know It's AI [0.0]
We investigate how individuals evaluate human-written and large language model-generated responses to popular questions when the source of the content is either concealed or disclosed. Our findings indicate that, overall, participants tend to prefer AI-generated responses. When the AI origin is revealed, this preference diminishes significantly, suggesting that evaluative judgments are influenced by the disclosure of the response's provenance.
arXiv Detail & Related papers (2025-02-23T11:14:02Z)
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation [2.1944577276732726]
We propose and evaluate strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
arXiv Detail & Related papers (2024-12-10T09:29:52Z)
Assessing the Human Likeness of AI-Generated Counterspeech [10.434435022492723]
This paper investigates the human likeness of AI-generated counterspeech. We implement and evaluate several LLM-based generation strategies. We reveal differences in linguistic characteristics, politeness, and specificity.
arXiv Detail & Related papers (2024-10-14T18:48:47Z)
Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions [25.158868133182025]
We present a method for evaluating the output of generative large language models (LLMs). We use ranking models trained on annotated document collections as a substitute for explicit relevance. In a user study, our method correlates with the preferences of a human expert.
arXiv Detail & Related papers (2024-08-19T09:27:45Z)
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction. We find that contextual characteristics significantly affect human reliance behavior. Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z)
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models [52.368110271614285]
We introduce AdvEval, a novel black-box adversarial framework against NLG evaluators. AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators. We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation.
arXiv Detail & Related papers (2024-05-23T14:48:15Z)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
Counterfactual Off-Policy Training for Neural Response Generation [94.76649147381232]
We propose to explore potential responses by counterfactual reasoning. Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space. An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model.
arXiv Detail & Related papers (2020-04-29T22:46:28Z)
Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agent learning from trainers' facial expressions by interpreting them as evaluative feedback. With a designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracies for estimating positive and negative feedback. Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.