Studying the Effects of Cognitive Biases in Evaluation of Conversational
Agents
- URL: http://arxiv.org/abs/2002.07927v2
- Date: Wed, 26 Feb 2020 16:27:37 GMT
- Title: Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents
- Authors: Sashank Santhanam, Alireza Karduni, Samira Shaikh
- Abstract summary: We conduct a study with 77 crowdsourced workers to understand the role of cognitive biases, specifically anchoring bias, when humans are asked to evaluate the output of conversational agents.
We find that increased consistency in ratings across two experimental conditions may be a result of anchoring bias.
- Score: 10.248512149493443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans quite frequently interact with conversational agents. The rapid
advancement in generative language modeling through neural networks has helped
advance the creation of intelligent conversational agents. Researchers
typically evaluate the output of their models through crowdsourced judgments,
but there are no established best practices for conducting such studies.
Moreover, it is unclear if cognitive biases in decision-making are affecting
crowdsourced workers' judgments when they undertake these tasks. To
investigate, we conducted a between-subjects study with 77 crowdsourced workers
to understand the role of cognitive biases, specifically anchoring bias, when
humans are asked to evaluate the output of conversational agents. Our results
provide insight into how best to evaluate conversational agents. We find that
increased consistency in ratings across the two experimental conditions may be
a result of anchoring bias. We also determine that external factors such as time
and prior experience in similar tasks have effects on inter-rater consistency.
Related papers
- DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews [39.08557916089242]
Recent studies have reported enhanced performance when the interviewer's prompts are incorporated into the model.
We discover that models using the interviewer's prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked.
By intentionally exploiting this cue, we achieve a 0.90 F1 score, the highest result reported to date on this dataset using only textual information.
arXiv Detail & Related papers (2024-04-22T09:07:50Z) - RoleInteract: Evaluating the Social Interaction of Role-Playing Agents [85.6641890712617]
We introduce the first benchmark designed to evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions.
The benchmark is constructed from a variety of sources and covers a wide range of 500 characters and over 6,000 question prompts.
We find that agents that excel at the individual level do not necessarily show proficiency at the group level.
arXiv Detail & Related papers (2024-03-20T15:38:36Z) - Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News [4.413331329339185]
We study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines.
By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases.
We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment.
arXiv Detail & Related papers (2024-03-11T12:08:08Z) - Exploring Conversational Agents as an Effective Tool for Measuring
Cognitive Biases in Decision-Making [0.65268245109828]
The research aims to explore conversational agents as an effective tool to measure various cognitive biases in different domains.
Our initial experiments on measuring framing and loss-aversion biases indicate that conversational agents can be used effectively to measure these biases.
arXiv Detail & Related papers (2024-01-08T10:23:52Z) - Evaluating Subjective Cognitive Appraisals of Emotions from Large
Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to date, assessing 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - My Actions Speak Louder Than Your Words: When User Behavior Predicts
Their Beliefs about Agents' Attributes [5.893351309010412]
Behavioral science suggests that people sometimes use irrelevant information.
We identify an instance of this phenomenon: users who experienced better outcomes in a human-agent interaction (outcomes that resulted from their own behavior) systematically rated the same agent as having better abilities, being more benevolent, and exhibiting greater integrity in a post hoc assessment than users who experienced worse outcomes.
Our analyses suggest the need for augmentation of models so that they account for such biased perceptions as well as mechanisms so that agents can detect and even actively work to correct this and similar biases of users.
arXiv Detail & Related papers (2023-01-21T21:26:32Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A
Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Understanding How People Rate Their Conversations [73.17730062864314]
We conduct a study to better understand how people rate their interactions with conversational agents.
We focus on agreeableness and extraversion as variables that may explain variation in ratings.
arXiv Detail & Related papers (2022-06-01T00:45:32Z) - Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted
Decision-making [46.625616262738404]
We use knowledge from the field of cognitive science to account for cognitive biases in the human-AI collaborative decision-making setting.
We focus specifically on anchoring bias, a bias commonly encountered in human-AI collaboration.
arXiv Detail & Related papers (2020-10-15T22:25:41Z) - You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.