Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents
- URL: http://arxiv.org/abs/2002.07927v2
- Date: Wed, 26 Feb 2020 16:27:37 GMT
- Title: Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents
- Authors: Sashank Santhanam, Alireza Karduni, Samira Shaikh
- Abstract summary: We conduct a study with 77 crowdsourced workers to understand the role of cognitive biases, specifically anchoring bias, when humans are asked to evaluate the output of conversational agents.
We find that increased consistency in ratings across two experimental conditions may be a result of anchoring bias.
- Score: 10.248512149493443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans quite frequently interact with conversational agents. The rapid
advancement in generative language modeling through neural networks has helped
advance the creation of intelligent conversational agents. Researchers
typically evaluate the output of their models through crowdsourced judgments,
but there are no established best practices for conducting such studies.
Moreover, it is unclear if cognitive biases in decision-making are affecting
crowdsourced workers' judgments when they undertake these tasks. To
investigate, we conducted a between-subjects study with 77 crowdsourced workers
to understand the role of cognitive biases, specifically anchoring bias, when
humans are asked to evaluate the output of conversational agents. Our results
provide insight into how best to evaluate conversational agents. We find that
increased consistency in ratings across two experimental conditions may be a
result of anchoring bias. We also determine that external factors such as time
and prior experience in similar tasks have effects on inter-rater consistency.
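The paper does not release its analysis code; as a rough illustration of the kind of inter-rater consistency comparison described above, the sketch below computes the mean pairwise correlation between raters in two hypothetical conditions (control vs. anchored). The metric choice, rating matrices, and function name are assumptions made for illustration, not the authors' actual procedure.

```python
# Illustrative sketch only: compares inter-rater consistency between two
# hypothetical rating conditions. All data and the metric (mean pairwise
# Pearson correlation) are assumptions, not taken from the paper.
import numpy as np

def mean_pairwise_correlation(ratings: np.ndarray) -> float:
    """Average Pearson correlation over all pairs of raters.

    `ratings` is a (raters x items) matrix of Likert scores; a higher value
    means raters order the items more similarly, i.e. greater consistency.
    """
    n_raters = ratings.shape[0]
    corrs = []
    for i in range(n_raters):
        for j in range(i + 1, n_raters):
            a, b = ratings[i], ratings[j]
            if a.std() == 0 or b.std() == 0:
                continue  # correlation is undefined for a constant rater
            corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(corrs))

# Hypothetical data: 5 raters scoring 8 conversational responses (1-5 scale),
# once without an anchor example and once after seeing an anchor.
rng = np.random.default_rng(0)
control = rng.integers(1, 6, size=(5, 8)).astype(float)
anchored = np.clip(np.round(rng.normal(4.0, 1.0, size=(5, 8))), 1, 5)

print("control consistency :", mean_pairwise_correlation(control))
print("anchored consistency:", mean_pairwise_correlation(anchored))
```

Under the anchoring-bias hypothesis, one would expect the anchored condition to show higher consistency, since raters' scores cluster around the anchor rather than reflecting independent judgments.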
Related papers
- Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment [22.540544209683592]
We focus on cognitive biases associated with a multi-criteria assessment in crowdsourcing.
Crowdworkers who rate targets on multiple criteria simultaneously may provide biased responses due to the prominence of some criteria or to their global impressions of the evaluation targets.
We propose two specific model structures for Bayesian opinion aggregation models that consider inter-criteria relations.
arXiv Detail & Related papers (2024-07-10T16:00:23Z)
- DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews [39.08557916089242]
Recent studies have reported enhanced performance when incorporating the interviewer's prompts into the model.
We discover that models using the interviewer's prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked.
By intentionally exploiting this behavior, we achieve a 0.90 F1 score, the highest result reported to date on this dataset using only textual information.
arXiv Detail & Related papers (2024-04-22T09:07:50Z)
- Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News [4.413331329339185]
We study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines.
By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases.
We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment.
arXiv Detail & Related papers (2024-03-11T12:08:08Z)
- Exploring Conversational Agents as an Effective Tool for Measuring Cognitive Biases in Decision-Making [0.65268245109828]
The research aims to explore conversational agents as an effective tool to measure various cognitive biases in different domains.
Our initial experiments on measuring framing and loss-aversion biases indicate that conversational agents can be used effectively to measure these biases.
arXiv Detail & Related papers (2024-01-08T10:23:52Z)
- Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most common computational method is the support vector machine, the typical interaction environment is a meeting of 3-4 persons, and the prevailing sensing approach combines microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Understanding How People Rate Their Conversations [73.17730062864314]
We conduct a study to better understand how people rate their interactions with conversational agents.
We focus on agreeableness and extraversion as variables that may explain variation in ratings.
arXiv Detail & Related papers (2022-06-01T00:45:32Z)
- Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making [46.625616262738404]
We use knowledge from the field of cognitive science to account for cognitive biases in the human-AI collaborative decision-making setting.
We focus specifically on anchoring bias, a bias commonly encountered in human-AI collaboration.
arXiv Detail & Related papers (2020-10-15T22:25:41Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)