Do Differences in Values Influence Disagreements in Online Discussions?
- URL: http://arxiv.org/abs/2310.15757v1
- Date: Tue, 24 Oct 2023 12:00:59 GMT
- Title: Do Differences in Values Influence Disagreements in Online Discussions?
- Authors: Michiel van der Meer, Piek Vossen, Catholijn M. Jonker, Pradeep K. Murukannaiah
- Abstract summary: We show how state-of-the-art models can be used for estimating values in online discussions.
We evaluate the estimated value profiles based on human-annotated agreement labels.
We find that the dissimilarity of value profiles correlates with disagreement in specific cases.
- Score: 4.128725138940779
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Disagreements are common in online discussions. Disagreement may foster
collaboration and improve the quality of a discussion under some conditions.
Although there exist methods for recognizing disagreement, a deeper
understanding of factors that influence disagreement is lacking in the
literature. We investigate a hypothesis that differences in personal values are
indicative of disagreement in online discussions. We show how state-of-the-art
models can be used for estimating values in online discussions and how the
estimated values can be aggregated into value profiles. We evaluate the
estimated value profiles based on human-annotated agreement labels. We find
that the dissimilarity of value profiles correlates with disagreement in
specific cases. We also find that including value information in agreement
prediction improves performance.
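To make the pipeline described in the abstract concrete, here is a minimal sketch (not the authors' implementation) of aggregating per-comment value scores into author-level value profiles, computing profile dissimilarity, and correlating it with agreement labels. The mean aggregation, cosine dissimilarity, and Spearman correlation are illustrative assumptions, and `estimate_values` stands in for whatever value-estimation model is used.

```python
import numpy as np
from scipy.stats import spearmanr

def value_profile(comments, estimate_values):
    """Aggregate per-comment value scores into one author-level profile.

    `estimate_values` is any callable mapping a comment string to a vector of
    scores over a fixed set of value categories (e.g., a fine-tuned classifier);
    averaging is an assumed aggregation, not necessarily the paper's.
    """
    scores = np.stack([estimate_values(c) for c in comments])
    return scores.mean(axis=0)

def profile_dissimilarity(p1, p2):
    """Cosine dissimilarity between two value profiles (assumed measure)."""
    cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2) + 1e-12)
    return 1.0 - cos

def correlate_with_disagreement(dissimilarities, agreement_labels):
    """Spearman correlation between profile dissimilarities and agreement labels."""
    rho, p_value = spearmanr(dissimilarities, agreement_labels)
    return rho, p_value
```

For agreement prediction, the profiles or their dissimilarity can simply be appended to whatever features the agreement classifier already uses; the paper's exact feature combination may differ.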
Related papers
- Rater Cohesion and Quality from a Vicarious Perspective [22.445283423317754]
Vicarious annotation is a method for breaking down disagreement by asking raters how they think others would annotate the data.
We employ rater cohesion metrics to study the potential influence of political affiliations and demographic backgrounds on raters' perceptions of offense.
We study how the rater quality metrics influence the in-group and cross-group rater cohesion across the personal and vicarious levels.
arXiv Detail & Related papers (2024-08-15T20:37:36Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- On the Definition of Appropriate Trust and the Tools that Come with it [0.0]
This paper starts with the definitions of appropriate trust from the literature.
It compares these definitions with model performance evaluation, showing strong similarities between the two.
The paper offers several straightforward evaluation methods for different aspects of user performance, including suggesting a method for measuring uncertainty and appropriate trust in regression.
arXiv Detail & Related papers (2023-09-21T09:52:06Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation [68.59356746305255]
We propose a novel model-agnostic approach to measure the turn-level interaction between the system and the user.
Our approach significantly improves the correlation with human judgment compared with existing evaluation systems.
arXiv Detail & Related papers (2023-06-27T06:58:03Z)
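For reference on the quantity named in the C-PMI entry above: the standard conditional pointwise mutual information between a system response r and a user turn u given the dialogue context c is shown below. How the paper estimates these probabilities, and how its final metric departs from this textbook definition, is not specified in the summary.

```latex
\operatorname{pmi}(r; u \mid c)
  = \log \frac{p(r, u \mid c)}{p(r \mid c)\, p(u \mid c)}
  = \log \frac{p(r \mid u, c)}{p(r \mid c)}
```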
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
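A minimal sketch of the rescaling step described in the entry above, assuming a generic `llm` callable (prompt in, text out) and an illustrative rubric-anchored prompt; the paper's actual prompt wording, rubric, and output scale are not reproduced here.

```python
def rescale_judgment(llm, likert_rating, explanation, rubric):
    """Ask an LLM for a rubric-anchored numeric score given a Likert rating
    and the annotator's free-text explanation.

    `llm` is an assumed interface: any callable taking a prompt string and
    returning the model's text completion. The prompt and the 0-100 scale
    are illustrative choices, not the paper's.
    """
    prompt = (
        f"Scoring rubric:\n{rubric}\n\n"
        f"An annotator gave a rating of {likert_rating} on a Likert scale and "
        f"explained their judgment as follows:\n\"{explanation}\"\n\n"
        "Using the rubric and the explanation, respond with a single numeric "
        "score between 0 and 100 and nothing else."
    )
    return float(llm(prompt).strip())
```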
- Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective [5.949779668853556]
Post hoc feature attribution is a family of methods for giving each feature in an input a score corresponding to its influence on a model's output.
A major limitation of this family of explainers is that they can disagree on which features are more important than others.
Alongside the standard loss term corresponding to accuracy, we introduce an additional term that measures the difference in feature attribution between a pair of explainers.
On three datasets, we observe that training a model with this loss term improves explanation consensus on unseen data, and that consensus also improves between explainers other than those used in the loss term.
arXiv Detail & Related papers (2023-03-23T14:35:37Z)
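As a rough illustration of such a training objective (a sketch under assumptions, not the paper's implementation), the snippet below adds a consensus penalty between two simple gradient-based explainers to a standard cross-entropy loss; the choice of explainers, the distance measure, and the weight `lam` are all illustrative.

```python
import torch
import torch.nn.functional as F

def attributions(model, x, y):
    """Two simple gradient-based explainers (illustrative stand-ins for
    whichever attribution methods are actually paired in the paper)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, y.unsqueeze(1)).sum()
    # create_graph=True keeps the attributions differentiable w.r.t. weights
    grads, = torch.autograd.grad(score, x, create_graph=True)
    return grads, grads * x, logits  # saliency, input-times-gradient, logits

def consensus_loss(model, x, y, lam=0.1):
    """Standard accuracy (cross-entropy) term plus a term penalising
    disagreement between the two explainers' feature attributions."""
    a1, a2, logits = attributions(model, x, y)
    task_term = F.cross_entropy(logits, y)
    disagreement_term = (a1 - a2).abs().mean()
    return task_term + lam * disagreement_term
```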
- Just Rank: Rethinking Evaluation with Word and Sentence Similarities [105.5541653811528]
Intrinsic evaluation for embeddings lags far behind, and there has been no significant update over the past decade.
This paper first points out the problems using semantic similarity as the gold standard for word and sentence embedding evaluations.
We propose a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks.
arXiv Detail & Related papers (2022-03-05T08:40:05Z)
- On Quantitative Evaluations of Counterfactuals [88.42660013773647]
This paper consolidates work on evaluating visual counterfactual examples through an analysis and experiments.
We find that while most metrics behave as intended for sufficiently simple datasets, some fail to tell the difference between good and bad counterfactuals when the complexity increases.
We propose two new metrics, the Label Variation Score and the Oracle score, which are both less vulnerable to such tiny changes.
arXiv Detail & Related papers (2021-10-30T05:00:36Z)
- On the Interaction of Belief Bias and Explanations [4.211128681972148]
We provide an overview of belief bias, its role in human evaluation, and ideas for NLP practitioners on how to account for it.
We show that conclusions about the highest performing methods change when introducing such controls, pointing to the importance of accounting for belief bias in evaluation.
arXiv Detail & Related papers (2021-06-29T12:49:42Z)
- I Beg to Differ: A study of constructive disagreement in online conversations [15.581515781839656]
We construct a corpus of 7,425 Wikipedia Talk page conversations that contain content disputes.
We define the task of predicting whether disagreements will be escalated to mediation by a moderator.
We develop a variety of neural models and show that taking into account the structure of the conversation improves predictive accuracy.
arXiv Detail & Related papers (2021-01-26T16:36:43Z)