iLab at SemEval-2023 Task 11 Le-Wi-Di: Modelling Disagreement or Modelling Perspectives?
- URL: http://arxiv.org/abs/2305.06074v1
- Date: Wed, 10 May 2023 11:55:17 GMT
- Title: iLab at SemEval-2023 Task 11 Le-Wi-Di: Modelling Disagreement or Modelling Perspectives?
- Authors: Nikolas Vitsakis, Amit Parekh, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas, Verena Rieser
- Abstract summary: We adapt a multi-task architecture to evaluate its performance on SemEval-2023 Task 11.
We find that a multi-task approach performs poorly on datasets containing distinct annotator opinions.
We argue that perspectivist approaches are preferable because they enable decision makers to amplify minority views.
- Score: 17.310208612897814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There are two competing approaches for modelling annotator disagreement:
distributional soft-labelling approaches (which aim to capture the level of
disagreement) or modelling perspectives of individual annotators or groups
thereof. We adapt a multi-task architecture -- which has previously shown
success in modelling perspectives -- to evaluate its performance on SemEval-2023
Task 11. We do so by combining both approaches, i.e. predicting individual
annotator perspectives as an interim step towards predicting annotator
disagreement. Despite its previous success, we found that a multi-task approach
performed poorly on datasets which contained distinct annotator opinions,
suggesting that this approach may not always be suitable when modelling
perspectives. Furthermore, our results show that while strongly
perspectivist approaches might not achieve state-of-the-art performance
according to evaluation metrics used by distributional approaches, our approach
allows for a more nuanced understanding of individual perspectives present in
the data. We argue that perspectivist approaches are preferable because they
enable decision makers to amplify minority views, and that it is important to
re-evaluate metrics to reflect this goal.
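As a rough illustration of the combined setup described in the abstract, the sketch below gives each annotator a dedicated classification head on top of a shared encoder and averages the per-annotator predictions into a soft label. Layer names and dimensions are hypothetical; this is not the authors' released code.

```python
import torch
import torch.nn as nn

class MultiAnnotatorModel(nn.Module):
    """Shared encoder with one binary head per annotator; the soft label
    (disagreement estimate) is the mean of the per-annotator predictions.
    Illustrative sketch only."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, n_annotators: int):
        super().__init__()
        self.encoder = encoder  # e.g. a pretrained text encoder returning (batch, hidden_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(n_annotators)]
        )

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.encoder(x)                                  # (batch, hidden_dim)
        per_annotator = torch.sigmoid(
            torch.cat([head(h) for head in self.heads], dim=-1)
        )                                                    # (batch, n_annotators)
        soft_label = per_annotator.mean(dim=-1)              # interim step -> disagreement
        return per_annotator, soft_label
```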
Related papers
- Where is this coming from? Making groundedness count in the evaluation of Document VQA models [12.951716701565019]
We argue that common evaluation metrics do not account for the semantic and multimodal groundedness of a model's outputs.
We propose a new evaluation methodology that accounts for the groundedness of predictions.
The methodology is parameterized so that users can configure the score according to their preferences.
arXiv Detail & Related papers (2025-03-24T20:14:46Z)
- Embracing Diversity: A Multi-Perspective Approach with Soft Labels [3.529000007777341]
We propose a new framework for designing perspective-aware models on stance detection task, in which multiple annotators assign stances based on a controversial topic.
Results show that the multi-perspective approach yields better classification performance (higher F1-scores).
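For intuition, a soft label in this setting is just the normalised distribution of annotator votes for an item; a minimal sketch with hypothetical stance labels:

```python
from collections import Counter

def soft_label(votes: list[str], classes: list[str]) -> list[float]:
    """Turn raw annotator stance votes into a soft label distribution."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]

# Three annotators label one controversial post:
print(soft_label(["favor", "against", "favor"], ["favor", "against", "none"]))
# -> [0.666..., 0.333..., 0.0]
```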
arXiv Detail & Related papers (2025-03-01T13:33:38Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
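The user token model referenced here conditions a single shared classifier on annotator identity by prepending an annotator-specific marker to the input text; a minimal sketch (the token format is an assumption, not the paper's exact scheme):

```python
def add_user_token(text: str, annotator_id: int) -> str:
    """Prepend an annotator-specific marker token to the input text."""
    return f"[ANNOTATOR_{annotator_id}] {text}"

# The tagged string is then fed to an ordinary text classifier:
print(add_user_token("this comment is rude", annotator_id=7))
# -> "[ANNOTATOR_7] this comment is rude"
```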
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
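Predictive churn between two models is typically measured as the fraction of inputs on which their predictions disagree; a minimal sketch:

```python
import numpy as np

def churn(preds_a: np.ndarray, preds_b: np.ndarray) -> float:
    """Fraction of examples on which two models give different predictions."""
    return float(np.mean(preds_a != preds_b))

print(churn(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))  # -> 0.25
```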
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
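The standard focal loss down-weights well-classified examples via the modulating factor (1 - p_t)^gamma, which is what makes it suited to long-tailed data; a minimal binary sketch (the paper's exact variant may differ):

```python
import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: -alpha * (1 - p_t)^gamma * log(p_t), where p_t is
    the predicted probability of the true class."""
    p_t = torch.where(y == 1, p, 1 - p)
    return (-alpha * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))).mean()
```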
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
- An Operational Perspective to Fairness Interventions: Where and How to Intervene [9.833760837977222]
We present a holistic framework for evaluating and contextualizing fairness interventions.
We demonstrate our framework with a case study on predictive parity.
We find predictive parity is difficult to achieve without using group data.
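Predictive parity requires the positive predictive value P(Y = 1 | Yhat = 1, A = a) to be equal across groups a; a minimal per-group check (the array layout is an assumption):

```python
import numpy as np

def ppv_per_group(y_true, y_pred, group) -> dict:
    """Positive predictive value P(Y=1 | Yhat=1) computed per group.
    Groups with no positive predictions yield nan."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    return {
        g: float(y_true[(group == g) & (y_pred == 1)].mean())
        for g in np.unique(group)
    }
```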
arXiv Detail & Related papers (2023-02-03T07:04:33Z)
- Improving Narrative Relationship Embeddings by Training with Additional Inverse-Relationship Constraints [0.0]
We consider the problem of embedding character-entity relationships from the reduced semantic space of narratives.
We analyze this assumption and compare the approach to a baseline state-of-the-art model with a unique evaluation that simulates efficacy on a downstream clustering task with human-created labels.
arXiv Detail & Related papers (2022-12-21T17:59:11Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations naturally lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
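Representing each node by a latent distribution typically means the encoder outputs a per-node mean and variance and samples embeddings via the reparameterisation trick; a minimal sketch with hypothetical module names, not the paper's code:

```python
import torch
import torch.nn as nn

class StochasticNodeEncoder(nn.Module):
    """Maps node features to a Gaussian in latent space and samples from it."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.log_var = nn.Linear(in_dim, latent_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        mu, log_var = self.mu(h), self.log_var(h)
        eps = torch.randn_like(mu)                   # reparameterisation trick
        return mu + eps * torch.exp(0.5 * log_var)
```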
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
- Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on parameter decoupling strategy to encourage consistent predictions from diverse views.
Our method achieves competitive results against state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z)
- Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)
- A Brief Introduction to Generative Models [8.031257560764336]
We introduce and motivate generative modeling as a central task for machine learning.
We outline the maximum likelihood approach and how it can be interpreted as minimizing KL-divergence.
We explore the alternative adversarial approach, which studies the differences between an estimated distribution and the real data distribution.
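The interpretation mentioned here is the standard identity that maximising expected log-likelihood under the data distribution is equivalent to minimising the forward KL divergence to the model:

```latex
\hat{\theta}
  = \arg\max_{\theta} \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_{\theta}(x)\right]
  = \arg\min_{\theta} D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_{\theta}\right),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x \sim p}\left[\log \tfrac{p(x)}{q(x)}\right].
```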
arXiv Detail & Related papers (2021-02-27T16:49:41Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.