LPI-RIT at LeWiDi-2025: Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo
- URL: http://arxiv.org/abs/2508.08163v2
- Date: Sun, 05 Oct 2025 01:07:12 GMT
- Title: LPI-RIT at LeWiDi-2025: Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo
- Authors: Mandira Sawkar, Samay U. Shetty, Deepak Pandita, Tharindu Cyril Weerasooriya, Christopher M. Homan
- Abstract summary: Learning With Disagreements (LeWiDi) 2025 aims to model annotator disagreement through soft label distribution prediction and perspectivist evaluation. We adapt DisCo, a neural architecture that jointly models item-level and annotator-level label distributions, and present detailed analysis and improvements.
- Score: 6.4877384679152525
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Learning With Disagreements (LeWiDi) 2025 shared task aims to model annotator disagreement through soft label distribution prediction and perspectivist evaluation, which focuses on modeling individual annotators. We adapt DisCo (Distribution from Context), a neural architecture that jointly models item-level and annotator-level label distributions, and present detailed analysis and improvements. In this paper, we extend DisCo by introducing annotator metadata embeddings, enhancing input representations, and multi-objective training losses to capture disagreement patterns better. Through extensive experiments, we demonstrate substantial improvements in both soft and perspectivist evaluation metrics across three datasets. We also conduct in-depth calibration and error analyses that reveal when and why disagreement-aware modeling improves. Our findings show that disagreement can be better captured by conditioning on annotator demographics and by optimizing directly for distributional metrics, yielding consistent improvements across datasets.
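Although the paper's code is not reproduced here, the approach lends itself to a compact sketch. Below is a minimal, illustrative PyTorch version of a DisCo-style head conditioned on annotator metadata, trained with a multi-objective loss mixing per-annotator cross-entropy and a distributional (KL) term; all names, dimensions, and the specific loss mix are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisCoStyleModel(nn.Module):
    """Sketch: combine item text features with a learned annotator
    embedding plus embedded metadata (e.g., demographic fields), then
    predict that annotator's label distribution. Illustrative only."""
    def __init__(self, text_dim, n_annotators, n_meta_values, n_labels,
                 emb=32, hidden=256):
        super().__init__()
        self.annotator = nn.Embedding(n_annotators, emb)
        # one shared vocabulary for all metadata fields, for simplicity
        self.metadata = nn.Embedding(n_meta_values, emb)
        self.head = nn.Sequential(
            nn.Linear(text_dim + 2 * emb, hidden), nn.ReLU(),
            nn.Linear(hidden, n_labels),
        )

    def forward(self, text_feat, annotator_id, meta_ids):
        a = self.annotator(annotator_id)          # (B, emb)
        m = self.metadata(meta_ids).mean(dim=1)   # pool metadata fields
        return self.head(torch.cat([text_feat, a, m], dim=-1))  # logits

def multi_objective_loss(logits, hard_label, soft_target, alpha=0.5):
    """Cross-entropy on the annotator's own label plus a KL term to the
    item's empirical soft label, optimizing a distributional metric directly."""
    ce = F.cross_entropy(logits, hard_label)
    kl = F.kl_div(F.log_softmax(logits, dim=-1), soft_target,
                  reduction='batchmean')
    return alpha * ce + (1 - alpha) * kl
```

Averaging the per-annotator predicted distributions for an item is one natural way to produce the soft label scored by the distributional metrics.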
Related papers
- Revisiting the Learning Objectives of Vision-Language Reward Models [19.768973349254285]
Learning generalizable reward functions is a core challenge in embodied intelligence. Recent work leverages contrastive vision-language models (VLMs) to obtain dense, domain-agnostic rewards without human supervision. We evaluate recent VLM-based reward models under a unified framework with identical backbones, finetuning data, and evaluation environments.
arXiv Detail & Related papers (2025-12-20T19:50:36Z)
- Explainable Human-in-the-Loop Segmentation via Critic Feedback Signals [0.20999222360659608]
We propose a human-in-the-loop interactive framework that enables interventional learning through targeted human corrections of segmentation outputs. We demonstrate that our framework improves segmentation accuracy by up to 9 mIoU points on challenging cubemap data. This work provides a practical framework for researchers and practitioners seeking to build segmentation systems that are accurate, robust to dataset biases, data-efficient, and adaptable to real-world domains such as urban climate monitoring and autonomous driving.
arXiv Detail & Related papers (2025-10-11T01:16:41Z)
- Reward Models are Metrics in a Trench Coat [8.100404050572996]
We find that the two research areas, reward modeling and automatic evaluation metrics, have developed mostly separately, leading to redundant terminology and repeated pitfalls. Common challenges include susceptibility to spurious correlations, impact on downstream reward hacking, methods to improve data quality, and approaches to meta-evaluation. Our position paper argues that closer collaboration between the fields can help overcome these issues.
arXiv Detail & Related papers (2025-10-03T17:59:44Z)
- Prior Distribution and Model Confidence [0.0]
We propose a framework to understand the confidence of model predictions on unseen data without the need for retraining. Our approach filters out low-confidence predictions based on their distance from the training distribution in the embedding space. The proposed method is model-agnostic and generalizable, with potential applications beyond computer vision.
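A minimal sketch of such an embedding-distance filter follows; the mean k-NN distance and percentile cutoff are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_low_confidence(train_emb, test_emb, test_preds, k=10, pct=95):
    """Keep predictions whose inputs lie close to the training
    distribution in embedding space; flag the rest as low-confidence."""
    index = NearestNeighbors(n_neighbors=k).fit(train_emb)
    test_dist, _ = index.kneighbors(test_emb)
    score = test_dist.mean(axis=1)        # mean k-NN distance per test point
    # calibrate the cutoff on train-to-train distances (drop the self match)
    train_dist, _ = index.kneighbors(train_emb, n_neighbors=k + 1)
    cutoff = np.percentile(train_dist[:, 1:].mean(axis=1), pct)
    keep = score <= cutoff
    return test_preds[keep], keep
```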
arXiv Detail & Related papers (2025-09-05T20:17:26Z)
- Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. We propose AirRep, a representation-based approach that learns task-specific and model-aligned representations explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z)
- UMB@PerAnsSumm 2025: Enhancing Perspective-Aware Summarization with Prompt Optimization and Supervised Fine-Tuning [8.095763327154335]
We present our approach to the PerAnsSumm Shared Task, which involves perspective span identification and perspective-aware summarization. For span identification, we adopt ensemble learning that integrates three transformer models through averaging to exploit individual model strengths. For summarization, we design a suite of Chain-of-Thought (CoT) prompting strategies that incorporate keyphrases and guide information to structure summary generation into manageable steps.
arXiv Detail & Related papers (2025-03-14T06:29:51Z)
- Entity-Aware Biaffine Attention Model for Improved Constituent Parsing with Reduced Entity Violations [0.0]
We propose an entity-aware biaffine attention model for constituent parsing.
This model incorporates entity information into the biaffine attention mechanism by using additional entity role vectors for potential phrases.
We introduce a new metric, the Entity Violating Rate (EVR), to quantify the extent of entity violations in parsing results.
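For reference, the biaffine scorer such parsers build on can be written compactly; concatenating entity role vectors into the span representations before scoring, as described above, would only change the input dimension. This is an illustrative sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class Biaffine(nn.Module):
    """score(h_i, h_j) = h_i^T U h_j + W [h_i; h_j] + b, computed for
    every (i, j) pair in a sequence."""
    def __init__(self, dim, n_out=1):
        super().__init__()
        self.U = nn.Parameter(torch.empty(n_out, dim, dim))
        self.wx = nn.Linear(dim, n_out, bias=False)
        self.wy = nn.Linear(dim, n_out, bias=True)
        nn.init.xavier_uniform_(self.U)

    def forward(self, x, y):                  # x, y: (batch, seq, dim)
        bil = torch.einsum('bid,odk,bjk->bijo', x, self.U, y)
        aff = self.wx(x).unsqueeze(2) + self.wy(y).unsqueeze(1)
        return bil + aff                      # (batch, seq, seq, n_out)
```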
arXiv Detail & Related papers (2024-09-01T05:59:54Z)
- The Inter-Intra Modal Measure: A Predictive Lens on Fine-Tuning Outcomes in Vision-Language Models [6.7181844004432385]
We introduce the Inter-Intra Modal Measure (IIMM), a predictive metric that quantifies the relationship between intra-modal image embedding similarity and inter-modal misalignment. Compared to existing transferability measures, the IIMM demonstrates significantly stronger predictive power for accuracy changes after fine-tuning in dual-encoder models. We provide a theoretical bound, proving that changes in IIMM are limited by the Wasserstein distance between pre- and post-fine-tuning embeddings.
arXiv Detail & Related papers (2024-07-22T15:35:09Z)
- PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning [49.60634126342945]
Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes.
Recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information.
We employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues.
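A generic supervised contrastive term of the kind described might look like the sketch below: same-label examples attract, so an original example and its label-flipped counterfactual are pushed apart. PairCFR's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over projected features z of shape (B, d)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                          # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)         # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    per_anchor = (log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor[pos.sum(1) > 0].mean()

# combined objective: classification plus the contrastive term, e.g.
# total = F.cross_entropy(logits, labels) + lam * supcon_loss(z, labels)
```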
arXiv Detail & Related papers (2024-06-09T07:29:55Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose InfoRM, a reward modeling framework that introduces a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
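The information-bottleneck idea can be sketched as a reward head that compresses language-model features into a stochastic latent and penalizes its KL divergence to a standard normal prior; InfoRM's actual objective and detection mechanism may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBRewardHead(nn.Module):
    """Variational information bottleneck over pooled LM features:
    the latent keeps only reward-relevant information."""
    def __init__(self, hidden=4096, latent=64):
        super().__init__()
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.reward = nn.Linear(latent, 1)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
        return self.reward(z).squeeze(-1), kl

def preference_loss(head, h_chosen, h_rejected, beta=0.01):
    """Bradley-Terry preference loss plus beta-weighted KL penalty."""
    r_c, kl_c = head(h_chosen)
    r_r, kl_r = head(h_rejected)
    return -F.logsigmoid(r_c - r_r).mean() + beta * (kl_c + kl_r).mean()
```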
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare the performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework built on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model sets a new state of the art on various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence of each query sample in order to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
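The update being meta-learned can be sketched as confidence-weighted transductive prototype refinement; here the softmax temperature is a stand-in for the richer learned weighting the paper meta-trains.

```python
import torch
import torch.nn.functional as F

def refine_prototypes(protos, queries, temp=1.0, steps=1):
    """Fold unlabeled queries back into class prototypes, weighted by
    their (soft) confidence. protos: (C, d); queries: (Q, d)."""
    for _ in range(steps):
        dist = torch.cdist(queries, protos)        # (Q, C)
        conf = F.softmax(-dist / temp, dim=1)      # per-class confidence
        weighted = conf.t() @ queries              # (C, d) weighted sum
        protos = (protos + weighted) / (1.0 + conf.sum(0, keepdim=True).t())
    return protos
```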
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.