Improving Opinion-based Question Answering Systems Through Label Error
Detection and Overwrite
- URL: http://arxiv.org/abs/2306.07499v1
- Date: Tue, 13 Jun 2023 02:20:58 GMT
- Title: Improving Opinion-based Question Answering Systems Through Label Error
Detection and Overwrite
- Authors: Xiao Yang, Ahmed K. Mohamed, Shashank Jain, Stanislav Peshterliev,
Debojeet Chatterjee, Hanwen Zha, Nikita Bhalla, Gagan Aneja and Pranab
Mohanty
- Abstract summary: We propose LEDO: a model-agnostic and computationally efficient framework for Label Error Detection and Overwrite.
LEDO is based on Monte Carlo Dropout combined with uncertainty metrics, and can be easily generalized to multiple tasks and data sets.
Applying LEDO to an industry opinion-based question answering system demonstrates that it is effective at improving accuracy in all the core models.
- Score: 4.894035903847371
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Label error is a ubiquitous problem in annotated data. Large
amounts of label error substantially degrade the quality of deep learning
models. Existing methods to tackle the label error problem largely focus on
the classification task, and either rely on task-specific architectures or
require non-trivial additional computation, which is undesirable or even
unattainable for industry
usage. In this paper, we propose LEDO: a model-agnostic and computationally
efficient framework for Label Error Detection and Overwrite. LEDO is based on
Monte Carlo Dropout combined with uncertainty metrics, and can be easily
generalized to multiple tasks and data sets. Applying LEDO to an industry
opinion-based question answering system demonstrates that it is effective at
improving accuracy in all the core models. Specifically, LEDO brings a 1.1%
MRR gain for the retrieval model, a 1.5% PR AUC improvement for the machine
reading comprehension model, and a 0.9% rise in Average Precision for the
ranker, on top of strong baselines on a large-scale social media dataset.
Importantly, LEDO is computationally efficient compared to methods that
require changing the loss function, and cost-effective since the resulting
data can be used in the same continuous training pipeline for production.
Further analysis shows that these gains come from an improved decision
boundary after cleaning the label errors present in the training data.
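To make the core idea concrete, here is a minimal PyTorch sketch of MC Dropout-based label error detection and overwrite: run several stochastic forward passes with dropout kept active, average the predictions, and flag samples where a confident model prediction contradicts the annotated label. The confidence threshold, the entropy metric, and all function names are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def enable_dropout(model):
    """Keep only nn.Dropout layers sampling; everything else stays in eval."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

@torch.no_grad()
def mc_dropout_probs(model, x, n_passes=20):
    """Average softmax outputs over stochastic forward passes."""
    enable_dropout(model)
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    return probs.mean(dim=0)  # Monte Carlo predictive distribution

def detect_and_overwrite(model, x, labels, conf_threshold=0.9):
    """Flag samples where a confident MC prediction contradicts the label,
    then overwrite those labels with the model's prediction."""
    mean_probs = mc_dropout_probs(model, x)
    confidence, pred = mean_probs.max(dim=-1)
    # Predictive entropy: one possible uncertainty metric among several.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    suspect = (pred != labels) & (confidence > conf_threshold)
    cleaned = torch.where(suspect, pred, labels)
    return cleaned, suspect, entropy
```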
Related papers
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process.
We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency.
Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
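As a toy illustration of comparing annotation sources by agreement, one could compute pairwise Cohen's kappa between the three label sets; the annotations below are made up, and the paper's actual comparison also covers label quality and efficiency.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for the same 8 items from three annotation sources.
annotations = {
    "expert": [1, 0, 1, 1, 0, 2, 1, 0],
    "crowd":  [1, 0, 1, 0, 0, 2, 1, 1],
    "llm":    [1, 0, 1, 1, 0, 2, 0, 0],
}

for (a, ya), (b, yb) in combinations(annotations.items(), 2):
    print(f"kappa({a}, {b}) = {cohen_kappa_score(ya, yb):.3f}")
```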
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
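A rough sketch of the error-injection idea, with a hypothetical inject_subtle_error helper: perturb a single number in a correct solution to manufacture a hard "rejected" response for a preference pair. RISE's actual injection uses predefined error types on partial tokens and is more targeted than this numeric tweak.

```python
import random
import re

def inject_subtle_error(solution: str, rng: random.Random) -> str:
    """Corrupt one number in a correct solution to build a 'rejected' sample."""
    numbers = list(re.finditer(r"\d+", solution))
    if not numbers:
        return solution
    m = rng.choice(numbers)
    wrong = str(int(m.group()) + rng.choice([-2, -1, 1, 2]))
    return solution[:m.start()] + wrong + solution[m.end():]

rng = random.Random(0)
chosen = "12 + 7 = 19, so the answer is 19."
rejected = inject_subtle_error(chosen, rng)
# The (chosen, rejected) pair feeds a DPO-style preference objective.
print({"chosen": chosen, "rejected": rejected})
```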
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy [19.154826741973277]
We propose a zero-label entropy metric named Loss Entropy, computed over the loss distribution, which enables us to infer optimal stopping points for training without labels.
We also develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability.
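A simplified sketch of the loss-entropy signal: compute the Shannon entropy of the normalized per-sample loss distribution each epoch and halt when it stops improving. The patience-based rule here is an assumption for illustration; EntropyStop's actual stopping criterion differs in detail.

```python
import numpy as np

def loss_entropy(losses: np.ndarray) -> float:
    """Shannon entropy of the normalized per-sample loss distribution."""
    p = losses / losses.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

class EntropyEarlyStopper:
    """Halt training once loss entropy stops improving (simplified rule)."""

    def __init__(self, patience: int = 3):
        self.best = float("inf")
        self.patience = patience
        self.bad_epochs = 0

    def should_stop(self, epoch_losses: np.ndarray) -> bool:
        h = loss_entropy(epoch_losses)
        if h < self.best:
            self.best, self.bad_epochs = h, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```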
arXiv Detail & Related papers (2024-05-21T05:17:43Z)
- Improving Label Error Detection and Elimination with Uncertainty Quantification [5.184615738004059]
We develop novel, model-agnostic algorithms for Uncertainty Quantification-Based Label Error Detection (UQ-LED).
Our UQ-LED algorithms outperform state-of-the-art confident learning in identifying label errors.
We propose a novel approach to generate realistic, class-dependent label errors synthetically.
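The class-dependent synthetic error generation can be illustrated with a small NumPy sketch that flips labels according to a per-class confusion distribution; the confusion matrix below is invented for demonstration.

```python
import numpy as np

def inject_class_dependent_noise(labels, confusion, rng):
    """Flip each label to a confusable class with class-specific rates.

    confusion[c] is a distribution over classes given true class c;
    the diagonal mass is the probability of keeping the clean label.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    for i, y in enumerate(labels):
        noisy[i] = rng.choice(len(confusion[y]), p=confusion[y])
    return noisy

rng = np.random.default_rng(0)
# Hypothetical 3-class setup: class 0 is mostly mistaken for class 1.
confusion = np.array([[0.90, 0.08, 0.02],
                      [0.05, 0.90, 0.05],
                      [0.02, 0.08, 0.90]])
clean = rng.integers(0, 3, size=1000)
noisy = inject_class_dependent_noise(clean, confusion, rng)
print("flip rate:", (clean != noisy).mean())
```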
arXiv Detail & Related papers (2024-05-15T15:17:52Z)
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
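A hedged sketch of the selective-dampening idea: estimate per-parameter importance (a diagonal Fisher proxy) on the forget and retain data, then shrink weights that matter far more to the forget set. The alpha/lam hyperparameters are explicit here; ASSD's contribution is precisely choosing them adaptively instead of by manual tuning.

```python
import torch

def importance(model, loader, loss_fn):
    """Diagonal Fisher proxy: mean squared gradients over a dataset."""
    imp = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                imp[n] += p.grad.detach() ** 2
    return {n: v / max(len(loader), 1) for n, v in imp.items()}

@torch.no_grad()
def selective_dampen(model, imp_forget, imp_retain, alpha=10.0, lam=1.0):
    """Shrink weights far more important to the forget set than the retain set."""
    for n, p in model.named_parameters():
        i_f, i_r = imp_forget[n], imp_retain[n]
        mask = i_f > alpha * i_r
        scale = torch.clamp(lam * i_r / (i_f + 1e-12), max=1.0)
        p[mask] *= scale[mask]
```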
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
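The abstract does not spell out the scoring criterion, so the sketch below substitutes a simple label-free, gradient-free heuristic: rank a layer's output channels by mean activation magnitude on an unlabeled batch and keep the top fraction.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def channel_scores(layer, inputs):
    """Score output channels by mean |activation| on unlabeled inputs."""
    acts = layer(inputs)                     # (batch, channels, ...)
    dims = [0] + list(range(2, acts.dim()))  # reduce all but the channel dim
    return acts.abs().mean(dim=dims)

@torch.no_grad()
def prune_channels(layer: nn.Conv2d, inputs, keep_ratio=0.5) -> nn.Conv2d:
    """Keep the highest-scoring fraction of output channels (structured)."""
    k = max(1, int(keep_ratio * layer.out_channels))
    keep = channel_scores(layer, inputs).topk(k).indices
    new = nn.Conv2d(layer.in_channels, k, layer.kernel_size,
                    stride=layer.stride, padding=layer.padding,
                    bias=layer.bias is not None)
    new.weight.copy_(layer.weight[keep])
    if layer.bias is not None:
        new.bias.copy_(layer.bias[keep])
    return new

conv = nn.Conv2d(3, 16, 3, padding=1)
x = torch.randn(8, 3, 32, 32)   # unlabeled batch
print(prune_channels(conv, x))  # Conv2d with 8 output channels
```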
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data [1.7205106391379026]
This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations.
This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset.
The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
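The ProtoNet component reduces to a nearest-prototype classifier: average the support embeddings of each class, then assign queries to the closest prototype. A minimal sketch, with random embeddings standing in for a trained encoder:

```python
import torch

def prototypes(embeddings, labels, n_classes):
    """Class prototypes = mean embedding of each class's support samples."""
    return torch.stack([embeddings[labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    return torch.cdist(query_emb, protos).argmin(dim=1)

# Toy 2-way 5-shot episode with 16-d embeddings from a hypothetical encoder.
support = torch.randn(10, 16)
support_y = torch.tensor([0] * 5 + [1] * 5)
query = torch.randn(4, 16)
protos = prototypes(support, support_y, n_classes=2)
print(classify(query, protos))
```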
arXiv Detail & Related papers (2022-09-28T16:14:40Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
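A sketch of one-class classification with data refinement, using scikit-learn's OneClassSVM as a stand-in for the paper's model: repeatedly fit, drop the lowest-scoring (most anomalous-looking) fraction of the training pool, and refit, so suspected anomalies stop contaminating the "normal" class. Round count and drop fraction are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X, n_rounds=3, drop_frac=0.05):
    """Iteratively remove likely anomalies from the pool, then refit."""
    pool, model = X, None
    for _ in range(n_rounds):
        model = OneClassSVM(nu=0.1, gamma="scale").fit(pool)
        scores = model.decision_function(pool)  # lower = more anomalous
        keep = scores.argsort()[int(drop_frac * len(pool)):]
        pool = pool[keep]
    return model

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 8)),   # mostly normal samples
               rng.normal(6, 1, (50, 8))])   # unlabeled contamination
model = refine_and_fit(X)
```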
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
Self-supervised learning techniques can boost performance by learning useful representations from unlabelled data.
We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy.
Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
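A very rough sketch of the instance-weighting mechanics: give each unlabelled sample learnable Beta parameters and scale its per-sample loss by the distribution's mean. The paper's Bayesian treatment is more involved; everything below is a simplified assumption.

```python
import torch

n_unlabelled = 1000
# Per-instance Beta(alpha, beta) parameters, learned jointly with the model.
log_ab = torch.zeros(n_unlabelled, 2, requires_grad=True)

def instance_weights(idx):
    """Mean of each instance's Beta weight distribution: a / (a + b)."""
    a, b = log_ab[idx].exp().unbind(-1)
    return a / (a + b)

def weighted_loss(per_sample_loss, idx):
    """Down-weight unlabelled samples the weights deem unhelpful."""
    return (instance_weights(idx) * per_sample_loss).mean()

# Toy step: pretend these are per-sample self-supervised losses for a batch.
idx = torch.arange(32)
loss = weighted_loss(torch.rand(32), idx)
loss.backward()  # gradients flow into log_ab as well
```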
arXiv Detail & Related papers (2020-06-22T15:59:32Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest and most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the question: have we reached a performance ceiling, or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)