Improving Opinion-based Question Answering Systems Through Label Error
Detection and Overwrite
- URL: http://arxiv.org/abs/2306.07499v1
- Date: Tue, 13 Jun 2023 02:20:58 GMT
- Title: Improving Opinion-based Question Answering Systems Through Label Error
Detection and Overwrite
- Authors: Xiao Yang, Ahmed K. Mohamed, Shashank Jain, Stanislav Peshterliev,
Debojeet Chatterjee, Hanwen Zha, Nikita Bhalla, Gagan Aneja and Pranab
Mohanty
- Abstract summary: We propose LEDO: a model-agnostic and computationally efficient framework for Label Error Detection and Overwrite.
LEDO is based on Monte Carlo Dropout combined with uncertainty metrics, and can be easily generalized to multiple tasks and data sets.
Applying LEDO to an industry opinion-based question answering system demonstrates that it is effective at improving accuracy in all the core models.
- Score: 4.894035903847371
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Label error is a ubiquitous problem in annotated data. Large
amounts of label error substantially degrade the quality of deep learning
models. Existing methods to tackle the label error problem largely focus on
the classification task, and either rely on task-specific architectures or
require non-trivial additional computation, which is undesirable or even
unattainable for industry
usage. In this paper, we propose LEDO: a model-agnostic and computationally
efficient framework for Label Error Detection and Overwrite. LEDO is based on
Monte Carlo Dropout combined with uncertainty metrics, and can be easily
generalized to multiple tasks and data sets. Applying LEDO to an industry
opinion-based question answering system demonstrates that it is effective at
improving accuracy in all the core models. Specifically, LEDO brings a 1.1%
MRR gain for the retrieval model, a 1.5% PR AUC improvement for the machine
reading comprehension model, and a 0.9% rise in Average Precision for the
ranker, on top of strong baselines on a large-scale social media dataset.
Importantly, LEDO is computationally efficient compared to methods that
require changing the loss function, and cost-effective since the resulting
data can be used in the same continuous training pipeline for production.
Further analysis shows that these gains come from an improved decision
boundary after cleaning the label errors present in the training data.
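To make the core idea concrete, here is a minimal PyTorch sketch of MC Dropout-based label error detection and overwrite: run several stochastic forward passes with dropout kept active, average the predictions, and flag samples where a confident model prediction contradicts the annotated label. The confidence threshold, the entropy metric, and all function names are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def enable_dropout(model):
    """Keep only nn.Dropout layers sampling; everything else stays in eval."""
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

@torch.no_grad()
def mc_dropout_probs(model, x, n_passes=20):
    """Average softmax outputs over stochastic forward passes."""
    enable_dropout(model)
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    return probs.mean(dim=0)  # Monte Carlo predictive distribution

def detect_and_overwrite(model, x, labels, conf_threshold=0.9):
    """Flag samples where a confident MC prediction contradicts the label,
    then overwrite those labels with the model's prediction."""
    mean_probs = mc_dropout_probs(model, x)
    confidence, pred = mean_probs.max(dim=-1)
    # Predictive entropy: one possible uncertainty metric among several.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    suspect = (pred != labels) & (confidence > conf_threshold)
    cleaned = torch.where(suspect, pred, labels)
    return cleaned, suspect, entropy
```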
Related papers
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process.
We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency.
Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
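As a toy illustration of comparing annotation sources by agreement, one could compute pairwise Cohen's kappa between the three label sets; the annotations below are made up, and the paper's actual comparison also covers label quality and efficiency.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for the same 8 items from three annotation sources.
annotations = {
    "expert": [1, 0, 1, 1, 0, 2, 1, 0],
    "crowd":  [1, 0, 1, 0, 0, 2, 1, 1],
    "llm":    [1, 0, 1, 1, 0, 2, 0, 0],
}

for (a, ya), (b, yb) in combinations(annotations.items(), 2):
    print(f"kappa({a}, {b}) = {cohen_kappa_score(ya, yb):.3f}")
```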
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
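A rough sketch of the error-injection idea, with a hypothetical inject_subtle_error helper: perturb a single number in a correct solution to manufacture a hard "rejected" response for a preference pair. RISE's actual injection uses predefined error types on partial tokens and is more targeted than this numeric tweak.

```python
import random
import re

def inject_subtle_error(solution: str, rng: random.Random) -> str:
    """Corrupt one number in a correct solution to build a 'rejected' sample."""
    numbers = list(re.finditer(r"\d+", solution))
    if not numbers:
        return solution
    m = rng.choice(numbers)
    wrong = str(int(m.group()) + rng.choice([-2, -1, 1, 2]))
    return solution[:m.start()] + wrong + solution[m.end():]

rng = random.Random(0)
chosen = "12 + 7 = 19, so the answer is 19."
rejected = inject_subtle_error(chosen, rng)
# The (chosen, rejected) pair feeds a DPO-style preference objective.
print({"chosen": chosen, "rejected": rejected})
```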
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy [19.154826741973277]
We propose a zero-label entropy metric named Loss Entropy, computed over the loss distribution, which enables us to infer optimal stopping points for training without labels.
We also develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability.
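A simplified sketch of the loss-entropy signal: compute the Shannon entropy of the normalized per-sample loss distribution each epoch and halt when it stops improving. The patience-based rule here is an assumption for illustration; EntropyStop's actual stopping criterion differs in detail.

```python
import numpy as np

def loss_entropy(losses: np.ndarray) -> float:
    """Shannon entropy of the normalized per-sample loss distribution."""
    p = losses / losses.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

class EntropyEarlyStopper:
    """Halt training once loss entropy stops improving (simplified rule)."""

    def __init__(self, patience: int = 3):
        self.best = float("inf")
        self.patience = patience
        self.bad_epochs = 0

    def should_stop(self, epoch_losses: np.ndarray) -> bool:
        h = loss_entropy(epoch_losses)
        if h < self.best:
            self.best, self.bad_epochs = h, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```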
arXiv Detail & Related papers (2024-05-21T05:17:43Z)
- Improving Label Error Detection and Elimination with Uncertainty Quantification [5.184615738004059]
We develop novel, model-agnostic algorithms for Uncertainty Quantification-Based Label Error Detection (UQ-LED).
Our UQ-LED algorithms outperform state-of-the-art confident learning in identifying label errors.
We propose a novel approach to generate realistic, class-dependent label errors synthetically.
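The class-dependent synthetic error generation can be illustrated with a small NumPy sketch that flips labels according to a per-class confusion distribution; the confusion matrix below is invented for demonstration.

```python
import numpy as np

def inject_class_dependent_noise(labels, confusion, rng):
    """Flip each label to a confusable class with class-specific rates.

    confusion[c] is a distribution over classes given true class c;
    the diagonal mass is the probability of keeping the clean label.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    for i, y in enumerate(labels):
        noisy[i] = rng.choice(len(confusion[y]), p=confusion[y])
    return noisy

rng = np.random.default_rng(0)
# Hypothetical 3-class setup: class 0 is mostly mistaken for class 1.
confusion = np.array([[0.90, 0.08, 0.02],
                      [0.05, 0.90, 0.05],
                      [0.02, 0.08, 0.90]])
clean = rng.integers(0, 3, size=1000)
noisy = inject_class_dependent_noise(clean, confusion, rng)
print("flip rate:", (clean != noisy).mean())
```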
arXiv Detail & Related papers (2024-05-15T15:17:52Z)
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
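A hedged sketch of the selective-dampening idea: estimate per-parameter importance (a diagonal Fisher proxy) on the forget and retain data, then shrink weights that matter far more to the forget set. The alpha/lam hyperparameters are explicit here; ASSD's contribution is precisely choosing them adaptively instead of by manual tuning.

```python
import torch

def importance(model, loader, loss_fn):
    """Diagonal Fisher proxy: mean squared gradients over a dataset."""
    imp = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                imp[n] += p.grad.detach() ** 2
    return {n: v / max(len(loader), 1) for n, v in imp.items()}

@torch.no_grad()
def selective_dampen(model, imp_forget, imp_retain, alpha=10.0, lam=1.0):
    """Shrink weights far more important to the forget set than the retain set."""
    for n, p in model.named_parameters():
        i_f, i_r = imp_forget[n], imp_retain[n]
        mask = i_f > alpha * i_r
        scale = torch.clamp(lam * i_r / (i_f + 1e-12), max=1.0)
        p[mask] *= scale[mask]
```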
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
Up to 40% of the original FLOP count can be reduced with less than a 4% accuracy loss across all tasks considered.
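The abstract does not spell out the scoring criterion, so the sketch below substitutes a simple label-free, gradient-free heuristic: rank a layer's output channels by mean activation magnitude on an unlabeled batch and keep the top fraction.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def channel_scores(layer, inputs):
    """Score output channels by mean |activation| on unlabeled inputs."""
    acts = layer(inputs)                     # (batch, channels, ...)
    dims = [0] + list(range(2, acts.dim()))  # reduce all but the channel dim
    return acts.abs().mean(dim=dims)

@torch.no_grad()
def prune_channels(layer: nn.Conv2d, inputs, keep_ratio=0.5) -> nn.Conv2d:
    """Keep the highest-scoring fraction of output channels (structured)."""
    k = max(1, int(keep_ratio * layer.out_channels))
    keep = channel_scores(layer, inputs).topk(k).indices
    new = nn.Conv2d(layer.in_channels, k, layer.kernel_size,
                    stride=layer.stride, padding=layer.padding,
                    bias=layer.bias is not None)
    new.weight.copy_(layer.weight[keep])
    if layer.bias is not None:
        new.bias.copy_(layer.bias[keep])
    return new

conv = nn.Conv2d(3, 16, 3, padding=1)
x = torch.randn(8, 3, 32, 32)   # unlabeled batch
print(prune_channels(conv, x))  # Conv2d with 8 output channels
```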
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
- Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data [1.7205106391379026]
This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations.
This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset.
The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
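The ProtoNet component reduces to a nearest-prototype classifier: average the support embeddings of each class, then assign queries to the closest prototype. A minimal sketch, with random embeddings standing in for a trained encoder:

```python
import torch

def prototypes(embeddings, labels, n_classes):
    """Class prototypes = mean embedding of each class's support samples."""
    return torch.stack([embeddings[labels == c].mean(dim=0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    return torch.cdist(query_emb, protos).argmin(dim=1)

# Toy 2-way 5-shot episode with 16-d embeddings from a hypothetical encoder.
support = torch.randn(10, 16)
support_y = torch.tensor([0] * 5 + [1] * 5)
query = torch.randn(4, 16)
protos = prototypes(support, support_y, n_classes=2)
print(classify(query, protos))
```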
arXiv Detail & Related papers (2022-09-28T16:14:40Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
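A sketch of one-class classification with data refinement, using scikit-learn's OneClassSVM as a stand-in for the paper's model: repeatedly fit, drop the lowest-scoring (most anomalous-looking) fraction of the training pool, and refit, so suspected anomalies stop contaminating the "normal" class. Round count and drop fraction are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X, n_rounds=3, drop_frac=0.05):
    """Iteratively remove likely anomalies from the pool, then refit."""
    pool, model = X, None
    for _ in range(n_rounds):
        model = OneClassSVM(nu=0.1, gamma="scale").fit(pool)
        scores = model.decision_function(pool)  # lower = more anomalous
        keep = scores.argsort()[int(drop_frac * len(pool)):]
        pool = pool[keep]
    return model

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 8)),   # mostly normal samples
               rng.normal(6, 1, (50, 8))])   # unlabeled contamination
model = refine_and_fit(X)
```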
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
Self-supervised learning techniques can boost performance by learning useful representations from unlabelled data.
We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy.
Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
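A very rough sketch of the instance-weighting mechanics: give each unlabelled sample learnable Beta parameters and scale its per-sample loss by the distribution's mean. The paper's Bayesian treatment is more involved; everything below is a simplified assumption.

```python
import torch

n_unlabelled = 1000
# Per-instance Beta(alpha, beta) parameters, learned jointly with the model.
log_ab = torch.zeros(n_unlabelled, 2, requires_grad=True)

def instance_weights(idx):
    """Mean of each instance's Beta weight distribution: a / (a + b)."""
    a, b = log_ab[idx].exp().unbind(-1)
    return a / (a + b)

def weighted_loss(per_sample_loss, idx):
    """Down-weight unlabelled samples the weights deem unhelpful."""
    return (instance_weights(idx) * per_sample_loss).mean()

# Toy step: pretend these are per-sample self-supervised losses for a batch.
idx = torch.arange(32)
loss = weighted_loss(torch.rand(32), idx)
loss.backward()  # gradients flow into log_ab as well
```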
arXiv Detail & Related papers (2020-06-22T15:59:32Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest and most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the question: have we reached a performance ceiling, or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)