Learning From How Human Correct
- URL: http://arxiv.org/abs/2102.00225v1
- Date: Sat, 30 Jan 2021 13:13:50 GMT
- Title: Learning From How Human Correct
- Authors: Tong Guo
- Abstract summary: In industrial NLP applications, our manually labeled data contains a certain amount of noisy data.
We present a simple method to find the noisy data and relabel it manually, while collecting the correction information.
We then present a novel method to incorporate the human correction information into the deep learning model.
- Score: 0.685316573653194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In industrial NLP applications, our manually labeled data contains a
certain amount of noisy data. We present a simple method to find the noisy data
and relabel it manually, while also collecting the correction information. We
then present a novel method to incorporate the human correction information
into the deep learning model. Humans know how to correct noisy data, so this
correction information can be injected into the deep learning model. We run the
experiment on our own manually labeled text classification dataset, in which we
relabel the noisy data for our industry application. The results show that our
method improves the classification accuracy from 91.7% to 92.5%. The 91.7%
baseline comes from training BERT on the corrected dataset and is hard to
surpass.
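The abstract does not spell out the procedure, so the following is a minimal sketch under stated assumptions: noisy candidates are the examples whose stored label is least supported by cross-validated predictions, a human relabels them, and the fact that an example was corrected is fed back to the classifier as an extra binary feature. The `was_corrected` feature, the single-example flagging rule, and the scikit-learn linear model standing in for BERT are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation) of the loop described above:
# flag likely-noisy labels, have a human relabel them, and feed the fact that
# an example was corrected back into training as an extra feature.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts = [
    "great quality", "love it", "works as expected",
    "awful support", "totally broken", "waste of money",
]
labels = np.array([1, 1, 1, 0, 0, 1])  # the last label is deliberately noisy

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Step 1: rank examples by how little the cross-validated model supports
# their stored label, and flag the least supported one as a noisy candidate.
proba = cross_val_predict(LogisticRegression(), X, labels, cv=2,
                          method="predict_proba")
support = proba[np.arange(len(labels)), labels]
noisy_idx = np.argsort(support)[:1]

# Step 2: a human relabels the flagged examples; here the label is flipped as
# a stand-in for the human's decision, and the correction flag is kept.
corrected_labels = labels.copy()
corrected_labels[noisy_idx] = 1 - labels[noisy_idx]
was_corrected = np.zeros(len(labels))
was_corrected[noisy_idx] = 1.0  # the collected "correction information"

# Step 3: retrain with the correction flag appended as an extra input feature
# (one simple way to inject the correction information into the model).
X_aug = np.hstack([X.toarray(), was_corrected[:, None]])
model = LogisticRegression().fit(X_aug, corrected_labels)
print("flagged:", noisy_idx, "predictions:", model.predict(X_aug))
```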
Related papers
- Early Stopping Against Label Noise Without Validation Data [54.27621957395026]
We propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model.
We show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
arXiv Detail & Related papers (2025-02-11T13:40:15Z)
- Learning from Noisy Labels via Self-Taught On-the-Fly Meta Loss Rescaling [6.861041888341339]
We propose unsupervised on-the-fly meta loss rescaling to reweight training samples.
We are among the first to attempt on-the-fly training data reweighting on the challenging task of dialogue modeling.
Our strategy is robust in the face of noisy and clean data, handles class imbalance, and prevents overfitting to noisy labels.
arXiv Detail & Related papers (2024-12-17T14:37:50Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal [4.71154003227418]
We propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal.
By comparing the aggregated gradient of a batch of samples and an individual example gradient, our method dynamically decides whether a corresponding example is helpful for the model.
Extensive evaluation on several datasets demonstrates AGRA's effectiveness; a minimal gradient-comparison sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-06-07T15:10:01Z)
- The Re-Label Method For Data-Centric Machine Learning [0.24475591916185496]
In industrial deep learning applications, our manually labeled data contains a certain amount of noisy data.
We present a simple method to find the noisy data and have humans re-label it, using the model predictions as references during labeling.
arXiv Detail & Related papers (2023-02-09T01:09:57Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- Boosting Facial Expression Recognition by A Semi-Supervised Progressive Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z)
- Instance Correction for Learning with Open-set Noisy Labels [145.06552420999986]
We use the sample selection approach to handle open-set noisy labels.
The discarded data are regarded as mislabeled and do not participate in training.
We modify the instances of the discarded data so that the model's predictions on them become consistent with the given labels.
arXiv Detail & Related papers (2021-06-01T13:05:55Z)
- Semi-supervised learning by selective training with pseudo labels via confidence estimation [0.0]
We propose a novel semi-supervised learning (SSL) method that adopts selective training with pseudo labels.
In our method, we generate hard pseudo-labels and also estimate their confidence, which represents how likely each pseudo-label is to be correct.
We also propose a new data augmentation method, called MixConf, that enables us to obtain confidence-calibrated models even when the number of training data is small.
arXiv Detail & Related papers (2021-03-15T08:00:33Z)
- A Novel Perspective for Positive-Unlabeled Learning via Noisy Labels [49.990938653249415]
This research presents a methodology that assigns initial pseudo-labels to unlabeled data, treats the result as noisy-labeled data, and trains a deep neural network on it.
Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-03-08T11:46:02Z)
- Self-training For Pre-training Language Models [0.5139874302398955]
In industrial NLP applications, we have large amounts of data produced by users or customers.
Our learning framework is based on this large amount of unlabeled data.
arXiv Detail & Related papers (2020-11-18T01:35:01Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
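The AGRA entry above describes deciding, per example, whether it helps the model by comparing the example's gradient with the batch's aggregated gradient. The following is a minimal sketch under stated assumptions: gradients are those of a plain logistic loss computed with NumPy, and an example is kept only when the cosine similarity between its gradient and the batch gradient is positive; the toy loss and the zero threshold are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch of gradient-based outlier removal in the spirit of the
# AGRA entry above (not the authors' implementation): an example is dropped
# from the batch when its gradient points against the aggregated batch gradient.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    # Gradient of the logistic loss w.r.t. w for each example:
    # (sigmoid(w . x) - y) * x, one row per example.
    residual = sigmoid(X @ w) - y
    return residual[:, None] * X

def filter_batch(w, X, y):
    grads = per_example_grads(w, X, y)
    batch_grad = grads.mean(axis=0)
    # Cosine similarity between each example's gradient and the batch gradient.
    sims = grads @ batch_grad / (
        np.linalg.norm(grads, axis=1) * np.linalg.norm(batch_grad) + 1e-12
    )
    keep = sims > 0.0  # assumed rule: keep only gradients that agree in direction
    return X[keep], y[keep], sims

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = (X[:, 0] > 0).astype(float)
y[0] = 1.0 - y[0]  # inject one noisy label

w = np.zeros(3)    # current model weights (here: an untrained model)
X_kept, y_kept, sims = filter_batch(w, X, y)
print("kept", len(y_kept), "of", len(y), "examples")
```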