Robust Product Classification with Instance-Dependent Noise
- URL: http://arxiv.org/abs/2209.06946v1
- Date: Wed, 14 Sep 2022 21:45:14 GMT
- Title: Robust Product Classification with Instance-Dependent Noise
- Authors: Huy Nguyen and Devashish Khatwani
- Abstract summary: Noisy labels in large E-commerce product data (i.e., product items are placed into incorrect categories) are a critical issue for the product categorization task.
We study the impact of instance-dependent noise on the performance of product title classification.
- Score: 2.0661025590877777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Noisy labels in large E-commerce product data (i.e., product items are placed
into incorrect categories) are a critical issue for the product categorization task
because they are unavoidable, non-trivial to remove, and degrade prediction
performance significantly. Training a product title classification model that
is robust to noisy labels in the data is essential to making product
classification applications practical. In this paper, we study the impact
of instance-dependent noise on the performance of product title classification by
comparing our data denoising algorithm with different noise-resistant training
algorithms designed to prevent a classifier model from over-fitting
to noise. We develop a simple yet effective Deep Neural Network for product
title classification to use as a base classifier. Along with recent methods of
simulating instance-dependent noise, we propose a novel noise simulation
algorithm based on product title similarity. Our experiments cover multiple
datasets, various noise methods, and different training solutions. The results
reveal the limits of the classification task when the noise rate is not negligible and
the data distribution is highly skewed.
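The abstract does not spell out the similarity-based noise simulation algorithm, but the underlying idea (flipping an item's label toward the category of its most similar title, so mislabels concentrate on confusable products) can be sketched as follows. This is a hypothetical illustration, not the authors' method: `simulate_noise`, the `difflib` similarity measure, and the flipping policy are all assumptions.

```python
# Hypothetical sketch of instance-dependent noise simulation driven by
# product-title similarity (not the paper's actual algorithm).
# Items whose titles closely resemble a title in ANOTHER category are the
# most confusable, so they are the ones whose labels get flipped.
from difflib import SequenceMatcher

def simulate_noise(items, noise_rate=0.2):
    """items: list of (title, label) pairs.
    Returns a list of labels with instance-dependent noise injected."""
    n_flip = int(noise_rate * len(items))
    scored = []
    for i, (title, label) in enumerate(items):
        # Find the most similar title that carries a different label.
        best_sim, best_label = 0.0, label
        for other_title, other_label in items:
            if other_label == label:
                continue
            sim = SequenceMatcher(None, title, other_title).ratio()
            if sim > best_sim:
                best_sim, best_label = sim, other_label
        scored.append((best_sim, i, best_label))
    # Flip the n_flip most confusable items to their nearest other category.
    flips = {i: lbl for _, i, lbl in sorted(scored, reverse=True)[:n_flip]}
    return [flips.get(i, label) for i, (_, label) in enumerate(items)]
```

Because the flip probability depends on each item's title rather than only on its class, the resulting noise is instance-dependent rather than class-conditional.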
Related papers
- Training Gradient Boosted Decision Trees on Tabular Data Containing Label Noise for Classification Tasks [1.261491746208123]
This study aims to investigate the effects of label noise on gradient-boosted decision trees and methods to mitigate those effects.
The implemented methods demonstrate state-of-the-art noise detection performance on the Adult dataset and achieve the highest classification precision and recall on the Adult and Breast Cancer datasets.
arXiv Detail & Related papers (2024-09-13T09:09:24Z)
- NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures [15.358504449550013]
We design algorithms to learn from noisy labels for two broad classes of non-decomposable performance measures.
In both cases, we develop noise-corrected versions of the algorithms under the widely studied class-conditional noise models.
Our experiments demonstrate the effectiveness of our algorithms in handling label noise.
arXiv Detail & Related papers (2024-02-01T23:03:53Z)
- Rethinking the Value of Labels for Instance-Dependent Label Noise Learning [43.481591776038144]
Noisy labels in real-world applications often depend on both the true label and the features.
In this work, we tackle instance-dependent label noise with a novel deep generative model that avoids explicitly modeling the noise transition matrix.
Our algorithm leverages causal representation learning and simultaneously identifies the high-level content and style latent factors from the data.
arXiv Detail & Related papers (2023-05-10T15:29:07Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
- Multi-Class Classification from Noisy-Similarity-Labeled Data [98.13491369929798]
We propose a method for learning from only noisy-similarity-labeled data.
We use a noise transition matrix to bridge the class-posterior probability between clean and noisy data.
We build a novel learning system which can assign noise-free class labels for instances.
arXiv Detail & Related papers (2020-02-16T05:10:21Z)
- Particle Competition and Cooperation for Semi-Supervised Learning with Label Noise [6.247917165799351]
A graph-based semi-supervised learning approach based on particle competition and cooperation was developed.
This paper presents a new particle competition and cooperation algorithm, specifically designed to increase the robustness to the presence of label noise.
It performs classification of unlabeled nodes and reclassification of the nodes affected by label noise in a unique process.
arXiv Detail & Related papers (2020-02-12T19:44:59Z)
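Several of the related papers above rely on a noise transition matrix T, where T[i, j] = P(noisy label j | clean label i); the noisy-similarity-label work, for example, uses one to bridge the class-posterior probability between clean and noisy data. A minimal, illustrative sketch of the common "forward correction" use of such a matrix follows; the function name and shapes are assumptions, not any single paper's implementation.

```python
# Illustrative "forward correction" with a noise transition matrix T.
# T[i, j] = P(observed noisy label j | clean label i). The model's posterior
# over clean classes is pushed through T, and the loss is then computed
# against the observed noisy label.
import numpy as np

def forward_corrected_nll(clean_probs, noisy_label, T):
    """clean_probs: model posterior over k clean classes, shape (k,).
    T: row-stochastic (k, k) noise transition matrix.
    Returns the negative log-likelihood of the observed noisy label."""
    # P(noisy = j) = sum_i P(clean = i) * T[i, j]
    noisy_probs = clean_probs @ T
    return -np.log(noisy_probs[noisy_label])
```

With T equal to the identity (no noise), this reduces to the ordinary cross-entropy on the observed label; as rows of T spread mass across classes, the loss stops penalizing the model for disagreeing with labels that the noise process plausibly corrupted.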
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.