Learning Multiclass Classifier Under Noisy Bandit Feedback
- URL: http://arxiv.org/abs/2006.03545v2
- Date: Wed, 3 Mar 2021 16:56:12 GMT
- Title: Learning Multiclass Classifier Under Noisy Bandit Feedback
- Authors: Mudit Agarwal and Naresh Manwani
- Abstract summary: We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique.
We show our approach's effectiveness using extensive experiments on several benchmark datasets.
- Score: 6.624726878647541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of multiclass classification with corrupted
or noisy bandit feedback. In this setting, the learner may not receive true
feedback. Instead, it receives feedback that has been flipped with some
non-zero probability. We propose a novel approach to deal with noisy bandit
feedback based on the unbiased estimator technique. We further offer a method
that can efficiently estimate the noise rates, thus providing an end-to-end
framework. The proposed algorithm enjoys a mistake bound of the order of
$O(\sqrt{T})$ in the high noise case and of the order of
$O(T^{2/3})$ in the worst case. We show our approach's
effectiveness using extensive experiments on several benchmark datasets.
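The abstract names the unbiased-estimator technique without spelling it out. Below is a minimal sketch of the standard construction for binary feedback flipped with known class-conditional rates; the function name and the binary simplification are illustrative assumptions, as the paper itself handles the multiclass bandit setting.

```python
def unbiased_loss(loss, y_tilde, rho_plus, rho_minus):
    """Unbiased surrogate loss under class-conditional label flipping.

    loss      : dict {+1: loss if feedback is +1, -1: loss if feedback is -1}
    y_tilde   : the observed (possibly flipped) feedback, +1 or -1
    rho_plus  : P(flip | true feedback = +1)
    rho_minus : P(flip | true feedback = -1)

    Taking the expectation over the noise recovers the loss on the
    clean feedback, which is what makes the estimator unbiased.
    """
    denom = 1.0 - rho_plus - rho_minus  # requires rho_plus + rho_minus < 1
    rho = {1: rho_plus, -1: rho_minus}
    return ((1.0 - rho[-y_tilde]) * loss[y_tilde]
            - rho[y_tilde] * loss[-y_tilde]) / denom
```

Unbiasedness can be checked directly: weighting the surrogate by the flip probabilities reproduces the clean loss exactly, with no sampling needed.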
Related papers
- Trusted Multi-view Learning with Label Noise [17.458306450909316]
Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty.
We propose a trusted multi-view noise refining method to solve this problem.
We empirically compare TMNR with state-of-the-art trusted multi-view learning and label noise learning baselines on 5 publicly available datasets.
arXiv Detail & Related papers (2024-04-18T06:47:30Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA, and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z) - Label Noise: Correcting the Forward-Correction [0.0]
Training neural network classifiers on datasets with label noise poses a risk of overfitting them to the noisy labels.
We propose an approach to tackling overfitting caused by label noise.
Specifically, we propose imposing a lower bound on the training loss to mitigate overfitting.
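The summary does not say how the lower bound is imposed. One known instance of the idea is the "flooding" trick, sketched below as an assumption about the mechanism; the paper's exact formulation may differ.

```python
def flooded_loss(loss, flood_level):
    """Keep training loss from falling below flood_level.

    Equivalent to |loss - b| + b: once the loss drops under the flood
    level b, the gradient reverses sign and pushes it back up, which
    discourages memorizing noisy labels while leaving high-loss
    examples unaffected.
    """
    return abs(loss - flood_level) + flood_level
```

Above the flood level the loss is unchanged; below it, the loss is reflected upward, so training can never drive it to zero on noisy labels.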
arXiv Detail & Related papers (2023-07-24T19:41:19Z) - RoLNiP: Robust Learning Using Noisy Pairwise Comparisons [6.624726878647541]
This paper presents a robust approach for learning from noisy pairwise comparisons.
We experimentally show that the proposed approach RoLNiP outperforms the robust state-of-the-art methods for learning with noisy pairwise comparisons.
arXiv Detail & Related papers (2023-03-04T06:28:08Z) - UNICON: Combating Label Noise Through Uniform Selection and Contrastive
Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z) - Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
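The mean-of-medians estimator is described in one sentence; a minimal sketch matching that description follows, with the group size left as a caller-chosen parameter (the paper's choice of grouping is not given here).

```python
import numpy as np

def mean_of_medians(samples, group_size):
    """Robust mean estimate for heavy-tailed samples.

    Split the samples into consecutive groups of group_size, take the
    empirical median of each group (robust to outliers), then average
    the medians. Trailing samples that do not fill a group are dropped.
    """
    n = (len(samples) // group_size) * group_size
    groups = np.asarray(samples[:n], dtype=float).reshape(-1, group_size)
    return float(np.mean(np.median(groups, axis=1)))
```

The per-group median suppresses extreme payoffs that would dominate a plain empirical mean, while averaging the medians reduces variance across groups.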
arXiv Detail & Related papers (2021-10-26T17:30:44Z) - Learning Noise Transition Matrix from Only Noisy Labels via Total
Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously.
We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
arXiv Detail & Related papers (2021-02-04T05:09:18Z) - Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users'
Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z) - Multi-label Contrastive Predictive Coding [125.03510235962095]
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC).
We introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time.
We show that with the same number of negative samples, multi-label CPC is able to exceed the $\log m$ bound while still being a valid lower bound on mutual information.
arXiv Detail & Related papers (2020-07-20T02:46:21Z)