Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective
- URL: http://arxiv.org/abs/2407.06902v1
- Date: Tue, 9 Jul 2024 14:34:40 GMT
- Title: Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective
- Authors: Shahana Ibrahim, Panagiotis A. Traganitis, Xiao Fu, Georgios B. Giannakis
- Abstract summary: This feature article introduces advances in learning from noisy crowdsourced labels.
The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches.
In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions to longstanding challenges in crowdsourcing.
- Score: 42.24248330317496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels due to various reasons, such as the limited expertise, or unreliability of annotators, among others. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions of longstanding challenges in crowdsourcing -- showing how SP perspectives drive the advancements of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO) that are key techniques for fine-tuning large language models (LLMs).
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning [19.962212551963383]
Active Learning (AL) allows models to learn interactively from user feedback.
This paper introduces a counterfactual data augmentation approach to AL.
arXiv Detail & Related papers (2024-08-07T14:55:04Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance [61.809732058101304]
We introduce an innovative approach for fine-tuning PLMs using noisy labels.
This approach incorporates the guidance of Large Language Models (LLMs) like ChatGPT.
This guidance assists in accurately distinguishing between clean and noisy samples.
arXiv Detail & Related papers (2023-11-02T09:20:38Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- A Topical Approach to Capturing Customer Insight In Social Media [0.0]
This research addresses the challenge of fully unsupervised topic extraction in noisy, Big Data contexts.
We present three approaches we built on the Variational Autoencoder framework.
We show that our models achieve performance equal to or better than state-of-the-art methods.
arXiv Detail & Related papers (2023-07-14T11:15:28Z)
- Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving.
We introduce a novel method for training machine learning models in the presence of noisy labels.
Our results show that our method consistently outperforms the existing state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-16T16:43:24Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes learning an effective salient object detection model from manual annotations on only a few training images.
We call this task few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z)
- Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [1.215785021723604]
Developing a conversational agent specific to an educational scenario is time consuming.
Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes.
We propose using a multi-task weak supervision method combined with active learning to address these concerns.
arXiv Detail & Related papers (2020-10-23T23:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.