Multi-Label Sampling based on Local Label Imbalance
- URL: http://arxiv.org/abs/2005.03240v2
- Date: Tue, 19 May 2020 10:53:43 GMT
- Title: Multi-Label Sampling based on Local Label Imbalance
- Authors: Bin Liu, Konstantinos Blekas, and Grigorios Tsoumakas
- Abstract summary: Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods.
Existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets.
However, it is actually the imbalance level within the local neighbourhood of minority class examples that plays the key role in performance degradation.
- Score: 7.355362369511579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance is an inherent characteristic of multi-label data that
hinders most multi-label learning methods. One efficient and flexible strategy
to deal with this problem is to employ sampling techniques before training a
multi-label learning model. Although existing multi-label sampling approaches
alleviate the global imbalance of multi-label datasets, it is actually the
imbalance level within the local neighbourhood of minority class examples that
plays a key role in performance degradation. To address this issue, we propose
a novel measure to assess the local label imbalance of multi-label datasets, as
well as two multi-label sampling approaches based on the local label imbalance,
namely MLSOL and MLUL. By considering all informative labels, MLSOL creates
more diverse and better labeled synthetic instances for difficult examples,
while MLUL eliminates instances that are harmful to their local region.
Experimental results on 13 multi-label datasets demonstrate the effectiveness
of the proposed measure and sampling approaches for a variety of evaluation
metrics, particularly in the case of an ensemble of classifiers trained on
repeated samples of the original data.
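The core idea lends itself to a short illustration. Below is a minimal sketch, assuming Euclidean k-nearest neighbours, a simplified difficulty score, and illustrative function names; MLSOL's actual seed selection and per-label assignment rules are more refined than this.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_imbalance(X, Y, k=5):
    """X: (n, d) features, Y: (n, q) binary labels.
    Returns an (n,) difficulty score in [0, 1]: the average fraction of an
    instance's k nearest neighbours that disagree with its positive labels."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neigh = idx[:, 1:]                       # drop each point itself
    scores = np.zeros(len(X))
    for i in range(len(X)):
        pos = np.flatnonzero(Y[i])           # labels instance i is positive for
        if len(pos):
            scores[i] = (Y[neigh[i]][:, pos] != Y[i, pos]).mean()
    return scores

def oversample_local(X, Y, n_new=100, k=5, seed=0):
    """MLSOL-flavoured oversampling: difficulty-weighted seed selection plus
    SMOTE-style interpolation. (Label assignment is simplified here; MLSOL
    decides each label from the seed, reference and neighbourhood.)"""
    rng = np.random.default_rng(seed)
    w = local_imbalance(X, Y, k)
    if w.sum() == 0:
        return X, Y                          # nothing difficult to oversample
    p = w / w.sum()
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    Xs, Ys = [], []
    for _ in range(n_new):
        s = rng.choice(len(X), p=p)          # difficulty-weighted seed
        r = idx[s, 1:][rng.integers(k)]      # random neighbour as reference
        lam = rng.random()
        Xs.append(X[s] + lam * (X[r] - X[s]))
        Ys.append(Y[s] if lam < 0.5 else Y[r])  # copy the closer instance's labels
    return np.vstack([X, np.asarray(Xs)]), np.vstack([Y, np.asarray(Ys)])
```

Running such a sampler several times with different seeds and training one classifier per sample yields the kind of ensemble for which the abstract reports the strongest results.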
Related papers
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.
Unlike in standard semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because a single instance can carry multiple semantics.
We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
- Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise [31.917931364881625]
Multi-label classification poses challenges due to imbalanced and noisy labels in training data.
We propose a unified data augmentation method, named BalanceMix, to address these challenges.
Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity.
arXiv Detail & Related papers (2023-12-12T09:09:45Z)
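As a rough illustration of the BalanceMix recipe above, the sketch below biases sampling toward instances with rare positive labels and mixes them with random partners; the rarity weighting and Beta interpolation are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def minority_mixup(X, Y, n_new=64, alpha=4.0, seed=0):
    """X: (n, d) features, Y: (n, q) binary labels.
    Returns n_new mixed instances with soft labels."""
    rng = np.random.default_rng(seed)
    freq = np.maximum(Y.mean(axis=0), 1e-12)     # per-label frequency
    w = (Y / freq).max(axis=1)                   # rarity of rarest positive label
    p = w / w.sum()
    i = rng.choice(len(X), size=n_new, p=p)      # minority-biased sampler
    j = rng.integers(len(X), size=n_new)         # random mixing partner
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    X_mix = lam * X[i] + (1 - lam) * X[j]        # interpolate features
    Y_mix = lam * Y[i] + (1 - lam) * Y[j]        # soft mixed labels
    return X_mix, Y_mix
```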
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
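The minority-majority mixing idea above can be sketched as follows for a binary-imbalanced setting; the interpolation bounds and the non-iterative form are assumptions made for brevity.

```python
import numpy as np

def mix_minority_majority(X, y, n_new=100, low=0.5, seed=0):
    """Binary setting: y == 1 marks the minority class. New points lie on
    segments between minority and majority samples, biased to the minority side."""
    rng = np.random.default_rng(seed)
    Xmin, Xmaj = X[y == 1], X[y == 0]
    i = rng.integers(len(Xmin), size=n_new)
    j = rng.integers(len(Xmaj), size=n_new)
    lam = rng.uniform(low, 1.0, size=(n_new, 1))  # weight on the minority sample
    X_new = lam * Xmin[i] + (1 - lam) * Xmaj[j]
    y_new = np.ones(n_new, dtype=int)             # synthetic points are minority
    return np.vstack([X, X_new]), np.concatenate([y, y_new])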
- Deep Partial Multi-Label Learning with Graph Disambiguation [27.908565535292723]
We propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN).
Specifically, we introduce the instance-level and label-level similarities to recover label confidences.
At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels.
arXiv Detail & Related papers (2023-05-10T04:02:08Z)
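A generic label-propagation step of the kind the summary describes might look like the sketch below; PLAIN itself is a deep model that also propagates on a label graph, so this only illustrates the smoothing idea.

```python
import numpy as np

def propagate(S, Y0, alpha=0.5, n_iter=20):
    """S: (n, n) nonnegative instance similarities, Y0: (n, q) initial noisy
    label confidences. Iteratively mixes each row with its neighbours."""
    P = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    F = Y0.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1 - alpha) * Y0    # neighbour average + prior
    return F
```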
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- An Effective Approach for Multi-label Classification with Missing Labels [8.470008570115146]
We propose a pseudo-label based approach to reduce the cost of annotation without bringing additional complexity to the classification networks.
By designing a novel loss function, we are able to relax the requirement that each instance must contain at least one positive label.
We show that our method can handle the imbalance between positive labels and negative labels, while still outperforming existing missing-label learning approaches.
arXiv Detail & Related papers (2022-10-24T23:13:57Z)
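The masking idea behind losses for missing labels can be illustrated briefly; the paper's actual loss is more elaborate, and the mask convention here is an assumption.

```python
import numpy as np

def masked_bce(p, y, mask, eps=1e-12):
    """p: (n, q) predicted probabilities, y: (n, q) labels in {0, 1},
    mask: (n, q) with 1 where a label is observed and 0 where it is missing."""
    p = np.clip(p, eps, 1 - eps)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return (loss * mask).sum() / max(mask.sum(), 1.0)  # average over observed only
```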
- PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme to dynamically alter the score thresholds of positive and negative pseudo-labels for each class during training.
We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
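A stripped-down version of the percentile-based thresholding above might look like this; PercentMatch itself also maintains negative thresholds and updates them over the course of training.

```python
import numpy as np

def percentile_thresholds(scores, pos_percentile=95.0):
    """scores: (n, q) sigmoid outputs on unlabeled data.
    Returns one positive threshold per class, shape (q,)."""
    return np.percentile(scores, pos_percentile, axis=0)

def positive_pseudo_labels(scores, thresholds):
    return (scores >= thresholds).astype(int)     # (n, q) binary pseudo-labels
```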
- One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement [71.9401831465908]
We investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label.
A novel method, Single-positive MultI-label learning with Label Enhancement (SMILE), is proposed.
Experiments on benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-06-01T14:26:30Z)
- Integrating Unsupervised Clustering and Label-specific Oversampling to Tackle Imbalanced Multi-label Data [13.888344214818733]
Clustering is performed to find out the key distinct and locally connected regions of a multi-label dataset.
Only the minority points within a cluster are used to generate the synthetic minority points that are used for oversampling.
Experiments using 12 multi-label datasets and several multi-label algorithms show that the proposed method performs very well.
arXiv Detail & Related papers (2021-09-25T19:00:00Z)
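A simplified stand-in for the cluster-then-oversample procedure above, assuming k-means clustering and handling one label column at a time; the paper's label-specific details differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_oversample(X, y_label, n_clusters=5, n_new=50, seed=0):
    """y_label: (n,) binary column for one label, 1 = minority class.
    Interpolates only between minority points in the same cluster."""
    rng = np.random.default_rng(seed)
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(X)
    synth = []
    for c in range(n_clusters):
        pts = X[(clusters == c) & (y_label == 1)]
        if len(pts) < 2:
            continue                              # nothing to interpolate here
        for _ in range(n_new // n_clusters):
            a, b = rng.integers(len(pts), size=2)
            synth.append(pts[a] + rng.random() * (pts[b] - pts[a]))
    if not synth:
        return X, y_label
    return (np.vstack([X, np.asarray(synth)]),
            np.concatenate([y_label, np.ones(len(synth), dtype=int)]))
```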
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.