MixBag: Bag-Level Data Augmentation for Learning from Label Proportions
- URL: http://arxiv.org/abs/2308.08822v1
- Date: Thu, 17 Aug 2023 07:06:50 GMT
- Title: MixBag: Bag-Level Data Augmentation for Learning from Label Proportions
- Authors: Takanori Asanomi, Shinnosuke Matsuo, Daiki Suehiro, Ryoma Bise
- Abstract summary: Learning from label proportions (LLP) is a promising weakly supervised learning problem.
We propose a bag-level data augmentation method for LLP called MixBag.
- Score: 4.588028371034407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning from label proportions (LLP) is a promising weakly supervised
learning problem. In LLP, a set of instances (bag) has label proportions, but
no instance-level labels are given. LLP aims to train an instance-level
classifier by using the label proportions of the bag. In this paper, we propose
a bag-level data augmentation method for LLP called MixBag, based on a key
observation from our preliminary experiments: instance-level classification
accuracy improves as the number of labeled bags increases even when the total
number of instances is fixed. We also propose a confidence
interval loss designed based on statistical theory to use the augmented bags
effectively. To the best of our knowledge, this is the first attempt to propose
bag-level data augmentation for LLP. The advantage of MixBag is that it can be
combined with instance-level data augmentation techniques and applied to any
LLP method that uses the proportion loss. Experimental results demonstrate
this advantage and
the effectiveness of our method.
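To make the mechanism concrete, below is a minimal PyTorch sketch of the two ingredients named in the abstract: the standard proportion loss and a MixBag-style bag-mixing step. The function names and the uniform sampling of the mixing ratio are illustrative assumptions, not the authors' released implementation, and the confidence-interval loss is omitted.

```python
import torch
import torch.nn.functional as F

def proportion_loss(instance_logits, bag_proportion, eps=1e-8):
    """Standard proportion loss: cross-entropy between the bag's true class
    proportions and the mean of the instance-level softmax predictions."""
    pred_proportion = F.softmax(instance_logits, dim=-1).mean(dim=0)
    return -(bag_proportion * torch.log(pred_proportion + eps)).sum()

def mixbag(bag_a, prop_a, bag_b, prop_b):
    """Bag-level augmentation in the spirit of MixBag: sample a mixing ratio
    gamma, take ~gamma of the instances from one bag and ~(1 - gamma) from
    the other, and mix the proportion labels with the same ratio."""
    gamma = torch.rand(1).item()
    n = min(len(bag_a), len(bag_b))          # size of the augmented bag
    k = int(round(gamma * n))                # instances taken from bag_a
    idx_a = torch.randperm(len(bag_a))[:k]
    idx_b = torch.randperm(len(bag_b))[: n - k]
    mixed_bag = torch.cat([bag_a[idx_a], bag_b[idx_b]], dim=0)
    mixed_prop = (k / n) * prop_a + ((n - k) / n) * prop_b
    return mixed_bag, mixed_prop
```

In training, the proportion loss would be applied to original and mixed bags alike; the paper's confidence-interval loss, which the abstract describes only as statistically motivated, would additionally govern how the augmented bags are used.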
Related papers
- Learning from Label Proportions and Covariate-shifted Instances [12.066922664696445]
In learning from label proportions (LLP), the aggregate label is the average of the instance labels in a bag.
We develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels.
arXiv Detail & Related papers (2024-11-19T08:36:34Z)
- Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions [17.36538357653019]
Learning from label proportions (LLP) aims to train a classifier by using bags of instances and the proportions of classes within bags, rather than annotated labels for each instance.
We propose a novel LLP method, namely Learning from Label Proportions with Auxiliary High-confident Instance-level Loss (L2P-AHIL).
We show that L2P-AHIL can surpass the existing baseline methods, and the performance gain can be more significant as the bag size increases.
arXiv Detail & Related papers (2024-11-15T17:14:18Z)
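A hedged sketch of the auxiliary instance-level idea above: pseudo-label each instance with the model's own prediction and add an instance-level loss only where confidence is high. The hard threshold below is an illustrative stand-in for L2P-AHIL's actual confidence-weighting scheme, which the summary does not specify.

```python
import torch
import torch.nn.functional as F

def auxiliary_instance_loss(instance_logits, threshold=0.95):
    """Illustrative auxiliary loss: pseudo-label each instance with its own
    prediction and apply cross-entropy only where the model is confident.
    The threshold and hard cut-off are placeholders, not the paper's
    weighting scheme."""
    probs = F.softmax(instance_logits, dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf >= threshold
    if mask.sum() == 0:
        return instance_logits.new_zeros(())   # no confident instances yet
    return F.cross_entropy(instance_logits[mask], pseudo[mask])
```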
- Weak to Strong Learning from Aggregate Labels [9.804335415337071]
We study the problem of using a weak learner on training bags with aggregate labels to obtain a strong learner.
A weak learner attains only some constant accuracy less than 1 on the training bags, while a strong learner's accuracy can be arbitrarily close to 1.
Our work is the first to theoretically study weak to strong learning from aggregate labels, with an algorithm to achieve the same for LLP.
arXiv Detail & Related papers (2024-11-09T14:56:09Z)
- Theoretical Proportion Label Perturbation for Learning from Label Proportions in Large Bags [5.842419815638353]
Learning from label proportions (LLP) is a weakly supervised learning problem that trains an instance-level classifier from the label proportions of bags.
A challenge in LLP arises when the number of instances in a bag (the bag size) is large, making traditional LLP methods impractical due to GPU memory limitations.
This study aims to develop an LLP method capable of learning from bags with large sizes.
arXiv Detail & Related papers (2024-08-26T09:24:36Z)
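The large-bag setting above invites a sampling argument: if a mini-bag is drawn without replacement from a large bag, its class counts follow a multivariate hypergeometric distribution. The sketch below uses that fact to produce a perturbed proportion label for a sampled mini-bag; it is one reading of the summary, not necessarily the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_minibag_proportion(bag_proportion, bag_size, mini_size):
    """Illustrative perturbation for large-bag LLP: class counts of a
    mini-bag drawn without replacement follow a multivariate hypergeometric
    distribution, giving a statistically consistent 'perturbed' proportion
    label for the mini-bag. Requires mini_size <= bag_size."""
    class_counts = np.round(np.asarray(bag_proportion) * bag_size).astype(int)
    # Fix rounding so counts sum to bag_size (illustrative bookkeeping).
    class_counts[0] += bag_size - class_counts.sum()
    mini_counts = rng.multivariate_hypergeometric(class_counts, mini_size)
    return mini_counts / mini_size
```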
- Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning [68.56193228008466]
In many real-world tasks, the objects of interest can be represented as a multi-instance bag associated with a candidate label set.
Existing MIPL approaches follow the instance-space paradigm, assigning a bag's augmented candidate label set to each of its instances and aggregating bag-level labels from instance-level labels.
We propose an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning.
arXiv Detail & Related papers (2023-05-26T13:25:17Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits popular pseudo-labeling methods through a unified sample-weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
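A simplified sketch of the quantity-quality idea above: rather than discarding low-confidence pseudo-labels with a hard threshold, scale each example's loss by a smooth, confidence-dependent weight. The Gaussian-shaped weight and the externally supplied running statistics (`mean_conf`, `var_conf`) are simplifications of SoftMatch's truncated-Gaussian weighting.

```python
import torch
import torch.nn.functional as F

def soft_pseudo_label_loss(logits_weak, logits_strong, mean_conf, var_conf):
    """Confidence-based sample weighting for pseudo-labels: weight 1 for
    confidence above the running mean, smoothly decaying below it, so that
    both quantity and quality of pseudo-labels are maintained."""
    probs = F.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    weight = torch.where(
        conf >= mean_conf,
        torch.ones_like(conf),
        torch.exp(-((conf - mean_conf) ** 2) / (2 * var_conf)),
    )
    per_example = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (weight * per_example).mean()
```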
- L2B: Learning to Bootstrap Robust Models for Combating Label Noise [52.02335367411447]
This paper introduces a simple and effective method named Learning to Bootstrap (L2B).
It enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels.
It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning.
arXiv Detail & Related papers (2022-02-09T05:57:08Z)
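A minimal sketch of the bootstrapping objective described above: a per-sample interpolation between the loss on the observed (possibly noisy) label and the loss on the model's own pseudo-label. In L2B the weights come from meta-learning; here they are plain inputs, which is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def bootstrap_loss(logits, observed_labels, alpha, beta):
    """Sketch of a bootstrapping objective: interpolate between the
    (possibly noisy) observed labels and the model's own pseudo-labels.
    alpha and beta are per-sample weights; in L2B they would be set by
    meta-learning rather than supplied directly."""
    pseudo = logits.detach().argmax(dim=-1)
    loss_obs = F.cross_entropy(logits, observed_labels, reduction="none")
    loss_pseudo = F.cross_entropy(logits, pseudo, reduction="none")
    return (alpha * loss_obs + beta * loss_pseudo).mean()
```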
- Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent partial label learning and assume that each example is associated with a latent label distribution constituted by a real-valued number for each label.
arXiv Detail & Related papers (2021-10-25T12:50:26Z)
- Fast learning from label proportions with small bags [0.0]
In learning from label proportions (LLP), the instances are grouped into bags, and the task is to learn an instance classifier given relative class proportions in training bags.
In this work, we focus on the case of small bags, which allows designing more efficient algorithms by explicitly considering all consistent label combinations.
arXiv Detail & Related papers (2021-10-07T13:11:18Z)
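For the small-bag setting above, "all consistent label combinations" can be enumerated directly. The sketch below lists every instance-label assignment whose class counts match the bag's proportion; the function name and count-based interface are illustrative.

```python
from itertools import product

def consistent_labelings(bag_size, class_counts):
    """Enumerate every instance-label assignment consistent with a bag's
    label proportion: keep exactly those label tuples whose per-class
    counts match the counts implied by the proportion. Feasible only for
    small bags, which is the regime the paper targets."""
    n_classes = len(class_counts)
    return [
        labels
        for labels in product(range(n_classes), repeat=bag_size)
        if all(labels.count(c) == class_counts[c] for c in range(n_classes))
    ]

# e.g. a bag of 3 instances with proportion (2/3, 1/3) -> counts (2, 1):
# consistent_labelings(3, (2, 1)) ->
# [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```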
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general semi-supervised learning (SSL) approach that does not rely on domain-specific data augmentations, but it performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
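A hedged sketch of uncertainty-aware selection in the spirit of UPS: combine confidence from the mean prediction with stability across stochastic forward passes (e.g., MC dropout), and keep only pseudo-labels that are both confident and stable. The specific thresholds and the standard-deviation uncertainty measure are illustrative assumptions.

```python
import torch

def select_pseudo_labels(mc_probs, conf_thresh=0.9, unc_thresh=0.05):
    """Uncertainty-aware pseudo-label selection sketch. mc_probs is a
    (T, N, C) tensor of softmax outputs from T stochastic forward passes.
    Keep examples that are confident (high mean probability) and stable
    (low standard deviation across passes)."""
    mean_probs = mc_probs.mean(dim=0)                        # (N, C)
    conf, pseudo = mean_probs.max(dim=-1)                    # (N,)
    std = mc_probs.std(dim=0).gather(1, pseudo.unsqueeze(1)).squeeze(1)
    mask = (conf >= conf_thresh) & (std <= unc_thresh)
    return pseudo[mask], mask
```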
- Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
Few-shot learning is challenging due to its very limited data and labels.
Recent studies in big transfer (BiT) show that few-shot learning can greatly benefit from pretraining on a large-scale labeled dataset in a different domain.
We propose eigen-finetuning to enable fewer-shot learning by leveraging the co-evolution of clustering and eigen-samples during finetuning.
arXiv Detail & Related papers (2020-12-10T18:59:29Z)