Theoretical Proportion Label Perturbation for Learning from Label Proportions in Large Bags
- URL: http://arxiv.org/abs/2408.14130v1
- Date: Mon, 26 Aug 2024 09:24:36 GMT
- Title: Theoretical Proportion Label Perturbation for Learning from Label Proportions in Large Bags
- Authors: Shunsuke Kubo, Shinnosuke Matsuo, Daiki Suehiro, Kazuhiro Terada, Hiroaki Ito, Akihiko Yoshizawa, Ryoma Bise
- Abstract summary: Learning from label proportions (LLP) is a kind of weakly supervised learning that trains an instance-level classifier from the label proportions of bags.
A challenge in LLP arises when the number of instances in a bag (the bag size) is large, which makes traditional LLP methods impractical due to GPU memory limitations.
This study aims to develop an LLP method capable of learning from bags of large size.
- Score: 5.842419815638353
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning from label proportions (LLP) is a kind of weakly supervised learning that trains an instance-level classifier from the label proportions of bags, which consist of sets of instances, without using instance labels. A challenge in LLP arises when the number of instances in a bag (the bag size) is large, which makes traditional LLP methods impractical due to GPU memory limitations. This study aims to develop an LLP method capable of learning from bags of large size. In our method, smaller bags (mini-bags) are generated by sampling instances from large original bags, and these mini-bags are used in place of the original bags. However, the proportion of a mini-bag is unknown and differs from that of the original bag, which leads to overfitting. To address this issue, we propose a perturbation method for the proportion labels of sampled mini-bags that mitigates overfitting to noisy label proportions. The perturbation is drawn from the multivariate hypergeometric distribution, which statistically models the sampling process. Additionally, loss weighting is implemented to reduce the negative impact of proportions sampled from the tail of the distribution. Experimental results demonstrate that the proportion label perturbation and loss weighting achieve classification accuracy comparable to that obtained without sampling. Our code is available at https://github.com/stainlessnight/LLP-LargeBags.
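The abstract outlines the method only at a high level, so the following is a minimal, hedged Python sketch of the two ideas it names; all function names and the exact weighting choice are illustrative assumptions, and the authors' actual implementation lives in the linked repository. The key observation is that when a mini-bag of size n is drawn without replacement from an original bag with class counts (K_1, ..., K_C), its true class counts follow a multivariate hypergeometric distribution, so a perturbed proportion label can be drawn from that distribution and the loss weighted by the drawn counts' probability:

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

rng = np.random.default_rng(0)

def sample_mini_bag(instances, class_counts, mini_bag_size):
    """Draw a mini-bag from a large original bag together with a
    perturbed proportion label (illustrative, not the authors' code).

    class_counts: per-class instance counts of the original bag,
    recoverable as (proportion label) * (original bag size).
    """
    idx = rng.choice(len(instances), size=mini_bag_size, replace=False)
    # The mini-bag's true class counts are unknown, but sampling without
    # replacement means they follow MVH(class_counts, mini_bag_size);
    # draw plausible counts from that distribution to use as the label.
    drawn = rng.multivariate_hypergeometric(class_counts, mini_bag_size)
    return instances[idx], drawn / mini_bag_size, drawn

def loss_weight(drawn_counts, class_counts, mini_bag_size):
    """Down-weight mini-bags whose drawn proportion lies in the tail of
    the distribution; using the counts' own probability is one simple
    (assumed) choice of weighting."""
    return multivariate_hypergeom.pmf(x=drawn_counts, m=class_counts,
                                      n=mini_bag_size)

# Example: an original bag of 10,000 instances whose proportion label
# 0.6/0.3/0.1 implies class counts (6000, 3000, 1000).
X = rng.normal(size=(10_000, 32))   # placeholder features
K = np.array([6000, 3000, 1000])
mini_x, prop_label, k = sample_mini_bag(X, K, 64)
w = loss_weight(k, K, 64)
```

In this reading, the drawn proportion label replaces the original bag's proportion when computing the proportion loss on the mini-bag, and w scales that mini-bag's contribution to the total loss.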
Related papers
- Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions [17.36538357653019]
Learning from label proportions (LLP) aims to train a classifier by using bags of instances and the proportions of classes within bags, rather than annotated labels for each instance.
We propose a novel LLP method, namely Learning from Label Proportions with Auxiliary High-confident Instance-level Loss (L2P-AHIL).
We show that L2P-AHIL can surpass existing baseline methods, and that the performance gain becomes more significant as the bag size increases.
arXiv Detail & Related papers (2024-11-15T17:14:18Z)
- Weak to Strong Learning from Aggregate Labels [9.804335415337071]
We study the problem of using a weak learner on such training bags with aggregate labels to obtain a strong learner.
A weak learner attains only some constant accuracy less than 1 on the training bags, while a strong learner's accuracy can be arbitrarily close to 1.
Our work is the first to theoretically study weak-to-strong learning from aggregate labels, and it gives an algorithm that achieves this for LLP.
arXiv Detail & Related papers (2024-11-09T14:56:09Z)
- MixBag: Bag-Level Data Augmentation for Learning from Label Proportions [4.588028371034407]
Learning from label proportions (LLP) is a promising weakly supervised learning problem.
We propose a bag-level data augmentation method for LLP called MixBag.
arXiv Detail & Related papers (2023-08-17T07:06:50Z)
- Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning [68.56193228008466]
In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set.
Existing MIPL approaches follow the instance-space paradigm, assigning the augmented candidate label set of a bag to each of its instances and aggregating bag-level labels from instance-level labels.
We propose an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning.
arXiv Detail & Related papers (2023-05-26T13:25:17Z)
- Learning from Aggregated Data: Curated Bags versus Random Bags [35.394402088653415]
We explore the possibility of training machine learning models with aggregated data labels, rather than individual labels.
For the curated bag setting, we show that we can perform gradient-based learning without any degradation in performance.
In the random bag setting, our bound indicates a trade-off between the bag size and the achievable error rate.
arXiv Detail & Related papers (2023-05-16T15:53:45Z)
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- An analysis of over-sampling labeled data in semi-supervised learning with FixMatch [66.34968300128631]
Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches.
This paper studies whether this common practice improves learning and how.
We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not.
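As a concrete picture of the two mini-batch constructions being compared, here is a minimal sketch; the function names and batch sizes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def oversampled_batch(labeled_idx, unlabeled_idx, n_lab=64, n_unl=448):
    """Common practice: each mini-batch carries a fixed quota of labeled
    examples, so scarce labeled data is revisited far more often."""
    lab = rng.choice(labeled_idx, size=n_lab, replace=True)
    unl = rng.choice(unlabeled_idx, size=n_unl, replace=False)
    return lab, unl

def uniform_batch(n_total, labeled_mask, batch_size=512):
    """Alternative setting: sample the mini-batch uniformly from all
    training data; labeled examples then appear only in proportion to
    their overall share of the dataset."""
    batch = rng.choice(n_total, size=batch_size, replace=False)
    return batch[labeled_mask[batch]], batch[~labeled_mask[batch]]
```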
arXiv Detail & Related papers (2022-01-03T12:22:26Z)
- Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent partial label learning and assume that each example is associated with a latent label distribution whose real-valued entry for each label represents the degree to which that label describes the instance.
arXiv Detail & Related papers (2021-10-25T12:50:26Z)
- Fast learning from label proportions with small bags [0.0]
In learning from label proportions (LLP), the instances are grouped into bags, and the task is to learn an instance classifier given relative class proportions in training bags.
In this work, we focus on the case of small bags, which allows designing more efficient algorithms by explicitly considering all consistent label combinations.
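For intuition on why restricting to small bags makes this tractable, a hedged sketch (not the paper's algorithm): a bag of size n with class counts (k_1, ..., k_C) admits exactly n!/(k_1! ... k_C!) consistent instance labelings, a number that is tiny for small bags but combinatorially explosive for large ones.

```python
from itertools import permutations
from math import factorial, prod

def consistent_labelings(class_counts):
    """All instance-label assignments consistent with a bag's class
    counts: the distinct permutations of the label multiset."""
    multiset = [c for c, k in enumerate(class_counts) for _ in range(k)]
    return sorted(set(permutations(multiset)))

def num_labelings(class_counts):
    """Multinomial coefficient n! / (k_1! ... k_C!)."""
    n = sum(class_counts)
    return factorial(n) // prod(factorial(k) for k in class_counts)

# A bag of 4 instances with 2 instances per class admits
# 4! / (2! * 2!) = 6 consistent labelings.
print(num_labelings([2, 2]))              # 6
print(len(consistent_labelings([2, 2])))  # 6
```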
arXiv Detail & Related papers (2021-10-07T13:11:18Z)
- Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition [95.93760490301395]
The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered.
It is hypothesized that over-fitting to few-shot classes under class-balanced sampling is due to the repeated sampling of examples and can be addressed by feature space augmentation.
A new feature augmentation strategy, EMANATE, based on back-tracking of features across epochs during training, is proposed.
A new sampling procedure, Breadcrumb, is then introduced to implement adversarial class-balanced sampling without extra computation.
arXiv Detail & Related papers (2021-05-01T00:21:26Z)
- Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [82.41415008107502]
Weakly-supervised action localization requires training a model to localize the action segments in a video given only a video-level action label.
It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).
We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
arXiv Detail & Related papers (2020-03-31T23:36:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.