PAC Learning Linear Thresholds from Label Proportions
- URL: http://arxiv.org/abs/2310.10098v1
- Date: Mon, 16 Oct 2023 05:59:34 GMT
- Title: PAC Learning Linear Thresholds from Label Proportions
- Authors: Anand Brahmbhatt, Rishi Saket and Aravindan Raghuveer
- Abstract summary: Learning from label proportions (LLP) is a generalization of supervised learning.
We show that it is possible to efficiently learn LTFs using LTFs when given access to random bags of some label proportion.
We include an experimental evaluation of our learning algorithms along with a comparison with those of [Saket'21, Saket'22] and random LTFs.
- Score: 13.58949814915442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from label proportions (LLP) is a generalization of supervised
learning in which the training data is available as sets or bags of
feature-vectors (instances) along with the average instance-label of each bag.
The goal is to train a good instance classifier. While most previous works on
LLP have focused on training models on such training data, computational
learnability of LLP was only recently explored by [Saket'21, Saket'22] who
showed worst case intractability of properly learning linear threshold
functions (LTFs) from label proportions. However, their work did not rule out
efficient algorithms for this problem on natural distributions.
In this work we show that it is indeed possible to efficiently learn LTFs
using LTFs when given access to random bags of some label proportion in which
feature-vectors are, conditioned on their labels, independently sampled from a
Gaussian distribution $N(\mathbf{\mu}, \mathbf{\Sigma})$. Our work shows that a
certain matrix -- formed using covariances of the differences of
feature-vectors sampled from the bags with and without replacement --
necessarily has its principal component, after a transformation, in the
direction of the normal vector of the LTF. Our algorithm estimates the means
and covariance matrices using subgaussian concentration bounds which we show
can be applied to efficiently sample bags for approximating the normal
direction. Using this in conjunction with novel generalization error bounds in
the bag setting, we show that a low error hypothesis LTF can be identified. For
some special cases of the $N(\mathbf{0}, \mathbf{I})$ distribution we provide a
simpler mean estimation based algorithm. We include an experimental evaluation
of our learning algorithms along with a comparison with those of [Saket'21,
Saket'22] and random LTFs, demonstrating the effectiveness of our techniques.
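As a rough numerical illustration of the geometric idea described above (and not the paper's actual estimator, transformation, or guarantees), the following sketch assumes standard-Gaussian features, an LTF through the origin, and bags with an exact label proportion. It forms the covariances of within-bag pairwise differences sampled with and without replacement, combines them with the pooled instance covariance so that the class-conditional covariance cancels, and takes the top eigenvector as an estimate of the LTF's normal direction; the specific combination used here is an assumption chosen to make the toy work.

```python
# Toy sketch only: recover the normal direction of an LTF from bags with a
# fixed label proportion, assuming x ~ N(0, I) and an LTF through the origin.
# This is NOT the paper's algorithm; constants and the final "transformation"
# are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, n, rho, n_bags = 5, 10, 0.7, 4000      # dimension, bag size, label proportion, #bags
r_true = rng.normal(size=d)
r_true /= np.linalg.norm(r_true)          # unknown LTF normal (unit vector)
k = int(round(rho * n))                   # positives per bag

def sample_conditional(num, positive):
    """Rejection-sample x ~ N(0, I) conditioned on the sign of r_true . x."""
    out = []
    while len(out) < num:
        x = rng.normal(size=(4 * num, d))
        proj = x @ r_true
        mask = proj >= 0 if positive else proj < 0
        out.extend(x[mask])
    return np.asarray(out[:num])

M_wo = np.zeros((d, d))                   # pairwise-difference covariance, without replacement
M_w = np.zeros((d, d))                    # pairwise-difference covariance, with replacement
all_instances = []
for _ in range(n_bags):
    bag = np.vstack([sample_conditional(k, True), sample_conditional(n - k, False)])
    all_instances.append(bag)
    D = bag[:, None, :] - bag[None, :, :]               # all pairwise differences (n, n, d)
    outer = np.einsum('ijp,ijq->pq', D, D)              # sum of outer products over pairs
    M_w += outer / n ** 2                               # average over all ordered pairs
    M_wo += outer / (n * (n - 1))                       # i = j pairs contribute zero
M_w /= n_bags
M_wo /= n_bags

X = np.vstack(all_instances)
sigma_pool = np.cov(X, rowvar=False, bias=True)         # covariance of all pooled instances

# Illustrative "transformation": this combination is, in expectation, the
# rank-one matrix rho*(1-rho)*(mu_plus - mu_minus)(mu_plus - mu_minus)^T,
# whose top eigenvector is parallel to the LTF normal in this toy setting.
B = (n * (n - 1) / 2.0) * (M_wo - M_w) - (n - 1) * sigma_pool
eigvals, eigvecs = np.linalg.eigh(B)
r_hat = eigvecs[:, -1]                                  # top eigenvector (sign ambiguous)

print("cosine similarity |<r_hat, r_true>| =", abs(float(r_hat @ r_true)))
```

Under a general covariance $\mathbf{\Sigma}$ this toy construction would recover a direction along $\mathbf{\Sigma}\mathbf{r}$ rather than $\mathbf{r}$ itself, which is roughly the role the abstract's "after a transformation" seems to play; the paper's actual estimators additionally rely on subgaussian concentration bounds to control the number of bags required.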
Related papers
- Enhancing Learning with Label Differential Privacy by Vector Approximation [12.212865127830872]
Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public.
Existing approaches protect the privacy of labels by flipping them randomly, and then train a model to make the output approximate the privatized label.
We propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead.
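For context, the label-flipping baseline mentioned above is typically instantiated as binary randomized response; the sketch below shows only that standard mechanism and is not the vector-approximation method proposed in the paper.

```python
# Binary randomized response: flip each label with probability 1/(1 + e^eps),
# a standard epsilon-label-DP mechanism (illustration only).
import numpy as np

def randomized_response(labels, epsilon, rng=None):
    """Independently flip each binary label with probability 1/(1 + e^epsilon)."""
    rng = rng or np.random.default_rng()
    flip_prob = 1.0 / (1.0 + np.exp(epsilon))
    flips = rng.random(len(labels)) < flip_prob
    return np.where(flips, 1 - labels, labels)

labels = np.array([0, 1, 1, 0, 1])
private_labels = randomized_response(labels, epsilon=1.0)
```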
arXiv Detail & Related papers (2024-05-24T02:08:45Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]

Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
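A minimal sketch of what estimating a per-class feature distribution could look like; the distribution family (a diagonal Gaussian here) and the way virtual samples are drawn are assumptions for illustration, not details taken from the ProCo paper.

```python
# Fit a simple per-class feature distribution and draw synthetic features for
# rare (tail) classes; illustration only, not ProCo's actual parameterization.
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit a diagonal Gaussian (mean, variance) to each class's features."""
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        stats[c] = (f.mean(axis=0), f.var(axis=0) + 1e-6)
    return stats

def sample_virtual_features(stats, c, num, rng=None):
    """Draw synthetic feature vectors for class c from its fitted Gaussian."""
    rng = rng or np.random.default_rng()
    mean, var = stats[c]
    return rng.normal(mean, np.sqrt(var), size=(num, mean.shape[0]))
```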
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
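The sketch below shows one generic form of an entropy-regularized alignment term between pseudo-labels and model predictions; the exact objective and weighting used in that paper may differ.

```python
# Generic entropy-regularized alignment loss: cross-entropy to pseudo-labels
# plus an entropy penalty on the model's predictions (illustration only).
import numpy as np

def alignment_loss(pred_probs, pseudo_probs, entropy_weight=0.1, eps=1e-8):
    """Cross-entropy to pseudo-labels plus an entropy term on predictions."""
    ce = -np.sum(pseudo_probs * np.log(pred_probs + eps), axis=1).mean()
    entropy = -np.sum(pred_probs * np.log(pred_probs + eps), axis=1).mean()
    return ce + entropy_weight * entropy
```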
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Easy Learning from Label Proportions [17.71834385754893]
EasyLLP is a flexible and simple-to-implement debiasing approach based on aggregate labels.
Our technique allows us to accurately estimate the expected loss of an arbitrary model at an individual level.
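As a naive stand-in (not the EasyLLP estimator itself), one can estimate per-instance expected loss from aggregate labels by treating each bag's label proportion as a soft label; EasyLLP's contribution is a debiased refinement of this kind of estimate.

```python
# Naive per-instance expected-loss estimate from aggregate labels, using the
# bag proportion as a soft label; illustration only, not the EasyLLP surrogate.
import numpy as np

def soft_label_loss(pred_pos_probs, bag_proportion, eps=1e-8):
    """Per-instance expected log-loss with the bag proportion as a soft label."""
    return -(bag_proportion * np.log(pred_pos_probs + eps)
             + (1 - bag_proportion) * np.log(1 - pred_pos_probs + eps))

# e.g. three instances from a bag whose label proportion is 0.25:
print(soft_label_loss(np.array([0.7, 0.4, 0.1]), bag_proportion=0.25))
```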
arXiv Detail & Related papers (2023-02-06T20:41:38Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we propose to pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
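A generic way to encode label-distribution consistency in PU learning is to penalize the gap between the average predicted positive probability on unlabeled data and the known class prior; the sketch below shows only this generic term, not the Dist-PU objective.

```python
# Generic label-distribution consistency term for PU learning (illustration).
import numpy as np

def distribution_consistency(pred_pos_probs_unlabeled, class_prior):
    """Squared gap between the predicted positive proportion and the class prior."""
    predicted_proportion = pred_pos_probs_unlabeled.mean()
    return (predicted_proportion - class_prior) ** 2
```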
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- TabMixer: Excavating Label Distribution Learning with Small-scale Features [10.498049147922258]
Label distribution learning (LDL) differs from multi-label learning in that it represents the polysemy of instances by transforming single-label values into descriptive degrees.
Unfortunately, the feature space of label distribution datasets is affected by human factors and by the inductive bias of the feature extractor, which introduces uncertainty into the feature space.
We model the uncertainty augmentation of the feature space to alleviate the problem in LDL tasks.
Our proposed algorithm can be competitive compared to other LDL algorithms on several benchmarks.
arXiv Detail & Related papers (2022-10-25T09:18:15Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Fast learning from label proportions with small bags [0.0]
In learning from label proportions (LLP), the instances are grouped into bags, and the task is to learn an instance classifier given relative class proportions in training bags.
In this work, we focus on the case of small bags, which allows designing more efficient algorithms by explicitly considering all consistent label combinations.
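The "all consistent label combinations" idea can be illustrated directly: for a small bag, the likelihood of its label proportion is the sum, over every labelling with the right number of positives, of the product of per-instance predicted probabilities. The sketch below computes that quantity; the referenced paper's exact loss and optimization are not reproduced here.

```python
# Bag log-likelihood by enumerating all labelings consistent with the bag's
# label count; exponential in bag size, hence only practical for small bags.
from itertools import combinations
import numpy as np

def bag_log_likelihood(pos_probs, num_positives):
    """log P(bag has exactly `num_positives` positives | per-instance probabilities)."""
    n = len(pos_probs)
    total = 0.0
    for pos_idx in combinations(range(n), num_positives):
        p = 1.0
        for i in range(n):
            p *= pos_probs[i] if i in pos_idx else (1.0 - pos_probs[i])
        total += p
    return np.log(total + 1e-12)

# e.g. a bag of 4 instances with predicted positive probabilities and proportion 1/2:
print(bag_log_likelihood(np.array([0.9, 0.8, 0.2, 0.1]), num_positives=2))
```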
arXiv Detail & Related papers (2021-10-07T13:11:18Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
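The mirror-ascent half of such an alternating scheme can be sketched as an exponentiated-gradient update of adversarial class weights on the probability simplex; the step size and lack of explicit regularization below are illustrative assumptions, not the paper's settings.

```python
# One mirror (exponentiated-gradient) ascent step over class weights on the
# simplex: upweight classes where the model currently does poorly. The full
# algorithm alternates this with gradient descent on the model parameters.
import numpy as np

def mirror_ascent_step(class_weights, per_class_losses, step_size=0.5):
    """Exponentiated-gradient update followed by projection onto the simplex."""
    w = class_weights * np.exp(step_size * per_class_losses)
    return w / w.sum()

weights = np.full(3, 1.0 / 3)              # start from the uniform class distribution
losses = np.array([0.2, 0.9, 0.4])         # current average loss per class
weights = mirror_ascent_step(weights, losses)
```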
arXiv Detail & Related papers (2020-10-23T08:33:04Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels.
arXiv Detail & Related papers (2020-01-13T21:00:46Z)