PAC Learning Linear Thresholds from Label Proportions
- URL: http://arxiv.org/abs/2310.10098v1
- Date: Mon, 16 Oct 2023 05:59:34 GMT
- Title: PAC Learning Linear Thresholds from Label Proportions
- Authors: Anand Brahmbhatt, Rishi Saket and Aravindan Raghuveer
- Abstract summary: Learning from label proportions (LLP) is a generalization of supervised learning.
We show that it is possible to efficiently learn LTFs using LTFs when given access to random bags of some label proportion.
We include an experimental evaluation of our learning algorithms along with a comparison with those of [Saket'21, Saket'22] and random LTFs.
- Score: 13.58949814915442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from label proportions (LLP) is a generalization of supervised
learning in which the training data is available as sets or bags of
feature-vectors (instances) along with the average instance-label of each bag.
The goal is to train a good instance classifier. While most previous works on
LLP have focused on training models on such training data, computational
learnability of LLP was only recently explored by [Saket'21, Saket'22] who
showed worst case intractability of properly learning linear threshold
functions (LTFs) from label proportions. However, their work did not rule out
efficient algorithms for this problem on natural distributions.
In this work we show that it is indeed possible to efficiently learn LTFs
using LTFs when given access to random bags of some label proportion in which
feature-vectors are, conditioned on their labels, independently sampled from a
Gaussian distribution $N(\mathbf{\mu}, \mathbf{\Sigma})$. Our work shows that a
certain matrix -- formed using covariances of the differences of
feature-vectors sampled from the bags with and without replacement --
necessarily has its principal component, after a transformation, in the
direction of the normal vector of the LTF. Our algorithm estimates the means
and covariance matrices using subgaussian concentration bounds which we show
can be applied to efficiently sample bags for approximating the normal
direction. Using this in conjunction with novel generalization error bounds in
the bag setting, we show that a low error hypothesis LTF can be identified. For
some special cases of the $N(\mathbf{0}, \mathbf{I})$ distribution we provide a
simpler mean estimation based algorithm. We include an experimental evaluation
of our learning algorithms along with a comparison with those of [Saket'21,
Saket'22] and random LTFs, demonstrating the effectiveness of our techniques.
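As a rough numerical illustration of the geometric idea described above (and not the paper's actual estimator, transformation, or guarantees), the following sketch assumes standard-Gaussian features, an LTF through the origin, and bags with an exact label proportion. It forms the covariances of within-bag pairwise differences sampled with and without replacement, combines them with the pooled instance covariance so that the class-conditional covariance cancels, and takes the top eigenvector as an estimate of the LTF's normal direction; the specific combination used here is an assumption chosen to make the toy work.

```python
# Toy sketch only: recover the normal direction of an LTF from bags with a
# fixed label proportion, assuming x ~ N(0, I) and an LTF through the origin.
# This is NOT the paper's algorithm; constants and the final "transformation"
# are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
d, n, rho, n_bags = 5, 10, 0.7, 4000      # dimension, bag size, label proportion, #bags
r_true = rng.normal(size=d)
r_true /= np.linalg.norm(r_true)          # unknown LTF normal (unit vector)
k = int(round(rho * n))                   # positives per bag

def sample_conditional(num, positive):
    """Rejection-sample x ~ N(0, I) conditioned on the sign of r_true . x."""
    out = []
    while len(out) < num:
        x = rng.normal(size=(4 * num, d))
        proj = x @ r_true
        mask = proj >= 0 if positive else proj < 0
        out.extend(x[mask])
    return np.asarray(out[:num])

M_wo = np.zeros((d, d))                   # pairwise-difference covariance, without replacement
M_w = np.zeros((d, d))                    # pairwise-difference covariance, with replacement
all_instances = []
for _ in range(n_bags):
    bag = np.vstack([sample_conditional(k, True), sample_conditional(n - k, False)])
    all_instances.append(bag)
    D = bag[:, None, :] - bag[None, :, :]               # all pairwise differences (n, n, d)
    outer = np.einsum('ijp,ijq->pq', D, D)              # sum of outer products over pairs
    M_w += outer / n ** 2                               # average over all ordered pairs
    M_wo += outer / (n * (n - 1))                       # i = j pairs contribute zero
M_w /= n_bags
M_wo /= n_bags

X = np.vstack(all_instances)
sigma_pool = np.cov(X, rowvar=False, bias=True)         # covariance of all pooled instances

# Illustrative "transformation": this combination is, in expectation, the
# rank-one matrix rho*(1-rho)*(mu_plus - mu_minus)(mu_plus - mu_minus)^T,
# whose top eigenvector is parallel to the LTF normal in this toy setting.
B = (n * (n - 1) / 2.0) * (M_wo - M_w) - (n - 1) * sigma_pool
eigvals, eigvecs = np.linalg.eigh(B)
r_hat = eigvecs[:, -1]                                  # top eigenvector (sign ambiguous)

print("cosine similarity |<r_hat, r_true>| =", abs(float(r_hat @ r_true)))
```

Under a general covariance $\mathbf{\Sigma}$ this toy construction would recover a direction along $\mathbf{\Sigma}\mathbf{r}$ rather than $\mathbf{r}$ itself, which is roughly the role the abstract's "after a transformation" seems to play; the paper's actual estimators additionally rely on subgaussian concentration bounds to control the number of bags required.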
Related papers
- Enhancing Learning with Label Differential Privacy by Vector Approximation [12.212865127830872]
Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets, while the feature vectors are public.
Existing approaches protect the privacy of labels by flipping them randomly, and then train a model to make the output approximate the privatized label.
We propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead.
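For context, the label-flipping baseline mentioned above is typically instantiated as binary randomized response; the sketch below shows only that standard mechanism and is not the vector-approximation method proposed in the paper.

```python
# Binary randomized response: flip each label with probability 1/(1 + e^eps),
# a standard epsilon-label-DP mechanism (illustration only).
import numpy as np

def randomized_response(labels, epsilon, rng=None):
    """Independently flip each binary label with probability 1/(1 + e^epsilon)."""
    rng = rng or np.random.default_rng()
    flip_prob = 1.0 / (1.0 + np.exp(epsilon))
    flips = rng.random(len(labels)) < flip_prob
    return np.where(flips, 1 - labels, labels)

labels = np.array([0, 1, 1, 0, 1])
private_labels = randomized_response(labels, epsilon=1.0)
```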
arXiv Detail & Related papers (2024-05-24T02:08:45Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]

Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
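A minimal sketch of what estimating a per-class feature distribution could look like; the distribution family (a diagonal Gaussian here) and the way virtual samples are drawn are assumptions for illustration, not details taken from the ProCo paper.

```python
# Fit a simple per-class feature distribution and draw synthetic features for
# rare (tail) classes; illustration only, not ProCo's actual parameterization.
import numpy as np

def fit_class_gaussians(features, labels):
    """Fit a diagonal Gaussian (mean, variance) to each class's features."""
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        stats[c] = (f.mean(axis=0), f.var(axis=0) + 1e-6)
    return stats

def sample_virtual_features(stats, c, num, rng=None):
    """Draw synthetic feature vectors for class c from its fitted Gaussian."""
    rng = rng or np.random.default_rng()
    mean, var = stats[c]
    return rng.normal(mean, np.sqrt(var), size=(num, mean.shape[0]))
```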
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
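The sketch below shows one generic form of an entropy-regularized alignment term between pseudo-labels and model predictions; the exact objective and weighting used in that paper may differ.

```python
# Generic entropy-regularized alignment loss: cross-entropy to pseudo-labels
# plus an entropy penalty on the model's predictions (illustration only).
import numpy as np

def alignment_loss(pred_probs, pseudo_probs, entropy_weight=0.1, eps=1e-8):
    """Cross-entropy to pseudo-labels plus an entropy term on predictions."""
    ce = -np.sum(pseudo_probs * np.log(pred_probs + eps), axis=1).mean()
    entropy = -np.sum(pred_probs * np.log(pred_probs + eps), axis=1).mean()
    return ce + entropy_weight * entropy
```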
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Easy Learning from Label Proportions [17.71834385754893]
EasyLLP is a flexible and simple-to-implement debiasing approach based on aggregate labels.
Our technique allows us to accurately estimate the expected loss of an arbitrary model at an individual level.
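As a naive stand-in (not the EasyLLP estimator itself), one can estimate per-instance expected loss from aggregate labels by treating each bag's label proportion as a soft label; EasyLLP's contribution is a debiased refinement of this kind of estimate.

```python
# Naive per-instance expected-loss estimate from aggregate labels, using the
# bag proportion as a soft label; illustration only, not the EasyLLP surrogate.
import numpy as np

def soft_label_loss(pred_pos_probs, bag_proportion, eps=1e-8):
    """Per-instance expected log-loss with the bag proportion as a soft label."""
    return -(bag_proportion * np.log(pred_pos_probs + eps)
             + (1 - bag_proportion) * np.log(1 - pred_pos_probs + eps))

# e.g. three instances from a bag whose label proportion is 0.25:
print(soft_label_loss(np.array([0.7, 0.4, 0.1]), bag_proportion=0.25))
```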
arXiv Detail & Related papers (2023-02-06T20:41:38Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this perspective, we propose to pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
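A generic way to encode label-distribution consistency in PU learning is to penalize the gap between the average predicted positive probability on unlabeled data and the known class prior; the sketch below shows only this generic term, not the Dist-PU objective.

```python
# Generic label-distribution consistency term for PU learning (illustration).
import numpy as np

def distribution_consistency(pred_pos_probs_unlabeled, class_prior):
    """Squared gap between the predicted positive proportion and the class prior."""
    predicted_proportion = pred_pos_probs_unlabeled.mean()
    return (predicted_proportion - class_prior) ** 2
```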
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- TabMixer: Excavating Label Distribution Learning with Small-scale Features [10.498049147922258]
Label distribution learning (LDL) differs from multi-label learning in that it represents the polysemy of instances by transforming single-label values into descriptive degrees.
Unfortunately, the feature space of label distribution datasets is affected by human factors and by the inductive bias of the feature extractor, which introduces uncertainty into the feature space.
We model the uncertainty augmentation of the feature space to alleviate the problem in LDL tasks.
Our proposed algorithm can be competitive compared to other LDL algorithms on several benchmarks.
arXiv Detail & Related papers (2022-10-25T09:18:15Z)
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Fast learning from label proportions with small bags [0.0]
In learning from label proportions (LLP), the instances are grouped into bags, and the task is to learn an instance classifier given relative class proportions in training bags.
In this work, we focus on the case of small bags, which allows designing more efficient algorithms by explicitly considering all consistent label combinations.
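The "all consistent label combinations" idea can be illustrated directly: for a small bag, the likelihood of its label proportion is the sum, over every labelling with the right number of positives, of the product of per-instance predicted probabilities. The sketch below computes that quantity; the referenced paper's exact loss and optimization are not reproduced here.

```python
# Bag log-likelihood by enumerating all labelings consistent with the bag's
# label count; exponential in bag size, hence only practical for small bags.
from itertools import combinations
import numpy as np

def bag_log_likelihood(pos_probs, num_positives):
    """log P(bag has exactly `num_positives` positives | per-instance probabilities)."""
    n = len(pos_probs)
    total = 0.0
    for pos_idx in combinations(range(n), num_positives):
        p = 1.0
        for i in range(n):
            p *= pos_probs[i] if i in pos_idx else (1.0 - pos_probs[i])
        total += p
    return np.log(total + 1e-12)

# e.g. a bag of 4 instances with predicted positive probabilities and proportion 1/2:
print(bag_log_likelihood(np.array([0.9, 0.8, 0.2, 0.1]), num_positives=2))
```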
arXiv Detail & Related papers (2021-10-07T13:11:18Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
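The mirror-ascent half of such an alternating scheme can be sketched as an exponentiated-gradient update of adversarial class weights on the probability simplex; the step size and lack of explicit regularization below are illustrative assumptions, not the paper's settings.

```python
# One mirror (exponentiated-gradient) ascent step over class weights on the
# simplex: upweight classes where the model currently does poorly. The full
# algorithm alternates this with gradient descent on the model parameters.
import numpy as np

def mirror_ascent_step(class_weights, per_class_losses, step_size=0.5):
    """Exponentiated-gradient update followed by projection onto the simplex."""
    w = class_weights * np.exp(step_size * per_class_losses)
    return w / w.sum()

weights = np.full(3, 1.0 / 3)              # start from the uniform class distribution
losses = np.array([0.2, 0.9, 0.4])         # current average loss per class
weights = mirror_ascent_step(weights, losses)
```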
arXiv Detail & Related papers (2020-10-23T08:33:04Z)
- Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation [35.593312267921256]
Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum.
We propose Learning with Incremental Labels and Adaptive Compensation (LILAC), a two-phase method that incrementally increases the number of unique output labels.
arXiv Detail & Related papers (2020-01-13T21:00:46Z)