Optimal Learning from Label Proportions with General Loss Functions
- URL: http://arxiv.org/abs/2509.15145v1
- Date: Thu, 18 Sep 2025 16:53:32 GMT
- Title: Optimal Learning from Label Proportions with General Loss Functions
- Authors: Lorne Applebaum, Travis Dick, Claudio Gentile, Haim Kaplan, Tomer Koren,
- Abstract summary: We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions. We empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets.
- Score: 33.827617632719864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by problems in online advertising, we address the task of Learning from Label Proportions (LLP). In this partially-supervised setting, training data consists of groups of examples, termed bags, for which we only observe the average label value. The main goal, however, remains the design of a predictor for the labels of individual examples. We introduce a novel and versatile low-variance de-biasing methodology to learn from aggregate label information, significantly advancing the state of the art in LLP. Our approach exhibits remarkable flexibility, seamlessly accommodating a broad spectrum of practically relevant loss functions across both binary and multi-class classification settings. By carefully combining our estimators with standard techniques, we substantially improve sample complexity guarantees for a large class of losses of practical relevance. We also empirically validate the efficacy of our proposed approach across a diverse array of benchmark datasets, demonstrating compelling empirical advantages over standard baselines.
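To make the LLP setting concrete, here is a minimal sketch (illustrative only; the synthetic data and variable names are ours, not the paper's) of what a learner observes: features for every example, but labels only as per-bag averages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy population: 2-D features with binary labels.
n, bag_size = 1000, 10
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Partition examples into bags. The learner sees every feature vector,
# but labels are revealed only as each bag's average value, i.e. its
# label proportion; the individual y_i are never observed.
bags = X.reshape(n // bag_size, bag_size, 2)
label_proportions = y.reshape(n // bag_size, bag_size).mean(axis=1)

print(bags.shape)              # (100, 10, 2): features fully observed
print(label_proportions[:5])   # only per-bag average labels
```

The goal remains a predictor for individual labels, which is why debiased estimates of the per-example loss, such as those developed in this paper and in EasyLLP below, are needed.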
Related papers
- Rethinking Consistent Multi-Label Classification under Inexact Supervision [60.79309683889278]
In partial multi-label learning, each instance is annotated with a candidate label set, among which only some labels are relevant. In complementary multi-label learning, each instance is annotated with complementary labels indicating the classes to which the instance does not belong.
arXiv Detail & Related papers (2025-10-05T08:30:32Z) - Multi-Label Contrastive Learning: A Comprehensive Study [48.81069245141415]
Multi-label classification has emerged as a key area in both research and industry. Applying contrastive learning to multi-label classification presents unique challenges. We conduct an in-depth study of contrastive learning loss for multi-label classification across diverse settings.
arXiv Detail & Related papers (2024-11-27T20:20:06Z) - Class-aware and Augmentation-free Contrastive Learning from Label Proportion [19.41511190742059]
Learning from Label Proportion (LLP) is a weakly supervised learning scenario in which training data is organized into predefined bags of instances.
We propose an augmentation-free contrastive framework TabLLP-BDC that introduces class-aware supervision at the instance level.
Our solution features a two-stage Bag Difference Contrastive (BDC) learning mechanism that establishes robust class-aware instance-level supervision.
arXiv Detail & Related papers (2024-08-13T09:04:47Z) - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Easy Learning from Label Proportions [17.71834385754893]
EasyLLP is a flexible and simple-to-implement debiasing approach based on aggregate labels.
Our technique allows us to accurately estimate the expected loss of an arbitrary model at the level of individual examples (a sketch of this style of estimator appears after this list).
arXiv Detail & Related papers (2023-02-06T20:41:38Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Controller-Guided Partial Label Consistency Regularization with Unlabeled Data [49.24911720809604]
We propose a controller-guided consistency regularization at both the label-level and representation-level.
We dynamically adjust the confidence thresholds so that the number of samples from each class participating in consistency regularization remains roughly equal, alleviating the class-imbalance problem.
arXiv Detail & Related papers (2022-10-20T12:15:13Z) - A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation [42.0958430465578]
We study the partial multi-label (PML) image classification problem.
Existing PML methods typically design a disambiguation strategy to filter out noisy labels.
We propose a deep model for PML that enhances representation and discrimination ability.
arXiv Detail & Related papers (2022-07-06T02:49:02Z) - Learning from Label Proportions by Learning with Label Noise [30.7933303912474]
Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags.
We provide a theoretically grounded approach to LLP based on a reduction to learning with label noise.
Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures.
arXiv Detail & Related papers (2022-03-04T18:52:21Z) - Structured Prediction with Partial Labelling through the Infimum Loss [85.4940853372503]
The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect.
Partial labelling is a type of incomplete annotation where, for each datapoint, supervision is given as a set of labels containing the true one.
This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling.
arXiv Detail & Related papers (2020-03-02T13:59:41Z)
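As a concrete instance of debiasing from aggregate labels (see the EasyLLP entry above), here is a sketch of an EasyLLP-style unbiased per-example loss estimate for binary classification. It assumes bag members are drawn i.i.d. and that the label prior p = E[y] is known or well estimated; all names are ours, and the sketch follows the earlier EasyLLP form rather than the lower-variance estimators of the headline paper.

```python
import numpy as np

def easyllp_per_example_loss(l1, l0, alpha_bar, k, p):
    """EasyLLP-style unbiased per-example loss estimate (binary labels).

    l1, l0: losses of each bag example if its label were 1 or 0, respectively.
    alpha_bar: the bag's observed label proportion.
    Assumes i.i.d. bag members and a known label prior p = E[y].
    """
    return k * (alpha_bar - p) * (l1 - l0) + p * l1 + (1 - p) * l0

# Monte-Carlo sanity check on synthetic data (illustrative only).
rng = np.random.default_rng(0)
k, p, n_bags = 10, 0.3, 50_000
y = rng.random((n_bags, k)) < p                    # hidden labels, never shown to the learner
s = np.clip(0.35 + 0.3 * y + 0.2 * rng.random((n_bags, k)), 0.0, 1.0)  # model scores
l1, l0 = (1 - s) ** 2, s ** 2                      # squared loss against label 1 / 0
alpha_bar = y.mean(axis=1, keepdims=True)          # the only label signal observed

est = easyllp_per_example_loss(l1, l0, alpha_bar, k, p)
true = np.where(y, l1, l0)
print(est.mean(), true.mean())  # the two means should agree closely
```

The per-example estimates themselves are noisy (the correction term scales with the bag size k), which is precisely the variance issue that the low-variance methodology of the headline paper targets.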