Probably Approximately Precision and Recall Learning
- URL: http://arxiv.org/abs/2411.13029v2
- Date: Thu, 23 Oct 2025 21:57:21 GMT
- Title: Probably Approximately Precision and Recall Learning
- Authors: Lee Cohen, Yishay Mansour, Shay Moran, Han Shao,
- Abstract summary: A key challenge in machine learning is the prevalence of one-sided feedback.<n>We introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels.<n>We develop new algorithms that learn from positive data alone, achieving optimal sample complexity in the realizable case.
- Score: 60.00180898830079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precision and Recall are fundamental metrics in machine learning tasks where both accurate predictions and comprehensive coverage are essential, such as in multi-label learning, language generation, medical studies, and recommender systems. A key challenge in these settings is the prevalence of one-sided feedback, where only positive examples are observed during training--e.g., in multi-label tasks like tagging people in Facebook photos, we may observe only a few tagged individuals, without knowing who else appears in the image. To address learning under such partial feedback, we introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels, extending beyond single-label predictions and generalizing classical binary, multi-class, and multi-label models. Our results reveal sharp statistical and algorithmic separations from standard settings: classical methods such as Empirical Risk Minimization provably fail, even for simple hypothesis classes. We develop new algorithms that learn from positive data alone, achieving optimal sample complexity in the realizable case, and establishing multiplicative--rather than additive-approximation guarantees in the agnostic case, where achieving additive regret is impossible.
Related papers
- Fairness Without Harm: An Influence-Guided Active Sampling Approach [32.173195437797766]
We aim to train models that mitigate group fairness disparity without causing harm to model accuracy.
The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes.
We propose a tractable active data sampling algorithm that does not rely on training group annotations.
arXiv Detail & Related papers (2024-02-20T07:57:38Z) - Joint empirical risk minimization for instance-dependent
positive-unlabeled data [4.112909937203119]
Learning from positive and unlabeled data (PU learning) is actively researched machine learning task.
The goal is to train a binary classification model based on a dataset containing part on positives which are labeled, and unlabeled instances.
Unlabeled set includes remaining part positives and all negative observations.
arXiv Detail & Related papers (2023-12-27T12:45:12Z) - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z) - One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - Can Class-Priors Help Single-Positive Multi-Label Learning? [40.312419865957224]
Single-positive multi-label learning (SPMLL) is a typical weakly supervised multi-label learning problem.
Class-priors estimator is introduced, which could estimate the class-priors that are theoretically guaranteed to converge to the ground-truth class-priors.
Based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer could be guaranteed to approximately converge to the optimal risk minimizer on fully supervised data.
arXiv Detail & Related papers (2023-09-25T05:45:57Z) - Learnability, Sample Complexity, and Hypothesis Class Complexity for
Regression Models [10.66048003460524]
This work is inspired by the foundation of PAC and is motivated by the existing regression learning issues.
The proposed approach, denoted by epsilon-Confidence Approximately Correct (epsilon CoAC), utilizes Kullback Leibler divergence (relative entropy)
It enables the learner to compare hypothesis classes of different complexity orders and choose among them the optimum with the minimum epsilon.
arXiv Detail & Related papers (2023-03-28T15:59:12Z) - A Cross-Conformal Predictor for Multi-label Classification [0.0]
In multi-label learning each instance is associated with multiple classes simultaneously.
This work examines the application of a recently developed framework called Conformal Prediction to the multi-label learning setting.
arXiv Detail & Related papers (2022-11-29T14:21:49Z) - An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning [58.59343434538218]
We propose a simple but quite effective approach to predict accurate negative pseudo-labels of unlabeled data from an indirect learning perspective.
Our approach can be implemented in just few lines of code by only using off-the-shelf operations.
arXiv Detail & Related papers (2022-09-28T02:11:34Z) - Improved Robust Algorithms for Learning with Discriminative Feature
Feedback [21.58493386054356]
Discriminative Feature Feedback is a protocol for interactive learning based on feature explanations that are provided by a human teacher.
We provide new robust interactive learning algorithms for the Discriminative Feature Feedback model.
arXiv Detail & Related papers (2022-09-08T12:11:12Z) - One Positive Label is Sufficient: Single-Positive Multi-Label Learning
with Label Enhancement [71.9401831465908]
We investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label.
A novel method named proposed, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed.
Experiments on benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-06-01T14:26:30Z) - Approximability and Generalisation [0.0]
We study the role of approximability in learning, both in the full precision and the approximated settings of the predictor.
We show that under mild conditions, approximable target concepts are learnable from a smaller labelled sample.
We give algorithms that guarantee a good predictor whose approximation also enjoys the same generalisation guarantees.
arXiv Detail & Related papers (2022-03-15T15:21:48Z) - Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z) - Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z) - Agree to Disagree: Diversity through Disagreement for Better
Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - Learning with Proper Partial Labels [87.65718705642819]
Partial-label learning is a kind of weakly-supervised learning with inexact labels.
We show that this proper partial-label learning framework includes many previous partial-label learning settings.
We then derive a unified unbiased estimator of the classification risk.
arXiv Detail & Related papers (2021-12-23T01:37:03Z) - A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidences show that the proposed algorithm clearly surpasses the state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z) - An Effective Baseline for Robustness to Distributional Shift [5.627346969563955]
Refraining from confidently predicting when faced with categories of inputs different from those seen during training is an important requirement for the safe deployment of deep learning systems.
We present a simple, but highly effective approach to deal with out-of-distribution detection that uses the principle of abstention.
arXiv Detail & Related papers (2021-05-15T00:46:11Z) - Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework of classifier with flexibility on the model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.