Aggregating Data for Optimal and Private Learning
- URL: http://arxiv.org/abs/2411.19045v1
- Date: Thu, 28 Nov 2024 10:44:00 GMT
- Title: Aggregating Data for Optimal and Private Learning
- Authors: Sushant Agarwal, Yukti Makhija, Rishi Saket, Aravindan Raghuveer,
- Abstract summary: Multiple Instance Regression (MIR) and Learning from Label Proportions (LLP) are learning frameworks.
We study for various loss functions in MIR and LLP, what is the optimal way to partition the dataset into bags.
- Score: 13.283323029489507
- License:
- Abstract: Multiple Instance Regression (MIR) and Learning from Label Proportions (LLP) are learning frameworks arising in many applications, where the training data is partitioned into disjoint sets or bags, and only an aggregate label i.e., bag-label for each bag is available to the learner. In the case of MIR, the bag-label is the label of an undisclosed instance from the bag, while in LLP, the bag-label is the mean of the bag's labels. In this paper, we study for various loss functions in MIR and LLP, what is the optimal way to partition the dataset into bags such that the utility for downstream tasks like linear regression is maximized. We theoretically provide utility guarantees, and show that in each case, the optimal bagging strategy (approximately) reduces to finding an optimal clustering of the feature vectors or the labels with respect to natural objectives such as $k$-means. We also show that our bagging mechanisms can be made label-differentially private, incurring an additional utility error. We then generalize our results to the setting of Generalized Linear Models (GLMs). Finally, we experimentally validate our theoretical results.
Related papers
- Learning from Label Proportions and Covariate-shifted Instances [12.066922664696445]
In learning from label proportions (LLP) the aggregate label is the average of the instance-labels in a bag.
We develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels.
arXiv Detail & Related papers (2024-11-19T08:36:34Z) - Weak to Strong Learning from Aggregate Labels [9.804335415337071]
We study the problem of using a weak learner on such training bags with aggregate labels to obtain a strong learner.
A weak learner has at a constant accuracy 1 on the training bags, while a strong learner's accuracy can be arbitrarily close to 1.
Our work is the first to theoretically study weak to strong learning from aggregate labels, with an algorithm to achieve the same for LLP.
arXiv Detail & Related papers (2024-11-09T14:56:09Z) - Disambiguated Attention Embedding for Multi-Instance Partial-Label
Learning [68.56193228008466]
In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set.
Existing MIPL approach follows the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels.
We propose an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning.
arXiv Detail & Related papers (2023-05-26T13:25:17Z) - Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact
Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels.
In this paper, we formalize such problems as multi-instance partial-label learning (MIPL)
Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
arXiv Detail & Related papers (2022-12-18T03:28:51Z) - Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called emphTrustable Co-label Learning (TCL)
arXiv Detail & Related papers (2022-03-08T16:57:00Z) - How to Leverage Unlabeled Data in Offline Reinforcement Learning [125.72601809192365]
offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing.
arXiv Detail & Related papers (2022-02-03T18:04:54Z) - Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent and assume that each example is associated with a latent label distribution constituted by the real number of each label.
arXiv Detail & Related papers (2021-10-25T12:50:26Z) - Fast learning from label proportions with small bags [0.0]
In learning from label proportions (LLP), the instances are grouped into bags, and the task is to learn an instance classifier given relative class proportions in training bags.
In this work, we focus on the case of small bags, which allows designing more efficient algorithms by explicitly considering all consistent label combinations.
arXiv Detail & Related papers (2021-10-07T13:11:18Z) - Active Learning in Incomplete Label Multiple Instance Multiple Label
Learning [17.5720245903743]
We propose a novel bag-class pair based approach for active learning in the MIML setting.
Our approach is based on a discriminative graphical model with efficient and exact inference.
arXiv Detail & Related papers (2021-07-22T17:01:28Z) - How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.