Learning from Aggregate responses: Instance Level versus Bag Level Loss
Functions
- URL: http://arxiv.org/abs/2401.11081v1
- Date: Sat, 20 Jan 2024 02:14:11 GMT
- Title: Learning from Aggregate responses: Instance Level versus Bag Level Loss
Functions
- Authors: Adel Javanmard, Lin Chen, Vahab Mirrokni, Ashwinkumar Badanidiyuru,
Gang Fu
- Abstract summary: In many practical applications the training data is aggregated before being shared with the learner, in order to protect privacy of users' sensitive responses.
We study two natural loss functions for learning from aggregate responses: bag-level loss and the instance-level loss.
We propose a mechanism for differentially private learning from aggregate responses and derive the optimal bag size in terms of prediction risk-privacy trade-off.
- Score: 23.32422115080128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the rise of privacy concerns, in many practical applications the
training data is aggregated before being shared with the learner, in order to
protect privacy of users' sensitive responses. In an aggregate learning
framework, the dataset is grouped into bags of samples, where each bag is
available only with an aggregate response, providing a summary of individuals'
responses in that bag. In this paper, we study two natural loss functions for
learning from aggregate responses: bag-level loss and the instance-level loss.
In the former, the model is learnt by minimizing a loss between aggregate
responses and aggregate model predictions, while in the latter the model aims
to fit individual predictions to the aggregate responses. In this work, we show
that the instance-level loss can be perceived as a regularized form of the
bag-level loss. This observation lets us compare the two approaches with
respect to bias and variance of the resulting estimators, and introduce a novel
interpolating estimator which combines the two approaches. For linear
regression tasks, we provide a precise characterization of the risk of the
interpolating estimator in an asymptotic regime where the size of the training
set grows in proportion to the features dimension. Our analysis allows us to
theoretically understand the effect of different factors, such as bag size on
the model prediction risk. In addition, we propose a mechanism for
differentially private learning from aggregate responses and derive the optimal
bag size in terms of prediction risk-privacy trade-off. We also carry out
thorough experiments to corroborate our theory and show the efficacy of the
interpolating estimator.
Related papers
- Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - Linked shrinkage to improve estimation of interaction effects in
regression models [0.0]
We develop an estimator that adapts well to two-way interaction terms in a regression model.
We evaluate the potential of the model for inference, which is notoriously hard for selection strategies.
Our models can be very competitive to a more advanced machine learner, like random forest, even for fairly large sample sizes.
arXiv Detail & Related papers (2023-09-25T10:03:39Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - When No-Rejection Learning is Consistent for Regression with Rejection [11.244583592648443]
We study a no-reject learning strategy that uses all the data to learn the prediction.
This paper investigates a no-reject learning strategy that uses all the data to learn the prediction.
arXiv Detail & Related papers (2023-07-06T11:43:22Z) - Learning from Aggregated Data: Curated Bags versus Random Bags [35.394402088653415]
We explore the possibility of training machine learning models with aggregated data labels, rather than individual labels.
For the curated bag setting, we show that we can perform gradient-based learning without any degradation in performance.
In the random bag setting, there is a trade-off between size of the bag and the achievable error rate as our bound indicates.
arXiv Detail & Related papers (2023-05-16T15:53:45Z) - CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Strategic Instrumental Variable Regression: Recovering Causal
Relationships From Strategic Responses [16.874125120501944]
We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict.
Our work establishes a novel connection between strategic responses to machine learning models and instrumental variable (IV) regression.
arXiv Detail & Related papers (2021-07-12T22:12:56Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z) - Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks.
Despite being successful in practice, several problems in understanding performance of adversarial training remain open.
We derive precise theoretical predictions for the minimization of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z) - Sampling Attacks: Amplification of Membership Inference Attacks by
Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.