The Connection between Out-of-Distribution Generalization and Privacy of
ML Models
- URL: http://arxiv.org/abs/2110.03369v1
- Date: Thu, 7 Oct 2021 12:05:25 GMT
- Title: The Connection between Out-of-Distribution Generalization and Privacy of
ML Models
- Authors: Divyat Mahajan, Shruti Tople, Amit Sharma
- Abstract summary: We show that a lower OOD generalization gap does not imply better robustness to MI attacks.
A model that captures stable features is more robust to MI attacks than models that exhibit better OOD generalization but do not learn stable features.
For the same provable differential privacy guarantees, a model that learns stable features provides higher utility as compared to others.
- Score: 11.580603875423408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the goal of generalizing to out-of-distribution (OOD) data, recent
domain generalization methods aim to learn "stable" feature representations
whose effect on the output remains invariant across domains. Given the
theoretical connection between generalization and privacy, we ask whether
better OOD generalization leads to better privacy for machine learning models,
where privacy is measured through robustness to membership inference (MI)
attacks. In general, we find that the relationship does not hold. Through
extensive evaluation on a synthetic dataset and image datasets like MNIST,
Fashion-MNIST, and Chest X-rays, we show that a lower OOD generalization gap
does not imply better robustness to MI attacks. Instead, privacy benefits are
based on the extent to which a model captures the stable features. A model that
captures stable features is more robust to MI attacks than models that exhibit
better OOD generalization but do not learn stable features. Further, for the
same provable differential privacy guarantees, a model that learns stable
features provides higher utility as compared to others. Our results offer the
first extensive empirical study connecting stable features and privacy, and
also have a takeaway for the domain generalization community: MI attacks can be
used as a complementary metric to measure model quality.
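To make the two quantities compared in the abstract concrete, below is a minimal sketch of the generic loss-threshold membership inference attack and of the OOD generalization gap. It assumes a trained PyTorch classifier `model` and separate data loaders for training members, held-out in-distribution non-members, and OOD data, with the loss threshold tuned on a validation split; it illustrates the standard metrics, not the authors' exact evaluation code.

```python
# Minimal sketch, assuming: a trained PyTorch classifier `model`, plus
# DataLoaders `member_loader` (training data), `nonmember_loader` (held-out
# in-distribution data), and `ood_loader` (shifted-domain data). The loss
# threshold is a free parameter. Not the paper's exact evaluation code.
import torch
import torch.nn.functional as F


@torch.no_grad()
def per_example_losses(model, loader, device="cpu"):
    """Per-example cross-entropy; training members tend to have lower loss."""
    model.eval()
    losses = []
    for x, y in loader:
        logits = model(x.to(device))
        losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.cat(losses)


def mi_attack_accuracy(model, member_loader, nonmember_loader, threshold):
    """Guess 'member' when loss < threshold; balanced accuracy near 0.5
    means the model is robust to this membership inference attack."""
    member = per_example_losses(model, member_loader)
    nonmember = per_example_losses(model, nonmember_loader)
    tpr = (member < threshold).float().mean().item()
    tnr = (nonmember >= threshold).float().mean().item()
    return 0.5 * (tpr + tnr)


@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Plain top-1 classification accuracy over a data loader."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total


def ood_generalization_gap(model, id_loader, ood_loader):
    """In-distribution accuracy minus OOD accuracy."""
    return accuracy(model, id_loader) - accuracy(model, ood_loader)
```

An attack accuracy close to 0.5 means the attacker cannot distinguish members from non-members; the abstract's claim is that this robustness tracks whether the model captures stable features rather than the size of its OOD generalization gap.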
Related papers
- On Adversarial Robustness and Out-of-Distribution Robustness of Large Language Models [0.16874375111244325]
We investigate the correlation between adversarial robustness and OOD robustness in large language models (LLMs).
Our findings highlight nuanced interactions between adversarial robustness and OOD robustness, with results indicating limited transferability.
Further research is needed to evaluate these interactions across larger models and varied architectures.
arXiv Detail & Related papers (2024-12-13T20:04:25Z)
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose MITA, a Meet-In-The-Middle approach that introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness [24.843692458375436]
This study investigates how models aligned with general-purpose preference data perform across five trustworthiness verticals.
Our results demonstrate that RLHF on human preferences doesn't automatically guarantee trustworthiness, and reverse effects are often observed.
We propose to adapt efficient influence function based data attribution methods to the RLHF setting to better understand the influence of fine-tuning data on individual trustworthiness benchmarks.
arXiv Detail & Related papers (2024-04-29T17:00:53Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Towards Robust Out-of-Distribution Generalization Bounds via Sharpness [41.65692353665847]
We study how sharpness affects a model's tolerance to data changes under domain shift.
We propose a sharpness-based OOD generalization bound by taking robustness into consideration.
arXiv Detail & Related papers (2024-03-11T02:57:27Z)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Rethinking Machine Learning Robustness via its Link with the Out-of-Distribution Problem [16.154434566725012]
We investigate the causes behind machine learning models' susceptibility to adversarial examples.
We propose an OOD generalization method that stands against both adversary-induced and natural distribution shifts.
Our approach consistently improves robustness to OOD adversarial inputs and outperforms state-of-the-art defenses.
arXiv Detail & Related papers (2022-02-18T00:17:23Z)
- BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark [21.040754460129854]
We release BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings.
We evaluate several learning algorithms under BEDS-Bench and find that all of them show poor generalization performance under distributional shift in general.
arXiv Detail & Related papers (2021-07-17T05:53:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.