Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data
- URL: http://arxiv.org/abs/2007.03724v1
- Date: Tue, 7 Jul 2020 18:25:25 GMT
- Title: Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data
- Authors: Alireza Sadeghi, Gang Wang, Meng Ma, Georgios B. Giannakis
- Abstract summary: The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
- Score: 66.78671826743884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data used to train machine learning models can be adversarial--maliciously
constructed by adversaries to fool the model. Challenges also arise from privacy,
confidentiality, or legal constraints when data are geographically gathered and
stored across multiple learners, some of which may even hold an "anonymized" or
unreliable dataset. In this context, the distributionally
robust optimization framework is considered for training a parametric model,
both in centralized and federated learning settings. The objective is to endow
the trained model with robustness against adversarially manipulated input data
or distributional uncertainties, such as mismatches between training and
testing data distributions, or among datasets stored at different workers. To
this end, the data distribution is assumed to be unknown, yet to lie within a
Wasserstein ball centered at the empirical data distribution. This robust
learning task entails an infinite-dimensional optimization problem, which is
challenging. Leveraging a strong duality result, a surrogate is obtained, for
which three stochastic primal-dual algorithms are developed: i) stochastic
proximal gradient descent with an $\epsilon$-accurate oracle, which invokes an
oracle to solve the convex sub-problems; ii) stochastic proximal gradient
descent-ascent, which approximates the solution of the convex sub-problems via
a single gradient ascent step; and, iii) a distributionally robust federated
learning algorithm, which solves the sub-problems locally at different workers
where data are stored. Compared to empirical risk minimization and federated
learning methods, the proposed algorithms offer robustness with little
computational overhead. Numerical tests using image datasets showcase the
merits of the proposed algorithms under several existing adversarial attacks
and distributional uncertainties.
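
To make the descent-ascent idea concrete, the following is a minimal NumPy sketch of training against the dual surrogate with a single inner gradient ascent step per sample, in the spirit of algorithm (ii). The logistic-regression loss, the squared-Euclidean transport cost, the function names, and all step sizes are illustrative assumptions rather than the authors' implementation; the proximal step, the $\epsilon$-accurate oracle, and the federated variant are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss_grads(theta, x, y):
    """Logistic loss l(theta; x, y) = log(1 + exp(-y * theta @ x)), y in {-1, +1},
    with gradients w.r.t. the model parameters theta and the input x."""
    margin = y * (theta @ x)
    s = sigmoid(-margin)
    grad_theta = -y * s * x
    grad_x = -y * s * theta
    return np.log1p(np.exp(-margin)), grad_theta, grad_x

def wdro_descent_ascent(X, Y, rho=0.1, lr_theta=0.1, lr_lambda=0.01,
                        lr_ascent=0.5, epochs=20, seed=0):
    """Sketch of stochastic gradient descent-ascent on the dual surrogate
        min_{theta, lam >= 0}  lam * rho
            + E_{(x, y)} max_{x'} [ l(theta; x', y) - lam * ||x' - x||^2 ],
    where rho is the Wasserstein-ball radius.  The inner maximization over the
    perturbed input x' is approximated by a single gradient ascent step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta, lam = np.zeros(d), 1.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, y = X[i], Y[i]
            # Single ascent step on x' starting from x (the transport-cost
            # gradient vanishes at x' = x).
            _, _, gx = logistic_loss_grads(theta, x, y)
            x_adv = x + lr_ascent * gx
            # Descent step on theta, evaluated at the worst-case surrogate x_adv.
            _, gtheta, _ = logistic_loss_grads(theta, x_adv, y)
            theta -= lr_theta * gtheta
            # Gradient step on the dual variable lam, projected onto lam >= 0.
            transport = float(np.sum((x_adv - x) ** 2))
            lam = max(0.0, lam - lr_lambda * (rho - transport))
    return theta, lam

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    Y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
    theta, lam = wdro_descent_ascent(X, Y)
    print("theta:", np.round(theta, 3), " lambda:", round(lam, 3))
```

Running more inner ascent steps trades extra computation for a tighter approximation of the inner maximization, which is roughly the role played by the $\epsilon$-accurate oracle in algorithm (i).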
Related papers
- Robust training of implicit generative models for multivariate and heavy-tailed distributions with an invariant statistical loss [0.4249842620609682]
We build on the invariant statistical loss (ISL) method introduced in [de2024training].
We extend it to handle heavy-tailed and multivariate data distributions.
We assess its performance in generative modeling and explore its potential as a pretraining technique for generative adversarial networks (GANs).
arXiv Detail & Related papers (2024-10-29T10:27:50Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a minimal sketch is given after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Distributionally Robust Learning with Stable Adversarial Training [34.74504615726101]
Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts.
We propose a novel Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set.
arXiv Detail & Related papers (2021-06-30T03:05:45Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Stable Adversarial Learning under Distributional Shifts [46.98655899839784]
Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts.
We propose Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set.
arXiv Detail & Related papers (2020-06-08T08:42:34Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
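
As a companion to the ATC entry above, here is a minimal, hypothetical sketch of the thresholded-confidence idea. The function name atc_estimate, the use of the maximum softmax probability as the confidence score, and the quantile-based threshold selection are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def atc_estimate(source_probs, source_labels, target_probs):
    """Sketch of the Average Thresholded Confidence (ATC) idea.
    1) On labeled source (validation) data, pick a threshold t so that the
       fraction of examples with confidence above t matches source accuracy.
    2) Predict target accuracy as the fraction of unlabeled target examples
       whose confidence exceeds t."""
    source_conf = source_probs.max(axis=1)          # per-example confidence
    source_acc = (source_probs.argmax(axis=1) == source_labels).mean()
    t = np.quantile(source_conf, 1.0 - source_acc)  # fraction above t ~ source_acc
    return (target_probs.max(axis=1) > t).mean()    # estimated target accuracy
```

Here source_probs and target_probs are assumed to be softmax outputs of an already-trained classifier on held-out labeled source data and unlabeled target data, respectively.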