Coupling public and private gradient provably helps optimization
- URL: http://arxiv.org/abs/2310.01304v1
- Date: Mon, 2 Oct 2023 16:08:18 GMT
- Title: Coupling public and private gradient provably helps optimization
- Authors: Ruixuan Liu, Zhiqi Bu, Yu-xiang Wang, Sheng Zha, George Karypis
- Abstract summary: The success of large neural networks is crucially determined by the availability of data.
It has been observed that training only on a small amount of public data, or privately on the abundant private data, can lead to an undesirable degradation of accuracy.
- Score: 38.80873569002277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of large neural networks is crucially determined by the
availability of data. It has been observed that training only on a small amount
of public data, or privately on the abundant private data, can lead to an
undesirable degradation of accuracy. In this work, we leverage both private and
public data to improve optimization by coupling their gradients via a
weighted linear combination. We derive the optimal weight in the convex
setting, showing that the weighting coefficient should be
hyperparameter-dependent. We then prove that the coupling accelerates
convergence for non-convex losses, and characterize how hyperparameters such as
the privacy budget, the number of iterations, the batch size, and the model
size affect the choice of the weighting coefficient. We support our analysis
with empirical experiments across language and vision benchmarks, and provide a
guideline for choosing the optimal weight of the gradient coupling.
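The coupling itself is straightforward to implement. Below is a minimal sketch of one plausible instantiation in Python/NumPy, assuming a DP-SGD-style private gradient (per-sample clipping plus Gaussian noise); the function name and the default values of `w`, `clip_norm`, and `sigma` are illustrative choices, not the paper's exact algorithm or tuned settings.

```python
import numpy as np

def coupled_gradient(private_grads, public_grad, w, clip_norm=1.0, sigma=1.0, rng=None):
    """Weighted linear combination of a DP private gradient and a public gradient.

    private_grads: (B, d) per-sample gradients from the private batch
    public_grad:   (d,)   mean gradient from a public batch (no privacy cost)
    w:             weight on the private gradient, in [0, 1]
    """
    rng = np.random.default_rng() if rng is None else rng
    B, d = private_grads.shape

    # Per-sample clipping bounds each example's contribution to clip_norm.
    norms = np.linalg.norm(private_grads, axis=1, keepdims=True)
    clipped = private_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Gaussian mechanism: noise scale proportional to sigma * clip_norm.
    g_private = (clipped.sum(axis=0) + rng.normal(0.0, sigma * clip_norm, size=d)) / B

    # Couple the two gradient estimates with a weighted linear combination.
    return w * g_private + (1.0 - w) * public_grad
```

Per the analysis above, `w` should not be treated as a universal constant: the right value shifts with the privacy budget, the number of iterations, the batch size, and the model size.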
Related papers
- On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance [1.1218431616419589]
We show that, in a stylized model, optimizing with differentially private gradient descent (DP-GD) suffers when learning low-frequency classes. In particular, DP-AdamBC, which removes the DP bias from the estimate of loss curvature, is a crucial component for avoiding the ill-conditioning caused by heavy-tail class imbalance.
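For context, the bias-correction idea can be sketched as follows: the Gaussian noise added for DP inflates Adam's second-moment estimate by a known per-coordinate variance, which can be subtracted before preconditioning. This is a hedged illustration of the idea only; the `noise_var` formula and the flooring are my assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

def debiased_second_moment(v_hat, sigma, clip_norm, batch_size, floor=1e-8):
    # DP noise on the averaged gradient has per-coordinate variance
    # (sigma * clip_norm / batch_size)^2, which biases Adam's v_t upward.
    noise_var = (sigma * clip_norm / batch_size) ** 2
    # Subtract the known noise variance; floor to keep the estimate positive.
    return np.maximum(v_hat - noise_var, floor)
```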
arXiv Detail & Related papers (2025-07-14T17:57:08Z) - Dataset Distillation as Pushforward Optimal Quantization [1.039189397779466]
We propose a simple extension of the state-of-the-art data distillation method D4M, achieving better performance on the ImageNet-1K dataset with trivial additional computation.
We demonstrate that when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem.
In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors.
arXiv Detail & Related papers (2025-01-13T20:41:52Z) - AutoScale: Automatic Prediction of Compute-optimal Data Composition for Training LLMs [61.13296177652599]
This paper demonstrates that the optimal composition of training data from different domains is scale-dependent.
We introduce *AutoScale*, a novel, practical approach for optimizing data compositions at potentially large training data scales.
Our evaluation on GPT-2 Large and BERT pre-training demonstrates *AutoScale*'s effectiveness in improving training convergence and downstream performance.
arXiv Detail & Related papers (2024-07-29T17:06:30Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Quantization Avoids Saddle Points in Distributed Optimization [1.579622195923387]
Distributed nonconvex optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that quantization can enable effective escape from saddle points and thereby guarantee convergence to a second-order stationary point.
With an easily adjustable quantization granularity, the approach allows a user to aggressively reduce communication overhead.
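As an illustration of user-adjustable quantization, here is a standard unbiased stochastic quantizer with a tunable grid spacing; coarser grids mean fewer bits on the wire, and the injected rounding noise is also the kind of perturbation that helps iterates move away from saddle points. This is a generic sketch, not the paper's specific scheme.

```python
import numpy as np

def stochastic_quantize(x, step, rng=None):
    """Round x to a grid of spacing `step`, unbiased in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.floor(x / step) * step
    p_up = (x - lower) / step            # probability of rounding up
    return lower + step * (rng.uniform(size=x.shape) < p_up)
```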
arXiv Detail & Related papers (2024-03-15T15:58:20Z) - Online Sensitivity Optimization in Differentially Private Learning [8.12606646175019]
We present a novel approach to dynamically optimize the clipping threshold.
We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function.
Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels.
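One way to make the threshold learnable is to exploit the fact that the clipping map is piecewise differentiable in the threshold itself. The sketch below computes the clipped gradient sum together with its derivative with respect to the threshold `C`; how that derivative is folded into the cost function is my assumption, not necessarily the paper's exact estimator.

```python
import numpy as np

def clip_with_threshold_grad(per_sample_grads, C, eps=1e-12):
    """Clip per-sample gradients to norm C; also return d(clipped sum)/dC.

    A sample with norm > C is clipped to C * g / ||g||, whose derivative
    w.r.t. C is g / ||g||; samples below the threshold are untouched
    and contribute zero derivative.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    over = norms > C
    scale = np.where(over, C / np.maximum(norms, eps), 1.0)
    clipped_sum = (per_sample_grads * scale).sum(axis=0)
    d_sum_dC = (per_sample_grads / np.maximum(norms, eps) * over).sum(axis=0)
    return clipped_sum, d_sum_dC
```

Chaining `d_sum_dC` with the loss gradient yields an update direction for `C`, letting the threshold track the evolving gradient distribution during training.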
arXiv Detail & Related papers (2023-10-02T00:30:49Z) - Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z) - Efficient Private SCO for Heavy-Tailed Data via Averaged Clipping [40.69950711262191]
We consider differentially private (DP) convex optimization for heavy-tailed data.
We establish new convergence results and improved complexity bounds for the proposed algorithm called AClipped-dpSGD for constrained and unconstrained convex problems.
arXiv Detail & Related papers (2022-06-27T01:39:15Z) - Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
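For intuition, here is a generic iterative Hessian-sketch step for least squares, in the spirit of this line of work though not necessarily this paper's exact method; refreshing the sketch each iteration and the small ridge term are my choices.

```python
import numpy as np

def sketched_newton_ls(A, b, m, iters=10, rng=None):
    """Solve min_x ||Ax - b||^2 using a sketched Hessian.

    Rather than forming the exact Hessian A^T A from all n rows, compress
    A with an m x n Gaussian sketch S (m << n, but m > d) and use
    (SA)^T (SA) as the curvature matrix; the gradient stays exact.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))  # fresh sketch each step
        SA = S @ A
        H = SA.T @ SA + 1e-8 * np.eye(d)   # sketched d x d Hessian (ridge for safety)
        g = A.T @ (A @ x - b)              # exact gradient
        x -= np.linalg.solve(H, g)
    return x
```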
arXiv Detail & Related papers (2022-03-18T05:49:13Z) - Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z) - Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
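The core difficulty in making dropout rates trainable is passing gradients through the discrete Bernoulli mask; the paper uses its own estimator for this. As a generic illustration only, here is a concrete (Gumbel-style) relaxation in which the keep probability is parameterized by a trainable logit.

```python
import numpy as np

def relaxed_dropout_mask(keep_logit, shape, temperature=0.1, rng=None):
    """Differentiable surrogate for a Bernoulli(sigmoid(keep_logit)) mask."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-8, 1.0 - 1e-8, size=shape)
    logistic_noise = np.log(u) - np.log1p(-u)   # Logistic(0, 1) sample
    # As temperature -> 0 this approaches a hard Bernoulli sample.
    return 1.0 / (1.0 + np.exp(-(keep_logit + logistic_noise) / temperature))
```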
arXiv Detail & Related papers (2020-02-12T18:57:14Z)