On the accuracy and efficiency of group-wise clipping in differentially
private optimization
- URL: http://arxiv.org/abs/2310.19215v1
- Date: Mon, 30 Oct 2023 01:01:15 GMT
- Authors: Zhiqi Bu, Ruixuan Liu, Yu-Xiang Wang, Sheng Zha, George Karypis
- Abstract summary: We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off.
We demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances have substantially improved the accuracy, memory cost, and
training speed of differentially private (DP) deep learning, especially on
large vision and language models with millions to billions of parameters. In
this work, we thoroughly study the per-sample gradient clipping style, a key
component in DP optimization. We show that different clipping styles have the
same time complexity but instantiate an accuracy-memory trade-off: while the
all-layer clipping (of coarse granularity) is the most prevalent and usually
gives the best accuracy, it incurs heavier memory cost compared to other
group-wise clipping, such as the layer-wise clipping (of finer granularity). We
formalize this trade-off through our convergence theory and complexity
analysis. Importantly, we demonstrate that the accuracy gap between group-wise
clipping and all-layer clipping becomes smaller for larger models, while the
memory advantage of the group-wise clipping remains. Consequently, the
group-wise clipping allows DP optimization of large models to achieve high
accuracy and low peak memory simultaneously.
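As a rough illustration of the two clipping styles contrasted in the abstract, the following Python sketch clips per-sample gradients either over all layers at once or one group at a time. It is not the authors' implementation: the function names, the nested-list gradient layout, the clipping factor min(1, R / norm), and the choice of equal group thresholds R / sqrt(M) are all assumptions made for this example.

```python
import math


def l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def total_norm(layers):
    """L2 norm of one sample's gradient taken over all of its layers."""
    return l2_norm([v for layer in layers for v in layer])


def equal_group_thresholds(R, num_groups):
    """Hypothetical choice: split a budget R evenly so sqrt(sum R_g^2) == R."""
    return [R / math.sqrt(num_groups)] * num_groups


def clip_all_layer(per_sample_grads, R):
    """All-layer clipping: one factor per sample, from the full gradient norm."""
    clipped = []
    for layers in per_sample_grads:
        # The norm over every layer must be known before anything is scaled.
        factor = min(1.0, R / (total_norm(layers) + 1e-12))
        clipped.append([[v * factor for v in layer] for layer in layers])
    return clipped


def clip_group_wise(per_sample_grads, group_thresholds):
    """Group-wise clipping with one layer per group (layer-wise clipping)."""
    clipped = []
    for layers in per_sample_grads:
        out = []
        for layer, R_g in zip(layers, group_thresholds):
            # Each group is scaled using only its own norm.
            factor = min(1.0, R_g / (l2_norm(layer) + 1e-12))
            out.append([v * factor for v in layer])
        clipped.append(out)
    return clipped


if __name__ == "__main__":
    # Two samples, each with two layers (toy values, not from the paper).
    grads = [[[3.0, 4.0], [12.0]], [[0.1, 0.0], [0.2]]]
    thresholds = equal_group_thresholds(1.0, 2)
    print(clip_all_layer(grads, 1.0))
    print(clip_group_wise(grads, thresholds))
```

With M groups each clipped to R / sqrt(M), the full per-sample gradient still has norm at most R, so both styles can be compared at the same overall sensitivity. The layer-wise variant needs only one layer's norm at a time, which is consistent with the lower peak memory the abstract describes, whereas the all-layer variant needs every layer's norm before it can scale anything.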
Related papers
- Nearly Lossless Adaptive Bit Switching [8.485009775430411]
Experimental results on ImageNet-1K classification demonstrate that our methods have sufficient advantages over state-of-the-art one-shot joint QAT in both multi-precision and mixed-precision settings.
arXiv Detail & Related papers (2025-02-03T09:46:26Z)
- XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference [9.65524177141491]
Large Language Model (LLM) inference generates output tokens one-by-one, leading to many redundant computations.
The KV-Cache framework makes a compromise between time and space complexity.
Existing studies reduce memory consumption by evicting cached data that has less impact on inference accuracy.
We show that customizing the cache size for each layer in a personalized manner can yield a significant memory reduction.
arXiv Detail & Related papers (2024-12-08T11:32:08Z)
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
arXiv Detail & Related papers (2022-12-03T05:20:15Z)
- Per-Clip Video Object Segmentation [110.08925274049409]
Recently, memory-based approaches have shown promising results on semi-supervised video object segmentation.
We treat video object segmentation as clip-wise mask-wise propagation.
We propose a new method tailored for the per-clip inference.
arXiv Detail & Related papers (2022-08-03T09:02:29Z)
- On the Convergence and Calibration of Deep Learning with Differential Privacy [12.297499996547925]
Differentially private (DP) training preserves the data privacy usually at the cost of slower convergence.
We show that noise addition only affects the privacy risk but not the convergence or calibration.
In sharp contrast, DP models trained with large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly more ...
arXiv Detail & Related papers (2021-06-15T01:32:29Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification at a negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.