Out-of-Distribution Robustness in Deep Learning Compression
- URL: http://arxiv.org/abs/2110.07007v1
- Date: Wed, 13 Oct 2021 19:54:07 GMT
- Title: Out-of-Distribution Robustness in Deep Learning Compression
- Authors: Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
- Abstract summary: Deep neural network (DNN) compression systems have proved to be highly effective for designing source codes for many natural sources.
These systems suffer from vulnerabilities to distribution shifts as well as out-of-distribution (OOD) data, which limits their real-world applicability.
We propose algorithmic and architectural frameworks built on two principled methods: one that trains DNN compressors using distributionally-robust optimization (DRO) and the other which uses a structured latent code.
- Score: 28.049124970993056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, deep neural network (DNN) compression systems have proved to
be highly effective for designing source codes for many natural sources.
However, like many other machine learning systems, these compressors suffer
from vulnerabilities to distribution shifts as well as out-of-distribution
(OOD) data, which limits their real-world applicability. In this paper, we
initiate the study of OOD robust compression. Considering robustness to two
types of ambiguity sets (Wasserstein balls and group shifts), we propose
algorithmic and architectural frameworks built on two principled methods: one
that trains DNN compressors using distributionally-robust optimization (DRO),
and the other which uses a structured latent code. Our results demonstrate that
both methods enforce robustness compared to a standard DNN compressor, and that
using a structured code can be superior to the DRO compressor. We observe
tradeoffs between robustness and distortion and corroborate these findings
theoretically for a specific class of sources.
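To make the group-shift setting concrete, the following is a minimal sketch of distributionally-robust (group-DRO style) training for a neural compressor. It is not the authors' code: the `model` interface (returning a reconstruction and a rate estimate), the rate weight `lam`, and the exponentiated-gradient step size `eta` are illustrative assumptions.

```python
# Minimal sketch of group-DRO training for a DNN compressor (illustrative only;
# all names and the model interface are assumptions, not the paper's released code).
import torch

def rd_loss(model, x, lam=0.01):
    # Rate-distortion objective: MSE distortion plus lam times an estimated rate.
    x_hat, rate = model(x)              # assumed interface: returns (reconstruction, rate)
    distortion = torch.mean((x - x_hat) ** 2)
    return distortion + lam * rate.mean()

def group_dro_step(model, optimizer, group_batches, group_weights, eta=0.1):
    # One step of robust training over a group-shift ambiguity set: up-weight the
    # groups with the worst rate-distortion loss (exponentiated-gradient update on
    # the group weights), then descend on the weighted objective.
    losses = torch.stack([rd_loss(model, x) for x in group_batches])
    with torch.no_grad():
        group_weights *= torch.exp(eta * losses)
        group_weights /= group_weights.sum()
    robust_loss = torch.dot(group_weights, losses)
    optimizer.zero_grad()
    robust_loss.backward()
    optimizer.step()
    return robust_loss.item(), group_weights
```

In this sketch, `group_weights` would be initialized uniformly (e.g. `torch.ones(num_groups) / num_groups`) and carried across steps. The Wasserstein-ball ambiguity set and the structured-latent-code architecture discussed in the abstract are not captured here.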
Related papers
- Optimal Neural Compressors for the Rate-Distortion-Perception Tradeoff [29.69773024077467]
Recent efforts in neural compression have focused on the rate-distortion-perception tradeoff.
In this paper, we propose neural compressors that are low complexity and benefit from high packing efficiency.
arXiv Detail & Related papers (2025-03-21T22:18:52Z)
- "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z)
- SRN-SZ: Deep Learning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies the most advanced super-resolution network HAT for its compression.
In experiments, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound.
arXiv Detail & Related papers (2023-09-07T22:15:32Z)
- Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169]
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods.
KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation.
Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having 3x faster inference.
arXiv Detail & Related papers (2023-03-31T15:44:13Z)
- Do Neural Networks Compress Manifolds Optimally? [22.90338354582811]
Artificial Neural-Network-based (ANN-based) lossy compressors have recently obtained striking results on several sources.
We show that state-of-the-art ANN-based compressors fail to optimally compress the sources, especially at high rates.
arXiv Detail & Related papers (2022-05-17T17:41:53Z)
- EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization [7.691755449724637]
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck.
There are two classes of compression operators and separate algorithms making use of them.
We propose a new algorithm, recovering DIANA and EF21 as particular cases.
arXiv Detail & Related papers (2022-05-09T10:44:23Z)
- Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method achieves state-of-the-art network compression while being capable of achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
A simplified sketch of this style of threshold-based sparsification is given after this list.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution [71.29848468762789]
We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution.
In this problem, each channel's measurements are given as the convolution of a common source signal with a sparse filter.
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.
arXiv Detail & Related papers (2020-10-22T02:34:33Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning [0.0]
We show that our approach leads to vast improvements over EF, including reduced memory requirements, better complexity guarantees and fewer assumptions.
We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits.
arXiv Detail & Related papers (2020-06-19T11:24:41Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
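As a companion to the SIDCo entry above, here is a heavily simplified sketch of threshold-based gradient sparsification with a statistical threshold estimate. It uses a single-stage exponential fit purely for illustration; SIDCo's actual multi-stage fitting procedure differs, and the function names here are assumptions, not the paper's code.

```python
# Hypothetical illustration of threshold-based gradient sparsification: estimate a
# threshold from a simple statistical fit to |grad| instead of sorting (as Topk does).
import math
import torch

def threshold_sparsify(grad, target_ratio=0.01):
    # Fit an exponential distribution to |grad| via its mean, then pick the threshold
    # whose exceedance probability equals target_ratio under that fit:
    #   P(|g| > t) = exp(-t / mean)  =>  t = -mean * ln(target_ratio)
    mean = grad.abs().mean()
    t = -mean * math.log(target_ratio)
    mask = grad.abs() > t
    return grad * mask, mask

# Example: sparsify a synthetic gradient tensor before communicating it.
g = torch.randn(1_000_000)
sparse_g, mask = threshold_sparsify(g, target_ratio=0.01)
```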
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.