Adaptive Estimators Show Information Compression in Deep Neural Networks
- URL: http://arxiv.org/abs/1902.09037v2
- Date: Thu, 30 Mar 2023 22:42:38 GMT
- Title: Adaptive Estimators Show Information Compression in Deep Neural Networks
- Authors: Ivan Chelombiev, Conor Houghton, Cian O'Donnell
- Abstract summary: The information bottleneck theory proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task.
In this paper we develop more robust mutual information estimation techniques that adapt to the hidden activity of neural networks.
We show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve how neural networks function it is crucial to understand their
learning process. The information bottleneck theory of deep learning proposes
that neural networks achieve good generalization by compressing their
representations to disregard information that is not relevant to the task.
However, empirical evidence for this theory is conflicting, as compression was
only observed when networks used saturating activation functions. In contrast,
networks with non-saturating activation functions achieved comparable levels of
task performance but did not show compression. In this paper we develop more
robust mutual information estimation techniques that adapt to the hidden
activity of neural networks and produce more sensitive measurements of
activations from all functions, especially unbounded ones. Using these
adaptive estimation techniques, we explore compression in networks with a
range of different activation functions. With two improved estimation methods,
we first show that saturation of the activation function is not required for
compression and that the amount of compression varies between different
activation functions. We also find that there is a large amount of variation
in compression between different network initializations. Second, we see that
L2 regularization leads to significantly increased compression while also
preventing overfitting.
Finally, we show that only compression of the last layer is positively
correlated with generalization.
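As a rough illustration of the kind of adaptive, binning-based mutual information estimate the abstract describes, the sketch below bins hidden activations using edges derived from their observed range and computes I(T;Y) from the empirical bin distribution. The paper's adaptive and entropy-based binning schemes differ in detail, and every function name and parameter here is illustrative rather than taken from the paper's code.

```python
import numpy as np

def discretize(activations, n_bins=30):
    """Bin hidden activations using edges that span the layer's observed range."""
    lo, hi = activations.min(), activations.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    # digitize each value; clip so the maximum falls into the last bin
    return np.clip(np.digitize(activations, edges) - 1, 0, n_bins - 1)

def entropy_bits(binned):
    """Entropy (bits) of the empirical distribution over binned layer states."""
    # each row (one sample's binned layer activity) is treated as one symbol
    _, counts = np.unique(binned, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(binned_T, labels):
    """I(T;Y) = H(T) - H(T|Y) for discrete labels Y.

    For a deterministic network over a finite dataset, the binned I(X;T) is
    simply H(T), i.e. entropy_bits(binned_T).
    """
    h_cond = sum((labels == y).mean() * entropy_bits(binned_T[labels == y])
                 for y in np.unique(labels))
    return entropy_bits(binned_T) - h_cond

# toy usage: fake "hidden activity" for 1000 samples, 3 hidden units, 2 classes
rng = np.random.default_rng(0)
T = rng.normal(size=(1000, 3))
Y = rng.integers(0, 2, size=1000)
print(f"I(T;Y) = {mutual_information(discretize(T), Y):.3f} bits")
```

Because unbounded activations such as ReLU can grow throughout training, deriving the bin edges from the observed activity rather than from a fixed interval is what keeps the measurement sensitive for non-saturating functions.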
Related papers
- Understanding the Effect of the Long Tail on Neural Network Compression [9.819486253052528]
We study the "long tail" phenomenon in computer vision datasets observed by Feldman et al.
As compression limits the capacity of a network (and hence also its ability to memorize), we study the question: are mismatches between the full and compressed models correlated with the memorized training data?
arXiv Detail & Related papers (2023-06-09T20:18:05Z)
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use a sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
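As a toy illustration of how an $\ell_q$ quasi-norm with $0 < q < 1$ captures the "soft sparsity" mentioned above (this is not the paper's pruning algorithm; the vectors and the choice $q = 0.5$ are arbitrary):

```python
import numpy as np

def lq_norm(w, q=0.5):
    """(sum_i |w_i|^q)^(1/q); smaller means the mass sits on fewer weights."""
    return np.sum(np.abs(w) ** q) ** (1.0 / q)

rng = np.random.default_rng(0)
dense = rng.normal(size=1000)                 # no dominant weights
sparse = np.zeros(1000)
sparse[:10] = 10 * rng.normal(size=10)        # ten dominant weights

for name, w in [("dense", dense), ("soft-sparse", sparse)]:
    w = w / np.linalg.norm(w)                 # compare at unit l2 norm
    print(f"{name:12s} l_0.5 quasi-norm: {lq_norm(w):.1f}")
```

After normalization, the vector dominated by a few large entries has a much smaller $\ell_{0.5}$ value, i.e. it is more compressible in this sense.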
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) aims to bridge the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage.
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently.
It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
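A very loose sketch of compressing an intermediate feature representation, here with a linear (PCA-style) bottleneck plus uniform quantization; the paper instead trains the encoder with knowledge distillation and learned entropy models, and the distortion below is plain reconstruction error rather than the supervised loss. All shapes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(256, 128))        # stand-in for intermediate activations

# linear bottleneck: project onto the top-k principal directions
k, bits = 16, 4
mean = features.mean(axis=0)
_, _, Vt = np.linalg.svd(features - mean, full_matrices=False)
code = (features - mean) @ Vt[:k].T           # 128 -> 16 dimensions per sample

# uniform quantization of the bottleneck code
lo, hi = code.min(), code.max()
step = (hi - lo) / (2 ** bits - 1)
symbols = np.round((code - lo) / step)
decoded = (symbols * step + lo) @ Vt[:k] + mean

rate = k * bits                               # bits per sample (no entropy coding)
distortion = np.mean((features - decoded) ** 2)
print(f"rate: {rate} bits/sample, reconstruction MSE: {distortion:.3f}")
```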
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
- DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors into two sets, values and indices, and allows both independent and combined compression of these sets.
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods.
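A minimal sketch of the values/indices decomposition described above, with top-k magnitude sparsification standing in for whatever sparsifier produced the tensor; DeepReduce's actual codecs for the two sets are omitted, and the shapes and density are arbitrary.

```python
import numpy as np

def decompose(tensor, density=0.01):
    """Split a tensor into the flat indices and values of its largest entries."""
    flat = tensor.ravel()
    k = max(1, int(density * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # positions of the k largest magnitudes
    return idx.astype(np.uint32), flat[idx].astype(np.float32), tensor.shape

def reconstruct(indices, values, shape):
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[indices] = values
    return flat.reshape(shape)

rng = np.random.default_rng(0)
grad = rng.normal(size=(512, 256)).astype(np.float32)
idx, vals, shape = decompose(grad, density=0.01)
print(f"transmitting {vals.size} of {grad.size} entries")
approx = reconstruct(idx, vals, shape)             # receiver side
```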
arXiv Detail & Related papers (2021-02-05T11:31:24Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
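The permutation observation is easy to verify numerically: permuting the output channels of one layer and the input channels of the next leaves the composed function unchanged, because the elementwise ReLU commutes with the permutation. A small check (layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 16)), rng.normal(size=32)
W2, b2 = rng.normal(size=(8, 32)), rng.normal(size=8)
x = rng.normal(size=16)

perm = rng.permutation(32)
W1p, b1p = W1[perm], b1[perm]            # permute rows (output channels) of layer 1
W2p = W2[:, perm]                        # permute columns (input channels) of layer 2

original = W2 @ np.maximum(W1 @ x + b1, 0) + b2
permuted = W2p @ np.maximum(W1p @ x + b1p, 0) + b2
print(np.allclose(original, permuted))   # True
```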
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- Attribution Preservation in Network Compression for Reliable Network Interpretation [81.84564694303397]
Neural networks embedded in safety-sensitive applications rely on input attribution for hindsight analysis and on network compression to reduce their size for edge computing.
We show that these seemingly unrelated techniques conflict with each other as network compression deforms the produced attributions.
This phenomenon arises due to the fact that conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions.
arXiv Detail & Related papers (2020-10-28T16:02:31Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
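A rough sketch of a single low-rank power step applied to the difference between two workers' parameters, in the spirit of PowerSGD/PowerGossip; warm-started factors, the precise orthogonalization, and the gossip protocol itself are omitted, and all sizes are illustrative.

```python
import numpy as np

def power_compress(diff, rank=2, seed=0):
    """One power-iteration step yielding a rank-`rank` approximation of `diff`."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=(diff.shape[1], rank))
    p = diff @ q                              # power step
    p, _ = np.linalg.qr(p)                    # orthonormalize the left factor
    q = diff.T @ p                            # matching right factor
    return p, q                               # diff is approximated by p @ q.T

rng = np.random.default_rng(1)
w_local = rng.normal(size=(256, 128))
w_neighbor = rng.normal(size=(256, 128))
p, q = power_compress(w_local - w_neighbor, rank=2)
print(f"communicated {p.size + q.size} floats instead of {w_local.size}")
```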
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
- ReluDiff: Differential Verification of Deep Neural Networks [8.601847909798165]
We develop a new method for differential verification of two closely related networks.
We exploit structural and behavioral similarities of the two networks to more accurately bound the difference between the output neurons of the two networks.
Our experiments show that, compared to state-of-the-art verification tools, our method can achieve orders-of-magnitude speedup.
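Not ReluDiff's symbolic interval analysis, but a coarse illustration of the underlying idea of exploiting the two networks' similarity: for a fixed input, an elementwise bound on the output difference of two structurally identical ReLU networks can be propagated layer by layer from the weight differences. The architecture and perturbation below are arbitrary.

```python
import numpy as np

def forward(weights, x):
    """Return the post-ReLU activations of every layer."""
    acts, h = [], x
    for W in weights:
        h = np.maximum(W @ h, 0.0)
        acts.append(h)
    return acts

def difference_bound(weights1, weights2, x):
    """Elementwise upper bound on |f1(x) - f2(x)|, using ReLU's 1-Lipschitzness."""
    acts2 = forward(weights2, x)
    delta, h2 = np.zeros_like(x), x
    for W1, W2, h2_next in zip(weights1, weights2, acts2):
        # |W1 h1 - W2 h2| <= |W1| |h1 - h2| + |W1 - W2| |h2|, elementwise
        delta = np.abs(W1) @ delta + np.abs(W1 - W2) @ np.abs(h2)
        h2 = h2_next
    return delta

rng = np.random.default_rng(0)
weights1 = [rng.normal(size=(32, 16)), rng.normal(size=(4, 32))]
weights2 = [W + 1e-3 * rng.normal(size=W.shape) for W in weights1]  # slightly edited copy
x = rng.normal(size=16)

true_diff = np.abs(forward(weights1, x)[-1] - forward(weights2, x)[-1])
print(np.all(true_diff <= difference_bound(weights1, weights2, x)))  # True
```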
arXiv Detail & Related papers (2020-01-10T20:47:22Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
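A toy sketch of per-layer uniform quantization with a bitwidth that decreases with depth, which is the "progressively decreasing bitwidth" idea in its simplest form; the bitwidth schedule and layer shapes below are arbitrary, and the paper's actual assignment scheme is more involved.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to the given bitwidth."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) for _ in range(4)]
bitwidths = [8, 6, 4, 2]                      # fewer bits for deeper layers

for i, (w, b) in enumerate(zip(layers, bitwidths)):
    err = np.mean((w - quantize(w, b)) ** 2)
    print(f"layer {i}: {b} bits, quantization MSE {err:.5f}")
```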
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.