What Information Does a ResNet Compress?
- URL: http://arxiv.org/abs/2003.06254v1
- Date: Fri, 13 Mar 2020 13:02:11 GMT
- Title: What Information Does a ResNet Compress?
- Authors: Luke Nicholas Darlow, Amos Storkey
- Abstract summary: We test whether the information bottleneck principle is applicable to a realistic setting using a ResNet model.
We find that two stages of learning happen for both training regimes, and that compression does occur, even for an autoencoder.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The information bottleneck principle (Shwartz-Ziv & Tishby, 2017) suggests
that SGD-based training of deep neural networks results in optimally compressed
hidden layers, from an information theoretic perspective. However, this claim
was established on toy data. The goal of the work we present here is to test
whether the information bottleneck principle is applicable to a realistic
setting using a larger and deeper convolutional architecture, a ResNet model.
We trained PixelCNN++ models as inverse representation decoders to measure the
mutual information between hidden layers of a ResNet and input image data, when
trained for (1) classification and (2) autoencoding. We find that two stages of
learning happen for both training regimes, and that compression does occur,
even for an autoencoder. Sampling images by conditioning on hidden layers'
activations offers an intuitive visualisation to understand what a ResNet
learns to forget.
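As a concrete illustration of how such a measurement can be set up, the sketch below computes the decoder-based lower bound on mutual information; the per-image negative log-likelihoods, the entropy estimate, and the numbers in the usage line are assumptions standing in for the paper's trained PixelCNN++ models.

```python
import numpy as np

def mi_lower_bound(nll_conditional_nats, entropy_x_nats):
    """Lower bound on I(x; h) from a conditional decoder q(x | h).

    Using I(x; h) = H(x) - H(x | h) and H(x | h) <= E[-log q(x | h)]
    for any decoder q, the bound is H(x) - E[-log q(x | h)].

    nll_conditional_nats : per-image values of -log q(x | h), in nats, from a
        decoder conditioned on a hidden layer's activations (any decoder gives
        a valid bound; the paper uses PixelCNN++ models).
    entropy_x_nats : an estimate of the marginal entropy H(x), e.g. the
        negative log-likelihood of an unconditional density model.
    """
    cross_entropy = float(np.mean(nll_conditional_nats))  # upper-bounds H(x | h)
    return entropy_x_nats - cross_entropy

# Illustrative (made-up) numbers: tracking this bound per layer and per
# training epoch is what reveals the fitting and compression phases.
print(mi_lower_bound(np.array([2890.0, 2912.5, 2875.3]), entropy_x_nats=3100.0))
```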
Related papers
- Dynamic Encoding and Decoding of Information for Split Learning in Mobile-Edge Computing: Leveraging Information Bottleneck Theory [1.1151919978983582]
Split learning is a privacy-preserving distributed learning paradigm in which an ML model is split into two parts (i.e., an encoder and a decoder).
In mobile-edge computing, network functions can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network.
We present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations.
arXiv Detail & Related papers (2023-09-06T07:04:37Z)
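A rough sketch of the encoder/decoder split described above, for intuition only; the layer shapes, the L2 "rate" proxy, and the `beta` weight are assumptions and not the paper's actual balancing mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UEEncoder(nn.Module):
    """Runs on the user equipment; its output z is sent over the link."""
    def __init__(self, latent_channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class EdgeDecoder(nn.Module):
    """Runs in the edge network; consumes the transmitted latent."""
    def __init__(self, latent_channels=32, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(latent_channels, num_classes),
        )

    def forward(self, z):
        return self.head(z)

def split_objective(logits, labels, z, beta=1e-3):
    """Task loss plus a crude 'rate' penalty on the transmitted latent,
    echoing the information-bottleneck trade-off the paper balances."""
    return F.cross_entropy(logits, labels) + beta * z.pow(2).mean()
```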
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the heavy data requirements of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
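One simple way to probe the diffuse-redundancy claim above is to train a linear probe on a random fraction of feature dimensions and see how slowly accuracy drops; the sketch below assumes pre-extracted features and uses scikit-learn, and is not necessarily the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def subset_probe_accuracy(train_feats, train_y, test_feats, test_y,
                          fraction=0.25, seed=0):
    """Linear-probe accuracy using a random subset of neurons/dimensions."""
    rng = np.random.default_rng(seed)
    d = train_feats.shape[1]
    keep = rng.choice(d, size=max(1, int(fraction * d)), replace=False)
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_feats[:, keep], train_y)
    return probe.score(test_feats[:, keep], test_y)

# If accuracy at fraction=0.25 stays close to fraction=1.0, the task-relevant
# information is spread redundantly across many neurons.
```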
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reformulate the conv layer by resorting to scale-space theory.
We build a novel architecture named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
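The SCAN-CNN entry above gives little architectural detail, so the following is only a generic scale-attention pattern (responses at several dilation rates weighted by learned attention); the specific design is an assumption rather than the paper's layer.

```python
import torch
import torch.nn as nn

class ScaleAttentionConv(nn.Module):
    """Combine conv responses computed at several scales with learned attention."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations]
        )
        # Per-scale attention logits predicted from a global summary of the input.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, len(dilations)),
        )

    def forward(self, x):
        weights = torch.softmax(self.attn(x), dim=1)                    # (B, S)
        responses = torch.stack([b(x) for b in self.branches], dim=1)   # (B, S, C, H, W)
        return (weights[:, :, None, None, None] * responses).sum(dim=1)
```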
- Is Deep Image Prior in Need of a Good Education? [57.3399060347311]
Deep image prior was introduced as an effective prior for image reconstruction.
Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques.
We develop a two-stage learning paradigm to address the computational challenge.
arXiv Detail & Related papers (2021-11-23T15:08:26Z)
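For context on the baseline being accelerated above, here is a minimal sketch of the basic deep-image-prior loop (fit an untrained CNN to a single corrupted image and stop early); the tiny network and iteration count are placeholders, and the paper's two-stage speed-up is not shown.

```python
import torch
import torch.nn as nn

def deep_image_prior(corrupted, steps=2000, lr=1e-2):
    """Fit a randomly initialised CNN to one corrupted image.

    Early-stopped reconstructions tend to capture natural-image structure
    before fitting noise; this is the (slow) baseline the paper accelerates.
    """
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, corrupted.shape[1], 3, padding=1),
    )
    z = torch.randn(corrupted.shape[0], 32, *corrupted.shape[2:])  # fixed noise input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(z) - corrupted) ** 2).mean()
        loss.backward()
        opt.step()
    return net(z).detach()
```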
- Recurrence along Depth: Deep Convolutional Neural Networks with Recurrent Layer Aggregation [5.71305698739856]
This paper introduces a concept of layer aggregation to describe how information from previous layers can be reused to better extract features at the current layer.
We propose a very lightweight module, called recurrent layer aggregation (RLA), that makes use of the sequential structure of layers in a deep CNN.
Our RLA module is compatible with many mainstream deep CNNs, including ResNets, Xception and MobileNetV2.
arXiv Detail & Related papers (2021-10-22T15:36:33Z)
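To make the recurrent-layer-aggregation idea above concrete, here is a schematic sketch in which a small recurrent state is updated from each stage's features and fed back into the main pathway; the channel sizes and the merge step are assumptions rather than the paper's exact RLA module.

```python
import torch
import torch.nn as nn

class RecurrentLayerAggregation(nn.Module):
    """Carry a lightweight hidden state along the depth of a CNN."""
    def __init__(self, feat_channels, state_channels=32):
        super().__init__()
        self.update = nn.Conv2d(feat_channels + state_channels, state_channels, 1)
        self.merge = nn.Conv2d(feat_channels + state_channels, feat_channels, 1)

    def forward(self, feat, state):
        # Update the recurrent state from the current layer's features...
        state = torch.tanh(self.update(torch.cat([feat, state], dim=1)))
        # ...and reinject the aggregated information into the main pathway.
        feat = self.merge(torch.cat([feat, state], dim=1))
        return feat, state

# Usage sketch (hypothetical backbone stages, all kept at one spatial size):
# state = torch.zeros(b, 32, h, w)
# for stage in backbone_stages:
#     feat = stage(feat)
#     feat, state = rla(feat, state)
```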
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning [0.0]
ImageNet has been the backbone of various convolutional neural networks (CNNs) trained on ILSVRC12.
This paper describes automated applications based on model consensus, explainability and confident learning to correct labeling mistakes.
ImageNet-Clean improves model performance by 2-2.4% for SqueezeNet and EfficientNet-B0 models.
arXiv Detail & Related papers (2021-03-30T13:16:35Z)
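A minimal sketch of the model-consensus idea from the entry above: flag an image when several independently trained classifiers agree with one another but disagree with the stored label; the vote threshold is illustrative, and the explainability and confident-learning components are not reproduced.

```python
import numpy as np

def consensus_flags(pred_labels, given_labels, min_agree=3):
    """Flag samples where >= min_agree models agree on a label != the given one.

    pred_labels : (num_models, num_samples) array of predicted class ids.
    given_labels: (num_samples,) array of dataset labels.
    Returns (flag_mask, suggested_labels).
    """
    num_models, num_samples = pred_labels.shape
    flags = np.zeros(num_samples, dtype=bool)
    suggestions = given_labels.copy()
    for i in range(num_samples):
        votes = pred_labels[:, i]
        top = np.bincount(votes).argmax()
        agree = int((votes == top).sum())
        if top != given_labels[i] and agree >= min_agree:
            flags[i] = True
            suggestions[i] = top
    return flags, suggestions
```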
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Defending Adversarial Examples via DNN Bottleneck Reinforcement [20.08619981108837]
This paper presents a reinforcement scheme to alleviate the vulnerability of Deep Neural Networks (DNN) against adversarial attacks.
By reinforcing the former while maintaining the latter, any redundant information, be it adversarial or not, should be removed from the latent representation.
In order to reinforce the information bottleneck, we introduce the multi-scale low-pass objective and multi-scale high-frequency communication for better frequency steering in the network.
arXiv Detail & Related papers (2020-08-12T11:02:01Z)
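The "multi-scale low-pass objective" in the last entry is described only loosely here, so the following is just one plausible reading: penalise high-frequency energy in feature maps via an FFT mask; the radial cutoff and the masking scheme are assumptions.

```python
import torch

def highfreq_penalty(feat, cutoff=0.25):
    """Mean squared magnitude of feature-map frequencies above a radial cutoff.

    feat   : (B, C, H, W) feature maps.
    cutoff : fraction of the Nyquist radius treated as "low frequency".
    """
    B, C, H, W = feat.shape
    spec = torch.fft.fft2(feat)                      # complex spectrum per channel
    fy = torch.fft.fftfreq(H, device=feat.device)    # cycles per sample
    fx = torch.fft.fftfreq(W, device=feat.device)
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    high = (radius > cutoff * 0.5).to(feat.dtype)    # mask of high-frequency bins
    return (spec.abs() ** 2 * high).mean()
```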
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.