Residual Connections Harm Generative Representation Learning
- URL: http://arxiv.org/abs/2404.10947v3
- Date: Sat, 12 Oct 2024 01:49:15 GMT
- Title: Residual Connections Harm Generative Representation Learning
- Authors: Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire
- Abstract summary: We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning.
Our modification improves linear probing accuracy for both, notably increasing ImageNet accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone.
- Score: 22.21222349477351
- Abstract: We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification improves linear probing accuracy for both, notably increasing ImageNet accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also boosting generation quality for diffusion models. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.
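To make the abstract's idea concrete, here is a minimal PyTorch-style sketch of a residual block whose identity shortcut is scaled by a fixed, depth-dependent weight, together with an entropy-based effective-rank measure of the kind the abstract refers to. The names (`identity_weight`, `DecayedResidualBlock`, `alpha_min`) and the linear decay schedule are illustrative assumptions; the paper's exact formula and hyperparameters are not given in this summary.

```python
# Hypothetical sketch, not the authors' released code: a residual block
# y = alpha_l * x + F(x), where alpha_l decreases monotonically with depth,
# plus an entropy-based effective-rank measure for learned features.
import torch
import torch.nn as nn


def identity_weight(layer_idx: int, num_layers: int, alpha_min: float = 0.5) -> float:
    """Fixed schedule (assumed linear): decay the shortcut weight from 1.0 at the
    first layer to alpha_min at the last layer. The paper's formula may differ."""
    if num_layers == 1:
        return 1.0
    frac = layer_idx / (num_layers - 1)
    return 1.0 - frac * (1.0 - alpha_min)


class DecayedResidualBlock(nn.Module):
    """Residual block with a down-weighted identity shortcut: y = alpha_l * x + F(x)."""

    def __init__(self, dim: int, layer_idx: int, num_layers: int):
        super().__init__()
        self.alpha = identity_weight(layer_idx, num_layers)
        # Stand-in for the block's transformation (e.g., an MLP or attention sub-block).
        self.fn = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * x + self.fn(x)


def effective_rank(features: torch.Tensor) -> torch.Tensor:
    """Entropy-based effective rank of a (num_samples, dim) feature matrix."""
    s = torch.linalg.svdvals(features - features.mean(dim=0, keepdim=True))
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p.clamp_min(1e-12))).sum())
```

For a 12-layer encoder, blocks would be built as `DecayedResidualBlock(dim, l, 12)` for `l` in `range(12)`, so that deeper layers receive a weaker echo of shallower representations; the minimum weight of 0.5 is a placeholder, not a published hyperparameter.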
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances the global feature representation of point cloud masked autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation [6.596361762662328]
We introduce an innovative encoder-decoder network structure enhanced with residual connections.
Our approach employs a multi-residual connection strategy designed to preserve the intricate details across various image scales more effectively.
To enhance the convergence rate of network training and mitigate sample imbalance issues, we have devised a modified cross-entropy loss function.
arXiv Detail & Related papers (2024-05-26T05:15:53Z) - Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Unlocking the Potential of Federated Learning for Deeper Models [24.875271131226707]
Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
arXiv Detail & Related papers (2023-06-05T08:45:44Z) - Deep Augmentation: Self-Supervised Learning with Transformations in Activation Space [19.495587566796278]
We introduce Deep Augmentation, an approach to implicit data augmentation using dropout or PCA to transform a targeted layer within a neural network to improve performance and generalization.
We demonstrate Deep Augmentation through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning.
arXiv Detail & Related papers (2023-03-25T19:03:57Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Image Superresolution using Scale-Recurrent Dense Network [30.75380029218373]
Recent advances in the design of convolutional neural networks (CNNs) have yielded significant improvements in the performance of image super-resolution (SR).
We propose a scale-recurrent SR architecture built upon units containing a series of dense connections within a residual block (Residual Dense Blocks, RDBs).
Our scale-recurrent design delivers competitive performance for higher scale factors while being parametrically more efficient compared to current state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-28T09:18:43Z) - Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
We find that performing more gradient updates decreases the expressivity of the current value network, a phenomenon termed implicit under-parameterization.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z) - Adversarial Training Reduces Information and Improves Transferability [81.59364510580738]
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility.
We show that adversarial training can improve linear transferability to new tasks, which gives rise to a trade-off between the transferability of representations and accuracy on the source task.
arXiv Detail & Related papers (2020-07-22T08:30:16Z) - Untangling tradeoffs between recurrence and self-attention in neural networks [81.30894993852813]
We present a formal analysis of how self-attention affects gradient propagation in recurrent networks.
We prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies.
We propose a relevancy screening mechanism that allows for a scalable use of sparse self-attention with recurrence.
arXiv Detail & Related papers (2020-06-16T19:24:25Z) - Non-Linearities Improve OrigiNet based on Active Imaging for Micro Expression Recognition [8.112868317921853]
We introduce an active imaging concept to segregate active changes in expressive regions of a video into a single frame.
We propose a shallow CNN, the hybrid local receptive field based augmented learning network (OrigiNet), that efficiently learns significant features of micro-expressions in a video.
arXiv Detail & Related papers (2020-05-16T13:44:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.