Orthogonalized SGD and Nested Architectures for Anytime Neural Networks
- URL: http://arxiv.org/abs/2008.06635v1
- Date: Sat, 15 Aug 2020 03:06:34 GMT
- Title: Orthogonalized SGD and Nested Architectures for Anytime Neural Networks
- Authors: Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire
- Abstract summary: Orthogonalized SGD dynamically re-balances task-specific gradients when training a multitask network.
Experiments demonstrate that training with Orthogonalized SGD significantly improves accuracy of anytime networks.
- Score: 30.598394152055338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel variant of SGD customized for training network
architectures that support anytime behavior: such networks produce a series of
increasingly accurate outputs over time. Efficient architectural designs for
these networks focus on re-using internal state; subnetworks must produce
representations relevant for both immediate prediction as well as refinement by
subsequent network stages. We consider traditional branched networks as well as
a new class of recursively nested networks. Our new optimizer, Orthogonalized
SGD, dynamically re-balances task-specific gradients when training a multitask
network. In the context of anytime architectures, this optimizer projects
gradients from later outputs onto a parameter subspace that does not interfere
with those from earlier outputs. Experiments demonstrate that training with
Orthogonalized SGD significantly improves generalization accuracy of anytime
networks.
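The projection step described in the abstract can be illustrated with a minimal sketch. It assumes each output's gradient has been flattened into one vector and that outputs are ordered earliest first. The function names `orthogonalize_gradients` and `orthogonalized_sgd_step` and the Gram-Schmidt formulation are illustrative assumptions, not the authors' implementation, which may differ in granularity and normalization.

```python
import numpy as np

def orthogonalize_gradients(grads, eps=1e-12):
    """Gram-Schmidt over per-output gradients, earliest output first.

    Hypothetical helper: each later gradient keeps only the component
    orthogonal to the span of all earlier gradients.
    """
    basis = []       # orthonormal directions covered by earlier outputs
    projected = []
    for g in grads:
        r = np.asarray(g, dtype=float).copy()
        for b in basis:
            r -= np.dot(r, b) * b   # remove the part along an earlier direction
        projected.append(r)
        norm = np.linalg.norm(r)
        if norm > eps:              # skip directions that are already covered
            basis.append(r / norm)
    return projected

def orthogonalized_sgd_step(params, grads, lr=0.1):
    """One plain SGD step using the sum of the orthogonalized gradients."""
    update = np.sum(orthogonalize_gradients(grads), axis=0)
    return params - lr * update

# Two outputs sharing a 2-parameter model: the second gradient loses the
# component that overlaps with the first.
g_early = np.array([1.0, 0.0])
g_late = np.array([1.0, 1.0])
projected = orthogonalize_gradients([g_early, g_late])
new_params = orthogonalized_sgd_step(np.zeros(2), [g_early, g_late], lr=0.1)
```

In this toy case the later gradient is reduced to its component along the second parameter, so the update from the later output no longer changes the direction preferred by the earlier output.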
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [68.18203605110719]
We propose a high-accuracy rotation equivariant proximal network that effectively embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, called SIRe, to mitigate vanishing gradients in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Network Embedding via Deep Prediction Model [25.727377978617465]
This paper proposes a network embedding framework to capture the transfer behaviors on structured networks via deep prediction models.
A network structure embedding layer is added into conventional deep prediction models, including Long Short-Term Memory Network and Recurrent Neural Network.
Experimental studies are conducted on various datasets, including social, citation, biomedical, collaboration, and language networks.
arXiv Detail & Related papers (2021-04-27T16:56:00Z)
- TSAM: Temporal Link Prediction in Directed Networks based on Self-Attention Mechanism [2.5144068869465994]
We propose a deep learning model, namely TSAM, based on graph convolutional networks (GCN) and a self-attention mechanism.
We run comparative experiments on four realistic networks to validate the effectiveness of TSAM.
arXiv Detail & Related papers (2020-08-23T11:56:40Z)
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- ForecastNet: A Time-Variant Deep Feed-Forward Neural Network Architecture for Multi-Step-Ahead Time-Series Forecasting [6.043572971237165]
We propose ForecastNet, which uses a deep feed-forward architecture to provide a time-variant model.
ForecastNet is demonstrated to outperform statistical and deep learning benchmark models on several datasets.
arXiv Detail & Related papers (2020-02-11T01:03:33Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.