Related papers: Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

URL: http://arxiv.org/abs/2008.06635v1
Date: Sat, 15 Aug 2020 03:06:34 GMT
Title: Orthogonalized SGD and Nested Architectures for Anytime Neural Networks
Authors: Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire
Abstract summary: Orthogonalized SGD dynamically re-balances task-specific gradients when training a multitask network. Experiments demonstrate that training with Orthogonalized SGD significantly improves accuracy of anytime networks.
Score: 30.598394152055338
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant for both immediate prediction as well as refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace that does not interfere with those from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.

Related papers

ResNets Are Deeper Than You Think [5.250288418639077]
Residual connections remain ubiquitous in modern neural network architectures nearly a decade after their introduction.<n>Their widespread adoption is often credited to their dramatically improved trainability.<n>We show that residual connections confer performance advantages beyond optimization, pointing instead to a deeper inductive bias aligned with the structure of natural data.
arXiv Detail & Related papers (2025-06-17T10:33:22Z)
Generalization emerges from local optimization in a self-organized learning network [0.0]
We design and analyze a new paradigm for building supervised learning networks, driven only by local optimization rules without relying on a global error function. Our network stores new knowledge in the nodes accurately and instantaneously, in the form of a lookup table. We show on numerous examples of classification tasks that the networks generated by our algorithm systematically reach such a state of perfect generalization when the number of learned examples becomes sufficiently large. We report on the dynamics of the change of state and show that it is abrupt and has the distinctive characteristics of a first order phase transition, a phenomenon already observed for traditional learning networks and known as grokking.
arXiv Detail & Related papers (2024-10-03T15:32:08Z)
Efficient Training of Deep Neural Operator Networks via Randomized Sampling [0.0]
Deep operator network (DeepNet) has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications. We introduce a random sampling technique to be adopted the training of DeepONet, aimed at improving generalization ability of the model, while significantly reducing computational time. Our results indicate that incorporating randomization in the trunk network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems.
arXiv Detail & Related papers (2024-09-20T07:18:31Z)
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration [62.41329042683779]
We propose a high-accuracy rotation equivariant proximal network that embeds rotation symmetry priors into the deep unfolding framework. This study makes efforts to suggest a high-accuracy rotation equivariant proximal network that effectively embeds rotation symmetry priors into the deep unfolding framework.
arXiv Detail & Related papers (2023-12-25T11:53:06Z)
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, defined SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders. To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers. The resulting optimal control representation admits a gameoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs. Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability. Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network. Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
ForecastNet: A Time-Variant Deep Feed-Forward Neural Network Architecture for Multi-Step-Ahead Time-Series Forecasting [6.043572971237165]
We propose ForecastNet, which uses a deep feed-forward architecture to provide a time-variant model. ForecastNet is demonstrated to outperform statistical and deep learning benchmark models on several datasets.
arXiv Detail & Related papers (2020-02-11T01:03:33Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.