Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
- URL: http://arxiv.org/abs/2003.10739v2
- Date: Fri, 20 Aug 2021 08:36:47 GMT
- Title: Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
- Authors: Duo Li and Qifeng Chen
- Abstract summary: We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
- Score: 73.15276998621582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks by a significant margin, the traditional way of
appending supervision only over the final classifier and progressively
propagating gradient flow upstream remains the training mainstay. Seminal
Deeply-Supervised Networks (DSN) were proposed to alleviate the difficulty of
optimization arising from gradient flow through a long chain. However, DSN remains vulnerable to issues including interference with the hierarchical representation generation process and inconsistent optimization objectives, as
illustrated theoretically and empirically in this paper. Complementary to
previous training strategies, we propose Dynamic Hierarchical Mimicking, a
generic feature learning mechanism, to advance CNN training with enhanced
generalization ability. Partially inspired by DSN, we fork delicately designed
side branches from the intermediate layers of a given neural network. Each
branch can emerge from certain locations of the main branch dynamically, which
not only retains representation rooted in the backbone network but also
generates more diverse representations along its own pathway. We go one step
further to promote multi-level interactions among different branches through an
optimization formula with probabilistic prediction matching losses, thus
guaranteeing a more robust optimization process and better representation
ability. Experiments on both category and instance recognition tasks
demonstrate the substantial improvements of our proposed method over its
corresponding counterparts using diverse state-of-the-art CNN architectures.
Code and models are publicly available at https://github.com/d-li14/DHM
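Since the abstract describes the mechanism only at a high level, the following PyTorch-style sketch illustrates the general idea under simple assumptions: a single side branch forked from an intermediate stage, a cross-entropy term on each head, and a symmetric KL "prediction matching" term. All names (`BranchyNet`, `dhm_style_loss`, the layer sizes) are illustrative and are not taken from the official repository linked above.

```python
# Minimal sketch of hierarchical mimicking with one side branch.
# Illustrative only; see https://github.com/d-li14/DHM for the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchyNet(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        # Main branch: two backbone stages followed by a classifier head.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(128, num_classes)
        # Side branch forked from the intermediate feature map of stage1.
        self.side = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1))
        self.side_head = nn.Linear(128, num_classes)

    def forward(self, x):
        mid = self.stage1(x)
        main_logits = self.head(self.stage2(mid).flatten(1))
        side_logits = self.side_head(self.side(mid).flatten(1))
        return main_logits, side_logits

def dhm_style_loss(main_logits, side_logits, target, alpha=1.0):
    """Cross-entropy on both heads plus a symmetric KL 'mimicking' term
    that encourages the two probabilistic predictions to agree."""
    ce = F.cross_entropy(main_logits, target) + F.cross_entropy(side_logits, target)
    log_p_main = F.log_softmax(main_logits, dim=1)
    log_p_side = F.log_softmax(side_logits, dim=1)
    kl = F.kl_div(log_p_side, log_p_main.exp(), reduction='batchmean') + \
         F.kl_div(log_p_main, log_p_side.exp(), reduction='batchmean')
    return ce + alpha * kl
```

At test time only the main branch is kept, so a scheme of this kind adds training-time supervision without changing inference cost.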
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change once the networks are trained with better, architecture-aware hyperparameters.
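As a point of reference for what architecture-dependent scaling can look like, the NumPy sketch below shows the familiar fan-in (He) initialization rule; it only illustrates the kind of width dependence being studied and is not this paper's actual prescription.

```python
import numpy as np

def he_init(fan_in, fan_out, seed=0):
    # Variance 2/fan_in keeps ReLU pre-activations at roughly unit scale,
    # so the appropriate weight scale (and with it the largest stable
    # learning rate) changes with layer width: a simple example of
    # architecture-dependent hyperparameter scaling.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

W = he_init(fan_in=784, fan_out=256)  # wider inputs imply a smaller per-weight scale
```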
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
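A heavily simplified sketch of the two-branch idea follows, assuming a confidence-threshold early exit; the structure and names (`TwoBranchEarlyExit`, `confidence_threshold`) are hypothetical and do not follow the authors' Dyn-Perceiver implementation.

```python
# Illustrative two-branch model: a convolutional feature branch and a
# latent-code classification branch, with the early exit placed only on
# the classification branch.
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    def __init__(self, num_classes=1000, latent_dim=256):
        super().__init__()
        self.feature_branch = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        # Learnable latent code processed by the classification branch.
        self.latent = nn.Parameter(torch.zeros(1, latent_dim))
        self.mix = nn.Linear(64 + latent_dim, latent_dim)  # fuse image features into the latent
        self.early_exit = nn.Linear(latent_dim, num_classes)
        self.refine = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU())
        self.final_exit = nn.Linear(latent_dim, num_classes)

    def forward(self, x, confidence_threshold=0.9):
        feats = self.feature_branch(x).flatten(1)            # feature branch
        z = self.mix(torch.cat([feats, self.latent.expand(x.size(0), -1)], dim=1))
        early = self.early_exit(z)                           # early exit on the classification branch
        if early.softmax(dim=1).max(dim=1).values.min() > confidence_threshold:
            return early                                     # every sample is confident: stop early
        return self.final_exit(self.refine(z))
```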
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Optimisation & Generalisation in Networks of Neurons [8.078758339149822]
The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks.
A new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms.
A new correspondence is proposed between ensembles of networks and individual networks.
arXiv Detail & Related papers (2022-10-18T18:58:40Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between inference over the network's equilibrium point and optimization over its inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
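The core equilibrium computation can be conveyed with a naive fixed-point iteration over a single tanh layer; real DEQ implementations typically use quasi-Newton root finders and implicit differentiation, so this NumPy sketch only illustrates the basic idea.

```python
# The "output" z* of a deep equilibrium layer satisfies z* = f(z*, x) for a
# single nonlinear layer f, found here by naive fixed-point iteration.
import numpy as np

def deq_forward(x, W, U, b, n_iter=50, tol=1e-6):
    """Solve z = tanh(W @ z + U @ x + b) by fixed-point iteration."""
    z = np.zeros(W.shape[0])
    for _ in range(n_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
d, dx = 16, 8
W = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)   # small weight scale helps convergence
U = rng.standard_normal((d, dx)) / np.sqrt(dx)
b = np.zeros(d)
z_star = deq_forward(rng.standard_normal(dx), W, U, b)
```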
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, termed SIRe, to reduce vanishing gradients in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
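A rough sketch of the general recipe of pairing a classifier with an auto-encoder reconstruction loss, so that intermediate features retain enough information to rebuild the input, is shown below; the module and loss names are hypothetical and not the SIRe authors' code.

```python
# Classification plus an auxiliary reconstruction term that penalizes
# losing input structure in the intermediate features.
import torch.nn as nn
import torch.nn.functional as F

class ClassifierWithReconstruction(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1)  # reconstruct the input
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(32, num_classes))

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)

def joint_loss(logits, recon, x, target, beta=0.1):
    # Cross-entropy for classification plus a reconstruction penalty.
    return F.cross_entropy(logits, target) + beta * F.mse_loss(recon, x)
```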
arXiv Detail & Related papers (2021-10-06T13:54:49Z) - Dynamic Game Theoretic Neural Optimizer [10.612273480358692]
We propose a novel dynamic game perspective by viewing each layer as a player in a dynamic game characterized by the DNN itself.
Our work marries strengths from both OCT and game theory, paving ways to new algorithmic opportunities from robust optimal control and bandit-based optimization.
arXiv Detail & Related papers (2021-05-08T21:56:14Z) - Faster Convergence in Deep-Predictive-Coding Networks to Learn Deeper Representations [12.716429755564821]
Deep-predictive-coding networks (DPCNs) are hierarchical, generative models that rely on feed-forward and feed-back connections.
A crucial element of DPCNs is a forward-backward inference procedure to uncover sparse states of a dynamic model.
We propose an optimization strategy, with better empirical and theoretical convergence, based on accelerated proximal gradients.
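Accelerated proximal gradients are a standard optimization tool; the sketch below shows a generic FISTA iteration for an L1-regularized least-squares (sparse coding) objective, to illustrate the style of method referenced rather than the DPCN-specific procedure.

```python
# Generic FISTA-style accelerated proximal gradient for
# min_s 0.5*||x - D s||^2 + lam*||s||_1.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(x, D, lam=0.1, n_iter=100):
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    s = np.zeros(D.shape[1]); y = s.copy(); t = 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)             # gradient of the least-squares term at y
        s_next = soft_threshold(y - grad / L, lam / L)   # proximal (shrinkage) step
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = s_next + ((t - 1) / t_next) * (s_next - s)   # momentum / extrapolation step
        s, t = s_next, t_next
    return s
```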
arXiv Detail & Related papers (2021-01-18T02:30:13Z) - A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.