State-driven Implicit Modeling for Sparsity and Robustness in Neural
Networks
- URL: http://arxiv.org/abs/2209.09389v1
- Date: Mon, 19 Sep 2022 23:58:48 GMT
- Title: State-driven Implicit Modeling for Sparsity and Robustness in Neural
Networks
- Authors: Alicia Y. Tsai, Juliette Decugis, Laurent El Ghaoui, Alper Atamtürk
- Abstract summary: We present a new approach to training implicit models, called State-driven Implicit Modeling (SIM).
SIM constrains the internal states and outputs to match those of a baseline model, circumventing costly backward computations.
We demonstrate how the SIM approach can be applied to significantly improve sparsity and robustness of baseline models trained on the FashionMNIST and CIFAR-100 datasets.
- Score: 3.604879434384177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Implicit models are a general class of learning models that forgo the
hierarchical layer structure typical in neural networks and instead define the
internal states based on an "equilibrium" equation, offering competitive
performance and reduced memory consumption. However, training such models
usually relies on expensive implicit differentiation for backward propagation.
In this work, we present a new approach to training implicit models, called
State-driven Implicit Modeling (SIM), where we constrain the internal states
and outputs to match those of a baseline model, circumventing costly backward
computations. The training problem becomes convex by construction and can be
solved in a parallel fashion, thanks to its decomposable structure. We
demonstrate how the SIM approach can be applied to significantly improve
sparsity (parameter reduction) and robustness of baseline models trained on
FashionMNIST and CIFAR-100 datasets.
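The abstract does not spell out the equilibrium equation, but a common parameterization of implicit models (state x = phi(A x + B u), prediction y_hat = C x + D u) makes the idea concrete. The sketch below is a minimal, illustrative forward pass under that assumption: the state is obtained by fixed-point iteration rather than by stacking layers, and SIM would fit the model matrices so that x and y_hat match states and outputs recorded from a baseline network, instead of backpropagating through this solve.
```python
# Minimal illustrative forward pass of an implicit model (assumed
# parameterization: x = phi(A x + B u), y_hat = C x + D u).
import numpy as np

rng = np.random.default_rng(0)
n_state, n_in, n_out = 64, 32, 10

A = rng.standard_normal((n_state, n_state))
A *= 0.9 / np.abs(A).sum(axis=1).max()       # rescale so the iteration is a contraction
B = rng.standard_normal((n_state, n_in))
C = rng.standard_normal((n_out, n_state))
D = rng.standard_normal((n_out, n_in))
phi = lambda z: np.maximum(z, 0.0)           # ReLU state activation

def implicit_forward(u, tol=1e-6, max_iter=500):
    """Solve the equilibrium equation x = phi(A x + B u), then read out y_hat."""
    x = np.zeros(n_state)
    for _ in range(max_iter):
        x_next = phi(A @ x + B @ u)
        if np.linalg.norm(x_next - x) < tol:
            x = x_next
            break
        x = x_next
    return C @ x + D @ u, x

u = rng.standard_normal(n_in)
y_hat, x_star = implicit_forward(u)
print(y_hat.shape, x_star.shape)             # (10,) (64,)
```
With the states and outputs held fixed at the baseline's values, the matching conditions become constraints on the model matrices that the abstract describes as convex and decomposable, which is what allows the training problem to be solved in parallel.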
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
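A minimal, hypothetical sketch of the logits-level composition this summary suggests: a frozen pre-trained model produces logits and the separately trained value network adds a correction at inference time. The toy networks, sizes, and the additive combination rule are assumptions for illustration only.
```python
# Hypothetical logits-level composition: frozen base model + additive value network.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d = 100, 16
base_model = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
value_net = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
for p in base_model.parameters():
    p.requires_grad_(False)                  # the pre-trained model stays frozen

tokens = torch.randint(0, vocab, (4, 8))     # toy token batch
with torch.no_grad():
    logits = base_model(tokens) + value_net(tokens)   # adjusted next-token logits
print(logits.shape)                          # torch.Size([4, 8, 100])
```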
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- State-space models can learn in-context by gradient descent [1.3087858009942543]
This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning.
We prove that a single structured state-space model layer, augmented with local self-attention, can reproduce the outputs of an implicit linear model.
The theoretical construction elucidates the role of local self-attention and multiplicative interactions in recurrent architectures as the key ingredients for enabling the expressive power typical of foundation models.
arXiv Detail & Related papers (2024-10-15T15:22:38Z)
- A domain decomposition-based autoregressive deep learning model for unsteady and nonlinear partial differential equations [2.7755345520127936]
We propose a domain-decomposition-based deep learning (DL) framework, named CoMLSim, for accurately modeling unsteady and nonlinear partial differential equations (PDEs).
The framework consists of two key components: (a) a convolutional neural network (CNN)-based autoencoder architecture and (b) an autoregressive model composed of fully connected layers.
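A rough, illustrative sketch of how those two components could fit together, with all architectural details (subdomain size, channel counts, latent dimension) assumed rather than taken from the paper: a CNN autoencoder compresses a subdomain's solution field into a latent vector, and a fully connected autoregressive model advances that latent in time; rollout alternates the stepper and the decoder.
```python
# Illustrative CNN autoencoder + fully connected autoregressive latent stepper.
import torch
import torch.nn as nn

torch.manual_seed(0)
latent = 16
encoder = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Flatten(), nn.Linear(8 * 8 * 8, latent))
decoder = nn.Sequential(nn.Linear(latent, 8 * 8 * 8), nn.ReLU(),
                        nn.Unflatten(1, (8, 8, 8)),
                        nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1))
stepper = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                        nn.Linear(64, latent))   # autoregressive latent update

field_t = torch.randn(4, 1, 16, 16)              # toy 16x16 subdomain solution fields

z = encoder(field_t)
rollout = []
for _ in range(5):                               # predict 5 future time steps
    z = stepper(z)                               # z_{t+1} = g(z_t)
    rollout.append(decoder(z))
print(rollout[0].shape)                          # torch.Size([4, 1, 16, 16])
```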
arXiv Detail & Related papers (2024-08-26T17:50:47Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
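A compact, hypothetical sketch of such an alternating scheme: a stochastic policy proposes per-layer pruning ratios, a pruned copy of the CNN is scored, and a REINFORCE-style update adjusts the policy between rounds of weight training. The toy model, random data, reward shaping, and policy parameterization are all illustrative, not the paper's actual agent.
```python
# Alternating weight training and RL-driven per-layer pruning (illustrative only).
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 8 * 8, 10))
x, y = torch.randn(64, 1, 8, 8), torch.randint(0, 10, (64,))  # toy data stand-in

prunable = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
policy_mu = torch.zeros(len(prunable), requires_grad=True)    # one logit per layer
opt_w = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_pi = torch.optim.Adam([policy_mu], lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(20):
    # (1) Train the dense weights for a few iterations.
    for _ in range(5):
        opt_w.zero_grad()
        loss_fn(model(x), y).backward()
        opt_w.step()

    # (2) Agent acts: sample pruning ratios, prune a copy, compute a reward.
    dist = torch.distributions.Normal(policy_mu, 1.0)
    sample = dist.sample()
    ratios = torch.sigmoid(sample)                  # pruning ratios in (0, 1)
    pruned = copy.deepcopy(model)
    layers = [m for m in pruned.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    for m, r in zip(layers, ratios):
        prune.l1_unstructured(m, name="weight", amount=float(r))
    with torch.no_grad():
        acc = (pruned(x).argmax(1) == y).float().mean()
    reward = acc + 0.1 * ratios.mean()              # accuracy vs. sparsity trade-off

    # (3) REINFORCE update of the pruning policy.
    opt_pi.zero_grad()
    (-dist.log_prob(sample).sum() * reward).backward()
    opt_pi.step()
```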
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Regularized Sequential Latent Variable Models with Adversarial Neural Networks [33.74611654607262]
We present different ways of using high-level latent random variables in RNNs to model the variability in sequential data.
We also explore ways of using adversarial methods to train a variational RNN model.
arXiv Detail & Related papers (2021-08-10T08:05:14Z)
- Stabilizing Equilibrium Models by Jacobian Regularization [151.78151873928027]
Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer.
We propose a regularization scheme for DEQ models that explicitly regularizes the Jacobian of the fixed-point update equations to stabilize the learning of equilibrium models.
We show that this regularization adds only minimal computational cost, significantly stabilizes the fixed-point convergence in both forward and backward passes, and scales well to high-dimensional, realistic domains.
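The summary does not give the estimator, but a standard way to impose such a penalty is to approximate the squared Frobenius norm of the Jacobian of the fixed-point map with a Hutchinson-style vector-Jacobian product and add it to the task loss; the sketch below assumes a toy DEQ-style layer, a placeholder task loss, and a single probe vector.
```python
# Jacobian penalty at the fixed point of a toy DEQ-style layer f(z, x).
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32
W = nn.Linear(d, d)
U = nn.Linear(d, d)
f = lambda z, x: torch.tanh(W(z) + U(x))     # one implicit/DEQ layer

x = torch.randn(8, d)

# Crude forward fixed-point solve (no gradient tracking through the iterations).
with torch.no_grad():
    z = torch.zeros(8, d)
    for _ in range(50):
        z = f(z, x)

# Re-attach the graph at the fixed point and form the Jacobian penalty.
z = z.requires_grad_()
fz = f(z, x)
v = torch.randn_like(fz)                     # Hutchinson probe vector
vjp, = torch.autograd.grad(fz, z, grad_outputs=v, create_graph=True)
jac_penalty = vjp.pow(2).sum(dim=1).mean()   # E_v ||v^T J_f||^2 ~ ||J_f||_F^2

task_loss = fz.pow(2).mean()                 # placeholder for the real task loss
loss = task_loss + 0.1 * jac_penalty
loss.backward()
print(float(jac_penalty))
```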
arXiv Detail & Related papers (2021-06-28T00:14:11Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Conditional Neural Architecture Search [5.466990830092397]
It is often the case that a well-trained ML model does not fit the constraints of deployment on edge platforms.
We propose a conditional neural architecture search method using a GAN, which produces feasible ML models for different platforms.
arXiv Detail & Related papers (2020-06-06T20:39:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.