DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep
Networks
- URL: http://arxiv.org/abs/2312.12781v1
- Date: Wed, 20 Dec 2023 05:55:05 GMT
- Title: DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep
Networks
- Authors: Mrinal Mathur, Sergey Plis
- Abstract summary: We introduce DynaLay, an alternative architecture that features a decision-making agent to adaptively select the most suitable layers for processing each input.
DynaLay reevaluates more complex inputs during inference, adjusting the computational effort to optimize both performance and efficiency.
Our experiments demonstrate that DynaLay achieves accuracy comparable to conventional deep models while significantly reducing computational demands.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep learning models have become increasingly computationally intensive,
requiring extensive computational resources and time for both training and
inference. A significant contributing factor to this challenge is the uniform
computational effort expended on each input example, regardless of its
complexity. We introduce \textbf{DynaLay}, an alternative architecture that
features a decision-making agent to adaptively select the most suitable layers
for processing each input, thereby endowing the model with a remarkable level
of introspection. DynaLay reevaluates more complex inputs during inference,
adjusting the computational effort to optimize both performance and efficiency.
The core of the system is a main model equipped with Fixed-Point Iterative
(FPI) layers, capable of accurately approximating complex functions, paired
with an agent that chooses these layers or a direct action based on the
introspection of the model's inner state. The model invests more time in
processing harder examples, while minimal computation is required for easier
ones. This introspective approach is a step toward developing deep learning
models that "think" and "ponder", rather than "ballistically'' produce answers.
Our experiments demonstrate that DynaLay achieves accuracy comparable to
conventional deep models while significantly reducing computational demands.
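As a concrete illustration of the mechanism described in the abstract, the PyTorch-style sketch below pairs a fixed-point iterative layer with a small policy head that inspects the hidden state and picks either an FPI layer or a direct (stop) action. All names (FPILayer, LayerSelectionAgent, DynaLaySketch), dimensions, iteration caps, and the per-batch routing are illustrative assumptions rather than the authors' implementation; in particular, the hard argmax selection shown here would need a reinforcement-learning or soft-relaxation objective to be trainable.

```python
# Minimal sketch of the dynamic-layer-selection idea; hypothetical names, not the paper's code.
import torch
import torch.nn as nn


class FPILayer(nn.Module):
    """Fixed-point iterative layer: repeatedly applies a small transform
    until the hidden state stops changing or an iteration cap is reached."""

    def __init__(self, dim: int, max_iters: int = 20, tol: float = 1e-4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_iters):
            z_next = self.f(z + x)  # fixed-point update: z* = f(z* + x)
            if (z_next - z).norm() < self.tol:
                return z_next
            z = z_next
        return z


class LayerSelectionAgent(nn.Module):
    """Tiny policy head that inspects the current hidden state and chooses
    one of the FPI layers or a 'direct' (stop) action."""

    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.policy = nn.Linear(dim, num_layers + 1)  # last index = direct action

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # For simplicity the whole mini-batch is routed together; the paper
        # describes per-input decisions.
        return self.policy(h.mean(dim=0, keepdim=True)).argmax(dim=-1)


class DynaLaySketch(nn.Module):
    """Toy model: the agent repeatedly picks a layer (or exits) for an input."""

    def __init__(self, dim: int = 64, num_layers: int = 3, max_steps: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([FPILayer(dim) for _ in range(num_layers)])
        self.agent = LayerSelectionAgent(dim, num_layers)
        self.max_steps = max_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for _ in range(self.max_steps):
            action = int(self.agent(h))
            if action == len(self.layers):  # direct action: stop "pondering"
                break
            h = self.layers[action](h)  # harder inputs may loop through more layers
        return h


if __name__ == "__main__":
    model = DynaLaySketch(dim=64)
    out = model(torch.randn(8, 64))  # batch of 8 random feature vectors
    print(out.shape)  # torch.Size([8, 64])
```

The design point mirrored here is that harder inputs can trigger both more selection steps and more fixed-point iterations, while easy inputs can exit after minimal computation.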
Related papers
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a state space model (SSM) world model built on Mamba.
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z)
- Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models [16.16372459671255]
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget.
We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM.
We show that trained routers operate differently from oracles and often yield suboptimal solutions.
arXiv Detail & Related papers (2024-10-01T16:10:21Z)
- Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for such data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
- Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z)
- Consistency Training of Multi-exit Architectures for Sensor Data [0.07614628596146598]
We present a novel and architecture-agnostic approach for robust training of multi-exit architectures termed consistent exit training.
We leverage weak supervision to align model output with consistency training and jointly optimize dual-losses in a multi-task learning fashion over the exits in a network.
arXiv Detail & Related papers (2021-09-27T17:11:25Z)
- Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems [4.692304496312442]
This paper introduces a new deep learning approach to approximately solve the Covering Salesman Problem (CSP).
In this approach, given the city locations of a CSP as input, a deep neural network model is designed to directly output the solution.
It is trained using deep reinforcement learning without supervision.
arXiv Detail & Related papers (2021-02-11T07:25:04Z)
- Self Normalizing Flows [65.73510214694987]
We propose a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer.
This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$.
We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts.
arXiv Detail & Related papers (2020-11-14T09:51:51Z)
- Computation on Sparse Neural Networks: an Inspiration for Future Hardware [20.131626638342706]
We describe the current status of research on the computation of sparse neural networks.
We discuss how model accuracy is influenced by the number of weight parameters and the structure of the model.
We show that for practically complicated problems, it is more beneficial to search large and sparse models in the weight dominated region.
arXiv Detail & Related papers (2020-04-24T19:13:50Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.