Consistency Training of Multi-exit Architectures for Sensor Data
- URL: http://arxiv.org/abs/2109.13192v1
- Date: Mon, 27 Sep 2021 17:11:25 GMT
- Title: Consistency Training of Multi-exit Architectures for Sensor Data
- Authors: Aaqib Saeed
- Abstract summary: We present a novel and architecture-agnostic approach for robust training of multi-exit architectures termed consistent exit training.
We leverage weak supervision to align model output with consistency training and jointly optimize dual-losses in a multi-task learning fashion over the exits in a network.
- Score: 0.07614628596146598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have become larger over the years, with a growing demand
for computational resources at inference time; this incurs exacerbated costs and
leaves little room for deployment on devices with limited battery and other
resources for real-time applications. Multi-exit architectures are a type of
deep neural network interleaved with several output (or exit) layers at varying
depths of the model. They provide a sound approach for improving the
computational time and energy utilization of running a model by producing
predictions from early exits. In this work, we present a novel and
architecture-agnostic approach for robust training of multi-exit architectures,
termed consistent exit training. The crux of the method lies in a
consistency-based objective that enforces prediction invariance over clean and
perturbed inputs. We leverage weak supervision to align model output with
consistency training and jointly optimize dual losses in a multi-task learning
fashion over the exits in a network. Our technique enables exit layers to
generalize better when confronted with increasing uncertainty, resulting in
superior quality-efficiency trade-offs. Through extensive evaluation on
challenging learning tasks involving sensor data, we demonstrate that our
approach allows examples to exit earlier with a better detection rate, without
executing all the layers of a deep model.
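The dual-loss objective described above lends itself to a compact sketch. The snippet below pairs a small multi-exit 1D-CNN (illustrative of a sensor-data backbone) with a training step that sums, over every exit, a supervised cross-entropy term on clean inputs and a KL-based consistency term that ties perturbed-input predictions to the clean-input ones, plus a simple confidence-threshold rule for exiting early at inference. The model layout, perturbation, loss weighting, and threshold are assumptions for illustration, not the paper's released implementation.
```python
# Minimal sketch of consistent exit training (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiExitNet(nn.Module):
    """A backbone interleaved with exit (output) heads at increasing depths."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        widths = (32, 64, 128)  # placeholder channel widths
        self.blocks = nn.ModuleList()
        prev = in_channels
        for w in widths:
            self.blocks.append(
                nn.Sequential(nn.Conv1d(prev, w, kernel_size=5, padding=2), nn.ReLU())
            )
            prev = w
        # One classifier head per block -> one prediction per exit.
        self.exits = nn.ModuleList([nn.Linear(w, num_classes) for w in widths])

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.exits):
            x = block(x)
            logits.append(head(x.mean(dim=-1)))  # global average pooling over time
        return logits  # per-exit logits, ordered shallow to deep


def consistent_exit_training_step(model, x, y, perturb, consistency_weight=1.0):
    """Joint (multi-task) objective over all exits: supervised cross-entropy on
    clean inputs plus a consistency term that pulls perturbed-input predictions
    towards the clean-input predictions, which act as the weak supervision."""
    clean_logits = model(x)
    perturbed_logits = model(perturb(x))
    loss = 0.0
    for lc, lp in zip(clean_logits, perturbed_logits):
        supervised = F.cross_entropy(lc, y)
        consistency = F.kl_div(
            F.log_softmax(lp, dim=-1),
            F.softmax(lc.detach(), dim=-1),  # clean predictions as soft targets
            reduction="batchmean",
        )
        loss = loss + supervised + consistency_weight * consistency
    return loss / len(clean_logits)


@torch.no_grad()
def early_exit_predict(model, x, confidence_threshold=0.9):
    """Return the prediction of the first exit whose softmax confidence clears
    the threshold, so deeper blocks are only executed when needed.
    Assumes a batch of size one for simplicity."""
    h = x
    probs = None
    for block, head in zip(model.blocks, model.exits):
        h = block(h)
        probs = F.softmax(head(h.mean(dim=-1)), dim=-1)
        if probs.max().item() >= confidence_threshold:
            break
    return probs.argmax(dim=-1)
```
Here `perturb` stands in for any modality-appropriate input corruption (e.g. jitter or scaling of the raw signal), and the exit threshold trades a small amount of accuracy for earlier exits; both are placeholders rather than settings taken from the paper.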
Related papers
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep Networks [0.0]
We introduce DynaLay, an alternative architecture that features a decision-making agent to adaptively select the most suitable layers for processing each input.
DynaLay reevaluates more complex inputs during inference, adjusting the computational effort to optimize both performance and efficiency.
Our experiments demonstrate that DynaLay achieves accuracy comparable to conventional deep models while significantly reducing computational demands.
arXiv Detail & Related papers (2023-12-20T05:55:05Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks [2.167843405313757]
We re-define the efficiency measure using a multi-objective optimization.
We combine the competing variables simultaneously in a single relative efficiency measure.
This allows ranking deep models that run efficiently on different computing hardware, and objectively combines inference efficiency with training efficiency.
arXiv Detail & Related papers (2022-02-18T15:58:17Z)
- Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy.
We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning.
Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z)
- Fast-Convergent Federated Learning [82.32029953209542]
Federated learning is a promising solution for distributing machine learning tasks through modern networks of mobile devices.
We propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training.
arXiv Detail & Related papers (2020-07-26T14:37:51Z)
- Real-time Federated Evolutionary Neural Architecture Search [14.099753950531456]
Federated learning is a distributed machine learning approach to privacy preservation.
We propose an evolutionary approach to real-time federated neural architecture search that not only optimizes the model performance but also reduces the local payload.
This way, we effectively reduce computational and communication costs required for evolutionary optimization and avoid big performance fluctuations of the local models.
arXiv Detail & Related papers (2020-03-04T17:03:28Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)