Related papers: Jointly-Learned Exit and Inference for a Dynamic Neural Network : JEI-DNN

URL: http://arxiv.org/abs/2310.09163v2
Date: Fri, 10 May 2024 08:43:52 GMT
Title: Jointly-Learned Exit and Inference for a Dynamic Neural Network : JEI-DNN
Authors: Florence Regol, Joud Chataoui, Mark Coates
Abstract summary: Early-exiting dynamic neural networks (EDNN) allow a model to make some of its predictions from intermediate layers (i.e., early-exit). Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization capabilities.
Score: 20.380620709345898
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large pretrained models, coupled with fine-tuning, are slowly becoming established as the dominant architecture in machine learning. Even though these models offer impressive performance, their practical application is often limited by the prohibitive amount of resources required for every inference. Early-exiting dynamic neural networks (EDNN) circumvent this issue by allowing a model to make some of its predictions from intermediate layers (i.e., early-exit). Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations. As a result, most existing approaches rely on thresholding confidence metrics for the gating mechanism and strive to improve the underlying backbone network and the inference modules. Although successful, this approach has two fundamental shortcomings: 1) the GMs and the IMs are decoupled during training, leading to a train-test mismatch; and 2) the thresholding gating mechanism introduces a positive bias into the predictive probabilities, making it difficult to readily extract uncertainty information. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization capabilities.
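Illustrative sketch: the confidence-thresholding gate that the abstract attributes to most existing approaches can be written in a few lines. This is not the JEI-DNN method, which learns the gating jointly with the inference modules. The function names, the default threshold, and the use of maximum softmax probability as the confidence metric are assumptions made for illustration. For simplicity the logits of all exits are passed in precomputed; a real EDNN would stop computing deeper layers once an exit is taken.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Hypothetical confidence-threshold gate (baseline, not JEI-DNN).

    exit_logits: list of per-exit logit lists, shallowest exit first.
    Returns (exit_index, predicted_class, confidence).
    """
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        # Exit as soon as the intermediate module is confident enough,
        # or fall through to the final exit if none is.
        if conf >= threshold or i == last:
            return i, probs.index(conf), conf


# Example: three exits, three classes.
example = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
exit_index, predicted_class, confidence = early_exit_predict(example, threshold=0.9)
```

Note that the gate here is a fixed rule applied only at inference time, while the inference modules would be trained without it. This is the decoupling the abstract identifies as a source of train-test mismatch.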

Related papers

A Scalable Pretraining Framework for Link Prediction with Efficient Adaptation [16.82426251068573]
Link Prediction (LP) is a critical task in graph machine learning. Existing methods face key challenges including limited supervision from sparse connectivity. We explore pretraining as a solution to address these challenges.
arXiv Detail & Related papers (2025-08-06T17:10:31Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
Learning multi-phase flow and transport in fractured porous media with auto-regressive and recurrent graph neural networks [0.3749861135832073]
We propose to learn the complex multi-phase flow and transport dynamics in fractured porous media with graph neural networks (GNNs). GNNs are well suited for this task due to the unstructured topology of the grid resulting from the Embedded Discrete Fracture Model (EDFM) discretization. We show that both GNNs generalize well to unseen fracture realizations, with comparable performance in forecasting saturation sequences, and slightly better performance for the recurrent GNN in predicting pressure sequences.
arXiv Detail & Related papers (2025-02-22T10:12:52Z)
A Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction [9.546043411729206]
Interval type-2 fuzzy neural network (IT2FNN) has shown exceptional performance in uncertainty modelling for single-step prediction tasks. This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO). Experimental results on chaotic and microgrid prediction problems demonstrate that SOIT2FNN-MO outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-07-10T19:35:44Z)
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks. By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections. Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
Neural Relational Inference with Fast Modular Meta-learning [25.313516707169498]
Graph neural networks (GNNs) are effective models for many dynamical systems consisting of entities and relations. Relational inference is the problem of inferring these interactions and learning the dynamics from observational data. We frame relational inference as a modular meta-learning problem, where neural modules are trained to be composed in different ways to solve many tasks.
arXiv Detail & Related papers (2023-10-10T21:05:13Z)
Amortised Inference in Bayesian Neural Networks [0.0]
We introduce the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN). We show that the amortised inference is of similar or better quality to those obtained through traditional variational inference. We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family.
arXiv Detail & Related papers (2023-09-06T14:02:33Z)
Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNN) suffer from severe inefficiency. We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training. We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs). NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment [32.01355605506855]
Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training. Disagreements often have closer top-1 and top-2 output probabilities, and Margin is a better indicator than the other uncertainty metrics to distinguish disagreements. We open-source our code and models as a new benchmark for further studying the quantized models.
arXiv Detail & Related papers (2022-04-08T11:19:16Z)
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues. We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
Obtaining Faithful Interpretations from Compositional Neural Networks [72.41100663462191]
We evaluate the intermediate outputs of NMNs on NLVR2 and DROP datasets. We find that the intermediate outputs differ from the expected output, illustrating that the network structure does not provide a faithful explanation of model behaviour.
arXiv Detail & Related papers (2020-05-02T06:50:35Z)
Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model. This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs). The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.