PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution
- URL: http://arxiv.org/abs/2507.01695v1
- Date: Wed, 02 Jul 2025 13:22:05 GMT
- Title: PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution
- Authors: Omkar Shende, Gayathri Ananthanarayanan, Marcello Traiola,
- Abstract summary: PERTINENCE is a novel online method designed to analyze the complexity of input features. It dynamically selects the most suitable model from a pre-trained set to process a given input. It achieves better or comparable accuracy with up to 36% fewer operations.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks (DNNs) have become ubiquitous thanks to their remarkable ability to model complex patterns across various domains such as computer vision, speech recognition, and robotics. While large DNN models are often more accurate than simpler, lightweight models, they are also resource- and energy-hungry. Hence, it is imperative to design methods to reduce reliance on such large models without significant degradation in output accuracy. The high computational cost of these models is often necessary only for a reduced set of challenging inputs, while lighter models can handle most simple ones. Thus, carefully combining properties of existing DNN models in a dynamic, input-based way opens opportunities to improve efficiency without impacting accuracy. In this work, we introduce PERTINENCE, a novel online method designed to analyze the complexity of input features and dynamically select the most suitable model from a pre-trained set to process a given input effectively. To achieve this, we employ a genetic algorithm to explore the training space of an ML-based input dispatcher, enabling convergence towards the Pareto front in the solution space that balances overall accuracy and computational efficiency. We showcase our approach on state-of-the-art Convolutional Neural Networks (CNNs) trained on CIFAR-10 and CIFAR-100, as well as Vision Transformers (ViTs) trained on the TinyImageNet dataset. We report results showing PERTINENCE's ability to provide alternative solutions to existing state-of-the-art models in terms of trade-offs between accuracy and the number of operations. By opportunistically selecting among models trained for the same task, PERTINENCE achieves better or comparable accuracy with up to 36% fewer operations.
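To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of what an input-based dispatcher and a genetic-algorithm fitness that trades accuracy against operation count could look like. The names `models`, `ops_per_model`, the pre-extracted input features, and the linear form of the dispatcher are all assumptions made for this example.

```python
import numpy as np

def dispatch(theta, features):
    """Score every candidate model for each input and pick the argmax.
    theta: (feature_dim, num_models) dispatcher weights.
    features: (batch, feature_dim) per-input complexity features."""
    return np.argmax(features @ theta, axis=1)

def fitness(theta, features, labels, models, ops_per_model, alpha=1e-9):
    """Scalar score a GA could use for selection: accuracy minus an operations penalty.
    models: list of callables, each returning a predicted label for one input.
    ops_per_model: operation count (e.g. MACs) charged when that model is used."""
    choices = dispatch(theta, features)
    correct, total_ops = 0, 0
    for x, y, m in zip(features, labels, choices):
        correct += int(models[m](x) == y)   # run only the selected pre-trained model
        total_ops += ops_per_model[m]
    return correct / len(labels) - alpha * total_ops

def ga_step(theta, features, labels, models, ops_per_model, sigma=0.1, rng=None):
    """Toy evolutionary step: mutate the dispatcher weights and keep the fitter candidate."""
    if rng is None:
        rng = np.random.default_rng()
    candidate = theta + sigma * rng.standard_normal(theta.shape)
    old = fitness(theta, features, labels, models, ops_per_model)
    new = fitness(candidate, features, labels, models, ops_per_model)
    return candidate if new >= old else theta
```

A real genetic algorithm would evolve a whole population of dispatcher candidates and retain the non-dominated (Pareto-optimal) accuracy/operations trade-offs rather than a single weighted scalar; the sketch only illustrates the routing and the nature of the objective.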
Related papers
- ALWNN Empowered Automatic Modulation Classification: Conquering Complexity and Scarce Sample Conditions
This paper proposes an automatic modulation classification model based on the Adaptive Lightweight Wavelet Neural Network (ALWNN) and a few-shot framework (MALWNN). By integrating an adaptive wavelet neural network with depthwise separable convolutions, the ALWNN model reduces the number of parameters and the computational complexity. Experiments with MALWNN show superior performance in few-shot learning scenarios compared to other algorithms.
arXiv Detail & Related papers (2025-03-24T06:14:33Z)
- A Multi-Fidelity Graph U-Net Model for Accelerated Physics Simulations
We propose a novel GNN architecture, Multi-Fidelity U-Net, that leverages multi-fidelity methods to enhance the performance of the GNN model. We show that the proposed approach performs significantly better in terms of accuracy and data requirements. We also present Multi-Fidelity U-Net Lite, a faster version of the proposed architecture, with 35% faster training and a 2 to 5% reduction in accuracy.
arXiv Detail & Related papers (2024-12-19T20:09:38Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Diffusion-Based Neural Network Weights Generation
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
We show that many large-scale deep neural networks can be successfully retrained to show iso-accuracy on analog in-memory computing (AIMC) hardware.
Our results suggest that AIMC nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on DNN accuracy.
arXiv Detail & Related papers (2023-02-16T18:25:06Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
In theory, we can obtain a network at any precision for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks
Pruning, a method that introduces zeros into model weights, has been shown to provide good trade-offs between model accuracy and computational efficiency.
Some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and store operations on such memories.
In this work, we propose a set of novel sparse patterns, named gather-scatter (GS) patterns, to utilize the scratchpad memories and gather/scatter engines to speed up neural network inferences.
arXiv Detail & Related papers (2021-12-20T22:55:45Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images
Convolutional Neural Networks (CNNs) do not exhibit predictable recognition behavior with respect to changes in input resolution.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further apply on-the-fly knowledge distillation over model predictions obtained at different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly.
Our pattern-based sparsity approach naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
- Model Fusion via Optimal Transport
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
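As a concrete illustration of the last entry above, here is a minimal sketch of layer-wise fusion for two models with identical architecture, using a hard-assignment special case of optimal transport (uniform marginals, solved as an assignment problem) to align neurons before averaging. This is an assumption-laden simplification for illustration, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layer(W_a, W_b):
    """Fuse the corresponding layer of two same-architecture models.
    W_a, W_b: (out_dim, in_dim) weight matrices of that layer.
    Neurons of model B are matched to neurons of model A by solving an
    assignment problem, then the aligned weights are averaged."""
    cost = -W_a @ W_b.T                   # negative similarity as the matching cost
    _, col = linear_sum_assignment(cost)  # optimal one-to-one neuron matching
    W_b_aligned = W_b[col]                # permute B's neurons into A's order
    return 0.5 * (W_a + W_b_aligned)

# Note: in a full multi-layer fusion, the same permutation must also be applied
# to the columns of the next layer's weights so the composed function is preserved.
```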