Related papers: High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention

High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention

URL: http://arxiv.org/abs/2308.05128v3
Date: Wed, 20 Nov 2024 09:13:36 GMT
Title: High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention
Authors: André Peter Kelm, Niels Hannemann, Bruno Heberle, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop,
Abstract summary: This paper introduces a novel network topology that seamlessly integrates dynamic inference cost with a top-down attention mechanism. Drawing inspiration from human perception, we combine sequential processing of generic low-level features with parallelism and nesting of high-level features. In terms of dynamic inference cost our methodology can achieve an exclusion of up to $73.48,%$ of parameters and $84.41,%$ fewer giga-multiply-accumulate (GMAC) operations.
Score: 4.051316555028782
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a novel network topology that seamlessly integrates dynamic inference cost with a top-down attention mechanism, addressing two significant gaps in traditional deep learning models. Drawing inspiration from human perception, we combine sequential processing of generic low-level features with parallelism and nesting of high-level features. This design not only reflects a finding from recent neuroscience research regarding - spatially and contextually distinct neural activations - in human cortex, but also introduces a novel "cutout" technique: the ability to selectively activate %segments of the network for task-relevant only network segments of task-relevant categories to optimize inference cost and eliminate the need for re-training. We believe this paves the way for future network designs that are lightweight and adaptable, making them suitable for a wide range of applications, from compact edge devices to large-scale clouds. Our proposed topology also comes with a built-in top-down attention mechanism, which allows processing to be directly influenced by either enhancing or inhibiting category-specific high-level features, drawing parallels to the selective attention mechanism observed in human cognition. Using targeted external signals, we experimentally enhanced predictions across all tested models. In terms of dynamic inference cost our methodology can achieve an exclusion of up to $73.48\,\%$ of parameters and $84.41\,\%$ fewer giga-multiply-accumulate (GMAC) operations, analysis against comparative baselines show an average reduction of $40\,\%$ in parameters and $8\,\%$ in GMACs across the cases we evaluated.

Related papers

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling [35.837527844931266]
We introduce $textbfNeuroTrails$, a sparse multi-head architecture with dynamically evolving topology.<n>NeuroTrails displays efficacy with convolutional and transformer-based architectures on computer vision and language tasks.
arXiv Detail & Related papers (2025-05-23T13:53:21Z)
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks using the tensor gradient program (SGD) framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
Neural Spatial-Temporal Tensor Representation for Infrared Small Target Detection [3.7038542578642724]
We introduce a Neural-represented spatial-temporal model (NeurSTT) for infrared small target detection. NeurSTT enhances spatial-temporal correlations in background approximation, thereby supporting target detection in an unsupervised manner. Visual and numerical results across various datasets demonstrate that our method outperforms the suboptimal method on $256 times 256$ sequences.
arXiv Detail & Related papers (2024-12-23T05:46:08Z)
Online High-Frequency Trading Stock Forecasting with Automated Feature Clustering and Radial Basis Function Neural Networks [1.7802147489386628]
This study presents an autonomous experimental machine learning protocol for high-frequency trading (HFT) stock price forecasting. By incorporating the k-means algorithm into the radial basis function neural network (RBFNN), the proposed method addresses the challenges of manual clustering.
arXiv Detail & Related papers (2024-11-23T18:30:04Z)
Informed deep hierarchical classification: a non-standard analysis inspired approach [0.0]
It consists in a multi-output deep neural network equipped with specific projection operators placed before each output layer. The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), has been possible by combining tools from different and quite distant research fields. To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks.
arXiv Detail & Related papers (2024-09-25T14:12:50Z)
Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters. We show that training only scalar batchnorm parameters some while into training matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
Select High-Level Features: Efficient Experts from a Hierarchical Classification Network [4.051316555028782]
This study introduces a novel expert generation method that dynamically reduces task and computational complexity without compromising predictive performance. It is based on a new hierarchical classification network topology that combines sequential processing of generic low-level features with parallelism and nesting of high-level features. In terms of dynamic inference our methodology can achieve an exclusion of up to 88.7,% of parameters and 73.4,% fewer giga-multiply accumulate (GMAC) operations.
arXiv Detail & Related papers (2024-03-08T00:02:42Z)
Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks. By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead. We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
On Excess Risk Convergence Rates of Neural Network Classifiers [8.329456268842227]
We study the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks. We analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence.
arXiv Detail & Related papers (2023-09-26T17:14:10Z)
TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction. We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
Communication-Computation Efficient Device-Edge Co-Inference via AutoML [4.06604174802643]
Device-edge co-inference partitions a deep neural network between a resource-constrained mobile device and an edge server. On-device model sparsity level and intermediate feature compression ratio have direct impacts on workload and communication overhead. We propose a novel automated machine learning (AutoML) framework based on deep reinforcement learning (DRL)
arXiv Detail & Related papers (2021-08-30T06:36:30Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs) In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
Compact Neural Representation Using Attentive Network Pruning [1.0152838128195465]
We describe a Top-Down attention mechanism that is added to a Bottom-Up feedforward network to select important connections and subsequently prune redundant ones at all parametric layers. Our method not only introduces a novel hierarchical selection mechanism as the basis of pruning but also remains competitive with previous baseline methods in the experimental evaluation.
arXiv Detail & Related papers (2020-05-10T03:20:01Z)
Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability. Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network. Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.