Towards Disentangling Information Paths with Coded ResNeXt
- URL: http://arxiv.org/abs/2202.05343v2
- Date: Wed, 20 Sep 2023 13:57:27 GMT
- Title: Towards Disentangling Information Paths with Coded ResNeXt
- Authors: Apostolos Avranas and Marios Kountouris
- Abstract summary: We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
- Score: 11.884259630414515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The conventional, widely used treatment of deep learning models as black
boxes provides limited or no insights into the mechanisms that guide neural
network decisions. Significant research effort has been dedicated to building
interpretable models to address this issue. Most efforts either focus on the
high-level features associated with the last layers, or attempt to interpret
the output of a single layer. In this paper, we take a novel approach to
enhance the transparency of the function of the whole network. We propose a
neural network architecture for classification, in which the information that
is relevant to each class flows through specific paths. These paths are designed
in advance, before training, by leveraging coding theory and without depending on
the semantic similarities between classes. A key property is that
each path can be used as an autonomous single-purpose model. This enables us to
obtain, without any additional training and for any class, a lightweight binary
classifier that has at least $60\%$ fewer parameters than the original network.
Furthermore, our coding theory based approach allows the neural network to make
early predictions at intermediate layers during inference, without requiring
its full evaluation. Remarkably, the proposed architecture provides all the
aforementioned properties while improving the overall accuracy. We demonstrate
these properties on a slightly modified ResNeXt model tested on CIFAR-10/100
and ImageNet-1k.
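The abstract describes two mechanisms: class-specific information paths fixed before training via coding theory, and the extraction of a lightweight per-class binary classifier from those paths. The construction is not spelled out here, so the following is only a minimal sketch under assumed choices: a constant-weight binary codeword per class selects which parallel branches of a ResNeXt-style block that class is routed through, and the per-class sub-network keeps only the selected branches. The branch count, codeword weight, and parameter counts below are illustrative assumptions, not the authors' values.

```python
# Minimal sketch (not the authors' exact construction): assign each class a
# binary "path codeword" over the parallel branches of a ResNeXt-style block,
# so that only a fixed subset of branches is meant to carry that class's
# information.  A real coding-theory construction would also spread the active
# branches so that different classes' codewords overlap as little as possible;
# the naive enumeration here is for illustration only.

from itertools import combinations

def constant_weight_codewords(num_branches, weight, num_classes):
    """Enumerate binary codewords of length `num_branches` with exactly
    `weight` ones, one per class (requires C(num_branches, weight) >= num_classes)."""
    words = []
    for active in combinations(range(num_branches), weight):
        words.append([1 if b in active else 0 for b in range(num_branches)])
        if len(words) == num_classes:
            return words
    raise ValueError("not enough distinct codewords for the requested classes")

def kept_parameter_fraction(codeword, params_per_branch, shared_params):
    """Fraction of a block's parameters kept when only the branches active in
    `codeword` are retained, i.e. the class's 'single-purpose' sub-network."""
    kept = sum(codeword) * params_per_branch + shared_params
    total = len(codeword) * params_per_branch + shared_params
    return kept / total

if __name__ == "__main__":
    # Illustrative numbers only: 32 parallel branches per block (a common
    # ResNeXt cardinality), 8 active branches per class, 10 classes.
    codes = constant_weight_codewords(num_branches=32, weight=8, num_classes=10)
    frac = kept_parameter_fraction(codes[0], params_per_branch=1_000, shared_params=4_000)
    print(f"class 0 path keeps {frac:.0%} of this block's parameters")
```

With these assumed numbers, the per-class path retains roughly a third of the block's parameters, which is consistent in spirit with the "at least 60% fewer parameters" claim; the paper's actual code construction and savings are of course determined by its own design.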
Related papers
- Informed deep hierarchical classification: a non-standard analysis inspired approach [0.0]
It consists of a multi-output deep neural network equipped with specific projection operators placed before each output layer.
The design of such an architecture, called lexicographic hybrid deep neural network (LH-DNN), is made possible by combining tools from different and quite distant research fields.
To assess the efficacy of the approach, the resulting network is compared against the B-CNN, a convolutional neural network tailored for hierarchical classification tasks.
arXiv Detail & Related papers (2024-09-25T14:12:50Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change simply by training the networks better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Exploring Learned Representations of Neural Networks with Principal Component Analysis [1.0923877073891446]
In certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification.
We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse.
arXiv Detail & Related papers (2023-09-27T00:18:25Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Hidden Classification Layers: Enhancing linear separability between classes in neural networks layers [0.0]
We investigate the impact of a training approach on the performance of deep networks.
We propose a neural network architecture that induces an error function involving the outputs of all the network layers (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-06-09T10:52:49Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z) - Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
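Two of the entries above (Hidden Classification Layers and Dynamic Hierarchical Mimicking) attach training objectives to intermediate layers rather than only to the final output. The sketch below is a generic, assumed illustration of that deep-supervision idea in PyTorch, not the specific loss or branch design of either paper: auxiliary linear classifiers are placed after each hidden block and their cross-entropy losses are added, down-weighted, to the final loss.

```python
# Generic deep-supervision sketch (assumed form; not the exact method of the
# cited papers): every hidden block gets an auxiliary classifier, and the
# training loss involves the outputs of all layers, not only the last one.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedMLP(nn.Module):
    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_hidden_layers=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        self.aux_heads = nn.ModuleList()
        dim = in_dim
        for _ in range(num_hidden_layers):
            self.blocks.append(nn.Sequential(nn.Linear(dim, hidden), nn.ReLU()))
            self.aux_heads.append(nn.Linear(hidden, num_classes))  # one classifier per layer
            dim = hidden
        self.final_head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        logits_per_layer = []
        for block, head in zip(self.blocks, self.aux_heads):
            x = block(x)
            logits_per_layer.append(head(x))         # intermediate prediction
        logits_per_layer.append(self.final_head(x))  # final prediction
        return logits_per_layer

def deep_supervision_loss(logits_per_layer, targets, aux_weight=0.3):
    """Cross-entropy on the final output plus down-weighted cross-entropy on
    every intermediate output (illustrative weighting)."""
    loss = F.cross_entropy(logits_per_layer[-1], targets)
    for logits in logits_per_layer[:-1]:
        loss = loss + aux_weight * F.cross_entropy(logits, targets)
    return loss

if __name__ == "__main__":
    model = DeeplySupervisedMLP()
    x = torch.randn(8, 32)
    y = torch.randint(0, 10, (8,))
    loss = deep_supervision_loss(model(x), y)
    loss.backward()
    print(float(loss))
```

In the cited works the auxiliary heads or side branches are more elaborate and the weighting schemes differ; the point here is only the structure of a loss that involves the outputs of all layers.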