From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
- URL: http://arxiv.org/abs/2310.11439v3
- Date: Mon, 1 Jul 2024 14:39:54 GMT
- Title: From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
- Authors: Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Karol Arndt, Oliver Struckmeier, Markus Heinonen, Ville Kyrki, Samuel Kaski
- Abstract summary: We introduce the concept of the non-linearity signature of a DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks.
We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature.
- Score: 32.39176908225668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, we have witnessed the introduction of several novel deep neural network (DNN) architectures exhibiting ever-increasing performance across diverse tasks. Explaining the upward trend of their performance, however, remains difficult, as different DNN architectures of comparable depth and width -- common factors associated with their expressive power -- may exhibit drastically different performance even when trained on the same dataset. In this paper, we introduce the concept of the non-linearity signature of a DNN, the first theoretically sound solution for approximately measuring the non-linearity of deep neural networks. Built upon a score derived from closed-form optimal transport mappings, this signature provides a better understanding of the inner workings of a wide range of DNN architectures and learning paradigms, with a particular emphasis on computer vision tasks. We provide extensive experimental results that highlight the practical usefulness of the proposed non-linearity signature and its potential for far-reaching implications. The code for our work is available at https://github.com/qbouniot/AffScoreDeep
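To make "a score derived from closed-form optimal transport mappings" concrete, here is a minimal sketch of one way such a score can be computed: the optimal transport map between two Gaussian approximations of a layer's input and output activations is affine and available in closed form, so the fraction of the output variance that this affine map explains serves as a rough affinity proxy. The function names (`gaussian_ot_map`, `affine_fit_score`) and the residual-based normalization below are illustrative assumptions, not the authors' exact affinity score nor the API of the AffScoreDeep repository.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(X, Y):
    """Closed-form (affine) optimal transport map between the Gaussian
    approximations N(m_x, C_x) and N(m_y, C_y) of samples X and Y:
    T(x) = m_y + A (x - m_x), where
    A = C_x^{-1/2} (C_x^{1/2} C_y C_x^{1/2})^{1/2} C_x^{-1/2}."""
    m_x, m_y = X.mean(axis=0), Y.mean(axis=0)
    C_x = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # slight regularization
    C_y = np.cov(Y, rowvar=False) + 1e-6 * np.eye(Y.shape[1])
    C_x_half = np.real(sqrtm(C_x))
    C_x_inv_half = np.linalg.inv(C_x_half)
    A = C_x_inv_half @ np.real(sqrtm(C_x_half @ C_y @ C_x_half)) @ C_x_inv_half
    return lambda x: m_y + (x - m_x) @ A.T

def affine_fit_score(X, Y):
    """Hypothetical affinity proxy in [0, 1]: close to 1 when the map
    X -> Y is well explained by the affine OT map, lower otherwise."""
    T = gaussian_ot_map(X, Y)
    residual = np.linalg.norm(Y - T(X)) ** 2
    total = np.linalg.norm(Y - Y.mean(axis=0)) ** 2
    return 1.0 - residual / max(total, 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(2048, 16))
print(affine_fit_score(X, np.maximum(X, 0)))              # ReLU: farther from affine
print(affine_fit_score(X, np.where(X > 0, X, 0.5 * X)))   # leaky ReLU: closer to affine
```

On Gaussian inputs this proxy ranks a mild leaky ReLU as closer to affine than a plain ReLU; collecting such a per-layer quantity across the activation layers of a network is, roughly, what a non-linearity signature amounts to.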
Related papers
- Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- EIGNN: Efficient Infinite-Depth Graph Neural Networks [51.97361378423152]
Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications.
Motivated by the limited ability of finite-depth GNNs to capture long-range dependencies, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN).
We show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T08:16:58Z)
- On the Application of Data-Driven Deep Neural Networks in Linear and Nonlinear Structural Dynamics [28.979990729816638]
The use of deep neural network (DNN) models as surrogates for linear and nonlinear structural dynamical systems is explored.
The focus is on the development of efficient network architectures using fully-connected, sparsely-connected, and convolutional network layers.
It is shown that the proposed DNNs can be used as effective and accurate surrogates for predicting linear and nonlinear dynamical responses under harmonic loadings.
arXiv Detail & Related papers (2021-11-03T13:22:19Z)
- A novel Deep Neural Network architecture for non-linear system identification [78.69776924618505]
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification.
Inspired by fading memory systems, we introduce inductive bias (on the architecture) and regularization (on the loss function).
This architecture allows for automatic complexity selection based solely on available data.
arXiv Detail & Related papers (2021-06-06T10:06:07Z)
- Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification [10.727102755903616]
We aim for efficient deep BNNs amenable to complex computer vision architectures.
We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer.
Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles.
arXiv Detail & Related papers (2020-12-04T19:50:09Z)
- Operational vs Convolutional Neural Networks for Image Denoising [25.838282412957675]
Convolutional Neural Networks (CNNs) have recently become a favored technique for image denoising due to their adaptive learning ability.
We propose a heterogeneous network model, the Operational Neural Network (ONN), which allows greater flexibility for embedding additional non-linearity at the core of the data transformation.
An extensive set of comparative evaluations of ONNs and CNNs on two severe image denoising problems yields conclusive evidence that ONNs enriched by non-linear operators achieve superior denoising performance compared to CNNs with both equivalent and well-known deep configurations.
arXiv Detail & Related papers (2020-09-01T12:15:28Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, reflecting the magnitude of the connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
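As a toy illustration of the learnable-edge-weight idea in this entry (a hypothetical `GatedDAGBlock`, not the paper's implementation), every candidate connection in a small complete DAG of operations receives a trainable logit whose sigmoid gates that connection, so the connectivity pattern is optimized by gradient descent jointly with the layer weights:

```python
import torch
import torch.nn as nn

class GatedDAGBlock(nn.Module):
    """Toy block whose nodes form a complete DAG: node j receives a
    sigmoid-gated, learnable-weighted sum of all earlier node outputs,
    so connectivity itself is trained by gradient descent."""
    def __init__(self, dim, num_nodes=4):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_nodes))
        # one trainable gate per directed edge (i -> j, i < j), including the input node
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes + 1, num_nodes))

    def forward(self, x):
        nodes = [x]  # node 0 is the block input
        for j, op in enumerate(self.ops):
            gates = torch.sigmoid(self.edge_logits[: len(nodes), j])
            agg = sum(g * h for g, h in zip(gates, nodes))
            nodes.append(torch.relu(op(agg)))
        return nodes[-1]

block = GatedDAGBlock(dim=32)
y = block(torch.randn(8, 32))  # edge gates train jointly with the linear ops
```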
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Constructing Deep Neural Networks with a Priori Knowledge of Wireless Tasks [37.060397377445504]
Two kinds of permutation invariance properties that widely exist in wireless tasks can be harnessed to reduce the number of model parameters.
We identify a special architecture of DNNs whose input-output relationships satisfy these properties, called permutation invariant DNN (PINN).
We take predictive resource allocation and interference coordination as examples to show how the PINNs can be employed for learning the optimal policy with unsupervised and supervised learning.
arXiv Detail & Related papers (2020-01-29T08:54:42Z)
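The PINN construction above is wireless-specific, but the way permutation invariance cuts the number of parameters can be illustrated with a standard DeepSets-style permutation-equivariant layer: the same two weight matrices are shared across all K interchangeable objects (e.g. users or links), so the parameter count does not grow with K. The class below is a minimal hypothetical sketch, not the paper's PINN:

```python
import torch
import torch.nn as nn

class PermEquivariantLayer(nn.Module):
    """DeepSets-style layer: y_k = U x_k + V * mean_j x_j, with the same
    U and V shared by all K objects, so the parameter count is independent
    of K and the outputs permute together with the inputs."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.self_map = nn.Linear(d_in, d_out)
        self.agg_map = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):  # x: (batch, K, d_in)
        return self.self_map(x) + self.agg_map(x.mean(dim=1, keepdim=True))

layer = PermEquivariantLayer(d_in=8, d_out=16)
x = torch.randn(4, 10, 8)      # 10 interchangeable users
perm = torch.randperm(10)
assert torch.allclose(layer(x)[:, perm], layer(x[:, perm]), atol=1e-5)
```

Stacking such layers yields a network whose input-output relationship satisfies the permutation property by construction.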
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.