Separation Power of Equivariant Neural Networks
- URL: http://arxiv.org/abs/2406.08966v2
- Date: Tue, 10 Dec 2024 13:03:40 GMT
- Title: Separation Power of Equivariant Neural Networks
- Authors: Marco Pacini, Xiaowen Dong, Bruno Lepri, Gabriele Santin
- Abstract summary: We analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. All non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power.
- Score: 11.906285279109477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The separation power of a machine learning model refers to its ability to distinguish between different inputs and is often used as a proxy for its expressivity. Indeed, knowing the separation power of a family of models is a necessary condition for obtaining fine-grained universality results. In this paper, we analyze the separation power of equivariant neural networks, such as convolutional and permutation-invariant networks. We first present a complete characterization of the inputs that are indistinguishable by models derived from a given architecture. From these results, we derive how separability is influenced by hyperparameters and architectural choices, such as activation functions, depth, hidden layer width, and representation types. Notably, all non-polynomial activations, including ReLU and sigmoid, are equivalent in expressivity and reach maximum separation power. Depth improves separation power up to a threshold, after which further increases have no effect. Adding invariant features to hidden representations does not impact separation power. Finally, block decomposition of hidden representations affects separability, with minimal components forming a hierarchy in separation power that provides a straightforward method for comparing the separation power of models.
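As a concrete, minimal illustration of what separation power means for a permutation-invariant architecture, the sketch below (our own toy construction, not the paper's; all names, sizes, and weights are illustrative) builds a DeepSets-style sum-pooling network with a non-polynomial activation and checks that inputs related by the symmetry can never be separated, while genuinely different multisets generally are.

```python
# Minimal sketch of separation power for a permutation-invariant network.
# Inputs that are permutations of each other are indistinguishable by design;
# distinct multisets are (almost surely) separated.
import numpy as np

rng = np.random.default_rng(0)

def phi(x, W1, b1):
    """Per-element encoder with a non-polynomial activation (tanh)."""
    return np.tanh(W1 @ x + b1)

def rho(z, W2, b2):
    """Readout applied after permutation-invariant sum pooling."""
    return np.tanh(W2 @ z + b2)

def invariant_net(X, params):
    """X has shape (n_elements, d); summing over elements makes the
    output invariant to any reordering of the rows of X."""
    W1, b1, W2, b2 = params
    pooled = sum(phi(x, W1, b1) for x in X)
    return rho(pooled, W2, b2)

d, h, out = 3, 16, 4
params = (rng.normal(size=(h, d)), rng.normal(size=h),
          rng.normal(size=(out, h)), rng.normal(size=out))

X = rng.normal(size=(5, d))
X_perm = X[rng.permutation(5)]        # same multiset, different order
X_other = rng.normal(size=(5, d))     # genuinely different multiset

print(np.allclose(invariant_net(X, params), invariant_net(X_perm, params)))   # True
print(np.allclose(invariant_net(X, params), invariant_net(X_other, params)))  # almost surely False
```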
Related papers
- On Universality Classes of Equivariant Networks [9.137637807153464]
We investigate the approximation power of equivariant neural networks beyond separation constraints. We show that separation power does not fully capture expressivity. We identify settings where shallow equivariant networks do achieve universality.
arXiv Detail & Related papers (2025-06-02T22:07:52Z) - Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model consists of a predictor, a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z) - Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
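For concreteness, the sketch below implements one common formulation of the semantic loss, the negative log-probability that a sample from the network's factorized output distribution satisfies the symbolic constraint, here a brute-force "exactly one label" constraint. The constraint choice and the enumeration are illustrative assumptions; practical implementations compile the constraint into a circuit rather than enumerating assignments.

```python
# Hedged sketch of a semantic loss for an "exactly one of k labels" constraint.
# The loss depends only on the semantics of the constraint, not on how the
# formula is written: any equivalent formula gives the same value.
import itertools
import math

def semantic_loss_exactly_one(probs):
    """probs: independent Bernoulli probabilities, one per label."""
    k = len(probs)
    sat_mass = 0.0
    for assignment in itertools.product([0, 1], repeat=k):
        if sum(assignment) != 1:          # constraint: exactly one label on
            continue
        weight = 1.0
        for p, x in zip(probs, assignment):
            weight *= p if x == 1 else (1.0 - p)
        sat_mass += weight                # probability mass of satisfying assignments
    return -math.log(sat_mass)

print(semantic_loss_exactly_one([0.9, 0.05, 0.05]))  # small loss: nearly consistent
print(semantic_loss_exactly_one([0.5, 0.5, 0.5]))    # larger loss: mostly inconsistent
```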
arXiv Detail & Related papers (2024-05-12T22:18:25Z) - Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions [0.552480439325792]
The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn.
We analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions.
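For orientation, the quantity being analyzed can be written schematically as below; the classical benchmark value for the simple perceptron goes back to Cover (1965), and the paper studies how this threshold behaves for fully connected two-layer networks with generic activations. The notation here is our own shorthand, not the paper's.

```latex
% Schematic definition of the load alpha and the storage capacity alpha_c;
% alpha_c = 2 is the classical simple-perceptron value (Cover, 1965).
\alpha = \frac{P}{N}, \qquad
\alpha_c = \sup\{\alpha : P = \alpha N \text{ random pairs are learnable w.h.p.}\}, \qquad
\alpha_c^{\mathrm{perceptron}} = 2,
```

where P is the number of random input-output pairs and N the number of parameters.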
arXiv Detail & Related papers (2024-04-20T15:12:47Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory [64.06519549649495]
We provide the definition of what we call functionally equivalent features.
These features produce equivalent output under certain transformations.
We propose an efficient algorithm named Iterative Feature Merging.
arXiv Detail & Related papers (2023-10-10T16:27:12Z) - Towards Rigorous Understanding of Neural Networks via Semantics-preserving Transformations [0.0]
We present an approach to the precise and global verification and explanation of Rectifier Neural Networks.
Key to our approach is the symbolic execution of these networks that allows the construction of semantically equivalent Typed Affine Decision Structures.
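To make the idea of symbolic execution concrete, here is a hedged toy sketch (ours, not the paper's Typed Affine Decision Structures): because ReLU networks are piecewise affine, a tiny network can be unfolded into a finite case split over activation patterns, each guarded by linear constraints and computing an affine map that agrees exactly with the network on its region.

```python
# Sketch: unfold a one-hidden-layer ReLU network into its affine pieces.
import itertools
import numpy as np

W1 = np.array([[1.0, -1.0], [0.5, 2.0]])   # hidden weights (2 units, 2 inputs)
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -3.0])                  # output weights
b2 = 0.5

def affine_pieces():
    """Enumerate activation patterns; each yields one affine piece y = a @ x + c,
    valid where (W1 x + b1)_i > 0 exactly for the units with pattern_i == 1."""
    pieces = []
    for pattern in itertools.product([0, 1], repeat=len(b1)):
        D = np.diag(pattern)                # which ReLUs are "on"
        a = w2 @ D @ W1                     # effective linear part
        c = w2 @ D @ b1 + b2                # effective offset
        pieces.append((pattern, a, c))
    return pieces

x = np.array([0.3, 0.7])
pattern = tuple(int(v) for v in (W1 @ x + b1 > 0))
for p, a, c in affine_pieces():
    if p == pattern:
        symbolic = a @ x + c                # value of the matching affine piece
direct = w2 @ np.maximum(W1 @ x + b1, 0.0) + b2
print(np.isclose(symbolic, direct))         # True: both semantics agree
```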
arXiv Detail & Related papers (2023-01-19T11:35:07Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Polysemanticity and Capacity in Neural Networks [2.9260206957981167]
Individual neurons in neural networks often represent a mixture of unrelated features.
This phenomenon, called polysemanticity, can make interpreting neural networks more difficult.
arXiv Detail & Related papers (2022-10-04T20:28:43Z) - Universal approximation property of invertible neural networks [76.95927093274392]
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning.
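As an illustration of why invertibility and Jacobian tractability come "by design", the sketch below implements a standard affine coupling layer (RealNVP-style). This is a generic INN building block under illustrative assumptions, not a construction specific to this paper.

```python
# Affine coupling layer: half the input is transformed conditioned on the other
# half, so the map has a closed-form inverse and a log-det Jacobian that is a sum.
import numpy as np

rng = np.random.default_rng(1)
W_s = rng.normal(size=(2, 2)) * 0.1
W_t = rng.normal(size=(2, 2)) * 0.1

def scale_and_shift(x1):
    """Conditioner networks s(x1), t(x1); they need not be invertible themselves."""
    return np.tanh(W_s @ x1), W_t @ x1

def forward(x):
    x1, x2 = x[:2], x[2:]
    s, t = scale_and_shift(x1)
    y2 = x2 * np.exp(s) + t                 # elementwise affine transform
    log_det_jac = np.sum(s)                 # tractable Jacobian log-determinant
    return np.concatenate([x1, y2]), log_det_jac

def inverse(y):
    y1, y2 = y[:2], y[2:]
    s, t = scale_and_shift(y1)
    x2 = (y2 - t) * np.exp(-s)              # exact inverse, no iterative solve
    return np.concatenate([y1, x2])

x = rng.normal(size=4)
y, log_det = forward(x)
print(np.allclose(inverse(y), x))           # True: invertible by construction
```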
arXiv Detail & Related papers (2022-04-15T10:45:26Z) - Clustering units in neural networks: upstream vs downstream information [3.222802562733787]
We study modularity of hidden layer representations of feedforward, fully connected networks.
We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects.
This has important implications for representation-learning, as it suggests that finding modular representations that reflect structure in inputs may be a distinct goal from learning modular representations that reflect structure in outputs.
arXiv Detail & Related papers (2022-03-22T15:35:10Z) - Integral representations of shallow neural network with Rectified Power Unit activation function [5.863264019032882]
We derive a formula for the integral representation of a shallow neural network with the Rectified Power Unit activation function.
The multidimensional result in this paper characterizes the set of functions that can be represented with bounded norm and possibly unbounded width.
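For reference, the Rectified Power Unit and the general shape of such an integral (infinite-width) representation can be written schematically as below; the exact parametrization, measure, and norm used in the paper may differ, so this is orientation only.

```latex
% Schematic only: RePU activation and a ridgelet-type integral representation.
\sigma_k(t) = \max(0, t)^k, \qquad
f(x) = \int_{\mathbb{S}^{d-1} \times \mathbb{R}} a(w, b)\, \sigma_k\!\big(\langle w, x \rangle - b\big)\, \mathrm{d}(w, b),
```

with a finite-width network corresponding to a discrete choice of (w, b) atoms and the bounded norm referring to a variation-type norm of the coefficient function a.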
arXiv Detail & Related papers (2021-12-20T15:18:11Z) - Exact solutions of interacting dissipative systems via weak symmetries [77.34726150561087]
We analytically diagonalize the Liouvillian of a class of Markovian dissipative systems with arbitrarily strong interactions or nonlinearity.
This enables an exact description of the full dynamics and dissipative spectrum.
Our method is applicable to a variety of other systems, and could provide a powerful new tool for the study of complex driven-dissipative quantum systems.
arXiv Detail & Related papers (2021-09-27T17:45:42Z) - Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
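To unpack the terminology, the sketch below illustrates the algebraic-signal-processing view that this line of work generalizes: a convolutional filter is a polynomial in shift operators applied to a signal, and with several operators that do not commute the order of application matters. This is an illustration of the setting only, not the paper's operator-space analysis.

```python
# Polynomial filters in two non-commuting "shift" operators.
import numpy as np

rng = np.random.default_rng(2)
n = 5
S1 = rng.normal(size=(n, n))     # two generic operators;
S2 = rng.normal(size=(n, n))     # generic matrices almost never commute
x = rng.normal(size=n)           # signal

def poly_filter(terms, signal):
    """Apply the filter sum_k c_k * S_{k,1} S_{k,2} ... to the signal,
    where terms is a list of (coefficient, tuple_of_operators)."""
    y = np.zeros_like(signal)
    for c, ops in terms:
        z = signal.copy()
        for S in reversed(ops):  # rightmost operator acts first
            z = S @ z
        y += c * z
    return y

# The monomials S1 S2 and S2 S1 define different filters when S1, S2 do not
# commute, which is exactly what the non commutative setting captures.
y_a = poly_filter([(1.0, (S1, S2))], x)
y_b = poly_filter([(1.0, (S2, S1))], x)
print(np.allclose(y_a, y_b))     # False in general
```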
arXiv Detail & Related papers (2021-08-23T04:22:58Z) - Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay [4.042159113348107]
We first consider the case of single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associate kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z) - It's FLAN time! Summing feature-wise latent representations for interpretability [0.0]
We propose a novel class of structurally-constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks).
FLANs process each input feature separately, computing for each of them a representation in a common latent space.
These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction.
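The summary above already describes the computation; the following sketch spells it out in code (sizes, activations, and weights are our own illustrative assumptions, not the paper's): one small encoder per input feature into a shared latent space, a plain sum over features, and a linear head on the aggregate.

```python
# FLAN-style sketch: per-feature encoders, summed latents, linear head.
import numpy as np

rng = np.random.default_rng(3)
n_features, latent_dim = 4, 8

# One tiny encoder per input feature (random weights, for illustration only).
encoders = [(rng.normal(size=(latent_dim, 1)), rng.normal(size=latent_dim))
            for _ in range(n_features)]
w_head = rng.normal(size=latent_dim)

def feature_latent(i, x_i):
    """Latent representation of the i-th scalar feature in the shared space."""
    W, b = encoders[i]
    return np.tanh(W @ np.array([x_i]) + b)

def flan_predict(x):
    latents = [feature_latent(i, x_i) for i, x_i in enumerate(x)]
    aggregated = np.sum(latents, axis=0)   # plain sum over per-feature latents
    return w_head @ aggregated             # prediction from the aggregate

x = rng.normal(size=n_features)
print(flan_predict(x))
# Each summand feature_latent(i, x[i]) can be inspected on its own, which is
# where the interpretability of the additive structure comes from.
```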
arXiv Detail & Related papers (2021-06-18T12:19:33Z) - CausalX: Causal Explanations and Block Multilinear Factor Analysis [3.087360758008569]
We propose a unified multilinear model of wholes and parts.
We introduce an incremental bottom-up computational alternative, the Incremental M-mode Block SVD.
The resulting object representation is an interpretable choice of intrinsic causal factor representations related to an object's hierarchy of wholes and parts.
arXiv Detail & Related papers (2021-02-25T13:49:01Z) - Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) method to address nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
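As background for readers unfamiliar with second-order pooling, the sketch below shows plain covariance pooling of CNN feature maps; the compression and group-invariance machinery specific to IDCCP is not reproduced here.

```python
# Plain covariance (second-order) pooling of feature maps: the (C, C) channel
# covariance is invariant to permutations of the spatial locations.
import numpy as np

rng = np.random.default_rng(4)
C, H, W = 8, 6, 6
feat = rng.normal(size=(C, H, W))            # CNN feature maps

def covariance_pool(feat):
    X = feat.reshape(feat.shape[0], -1)       # (C, H*W) spatial flattening
    X = X - X.mean(axis=1, keepdims=True)     # center each channel
    return (X @ X.T) / (X.shape[1] - 1)       # (C, C) covariance descriptor

desc = covariance_pool(feat)
print(desc.shape)                             # (8, 8)
```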
arXiv Detail & Related papers (2020-11-11T11:13:07Z) - Activation function dependence of the storage capacity of treelike neural networks [16.244541005112747]
A variety of nonlinear activation functions have been proposed for use in artificial neural networks.
We study how activation functions affect the storage capacity of treelike two-layer networks.
arXiv Detail & Related papers (2020-07-21T23:51:45Z) - Learning to Manipulate Individual Objects in an Image [71.55005356240761]
We describe a method to train a generative model with latent factors that are independent and localized.
This means that perturbing the latent variables affects only local regions of the synthesized image, corresponding to objects.
Unlike other unsupervised generative models, ours enables object-centric manipulation, without requiring object-level annotations.
arXiv Detail & Related papers (2020-04-11T21:50:20Z)