On the Local Complexity of Linear Regions in Deep ReLU Networks
- URL: http://arxiv.org/abs/2412.18283v2
- Date: Wed, 25 Dec 2024 02:14:07 GMT
- Title: On the Local Complexity of Linear Regions in Deep ReLU Networks
- Authors: Niket Patel, Guido Montúfar
- Abstract summary: We show theoretically that ReLU networks that learn low-dimensional feature representations have a lower local complexity.
In particular, we show that the local complexity serves as an upper bound on the total variation of the function over the input data distribution.
- Abstract: We define the local complexity of a neural network with continuous piecewise linear activations as a measure of the density of linear regions over an input data distribution. We show theoretically that ReLU networks that learn low-dimensional feature representations have a lower local complexity. This allows us to connect recent empirical observations on feature learning at the level of the weight matrices with concrete properties of the learned functions. In particular, we show that the local complexity serves as an upper bound on the total variation of the function over the input data distribution and thus that feature learning can be related to adversarial robustness. Lastly, we consider how optimization drives ReLU networks towards solutions with lower local complexity. Overall, this work contributes a theoretical framework towards relating geometric properties of ReLU networks to different aspects of learning such as feature learning and representation cost.
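The definition suggests a simple empirical probe. Below is a minimal sketch (ours, not the authors' code) that estimates local complexity for a toy network by sampling perturbations around each data point and counting the distinct ReLU activation patterns, i.e. the linear regions the neighborhood touches; the network sizes, scale `eps`, and sample counts are illustrative assumptions.
```python
# Minimal sketch: estimate the density of linear regions near the data by
# counting distinct ReLU activation patterns in a small neighborhood.
import numpy as np

rng = np.random.default_rng(0)

# Toy two-hidden-layer ReLU network with random weights (sizes are arbitrary).
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

def activation_pattern(x):
    """Binary on/off pattern of every ReLU unit at input x."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

def local_complexity(x, eps=0.1, n_samples=500):
    """Distinct linear regions met by Gaussian perturbations of scale eps."""
    points = x + eps * rng.normal(size=(n_samples, x.size))
    return len({activation_pattern(p) for p in points})

data = rng.normal(size=(100, 2))   # stand-in for the input distribution
print(np.mean([local_complexity(x) for x in data]))
```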
Related papers
- Local and global topological complexity measures of ReLU neural network functions
We apply a piecewise-linear (PL) version of Morse theory due to Grunert-Kühnel-Rote to define and study new local and global notions of topological complexity.
We show how to construct, for each such function F, a canonical polytopal complex K(F) and a deformation retract of the domain onto K(F), yielding a convenient compact model for performing calculations.
arXiv Detail & Related papers (2022-04-12T19:49:13Z)
- The Role of Linear Layers in Nonlinear Interpolating Networks
Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs.
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function.
Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units.
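In symbols, that definition of representation cost can be written as follows (notation ours, hedged against the paper's exact formulation):
```latex
% Representation cost of a function f under a fixed architecture:
% the minimum sum of squared weights over all parameter settings theta
% whose network f_theta realizes f exactly (notation is ours).
R(f) \;=\; \min_{\theta \,:\, f_\theta = f} \; \|\theta\|_2^2
```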
arXiv Detail & Related papers (2022-02-02T02:33:24Z)
- Traversing the Local Polytopes of ReLU Neural Networks: A Unified Approach for Network Verification
Neural networks (NNs) with ReLU activation functions have found success in a wide range of applications.
Previous works to examine robustness and to improve interpretability partially exploited the piecewise linear function form of ReLU NNs.
In this paper, we explore the unique topological structure that ReLU NNs create in the input space, identifying the adjacency among the partitioned local polytopes.
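To make the polytope picture concrete, here is a minimal sketch (our illustration, not the paper's algorithm) for a one-hidden-layer ReLU net: the local polytope containing x is the intersection of half-spaces fixed by the signs of the units at x, and two polytopes are adjacent when their sign patterns differ in exactly one unit.
```python
# Local polytope of a one-hidden-layer ReLU net: the region containing x is
# the set of inputs z with s_i * (w_i . z + b_i) >= 0, where s_i is the sign
# of unit i at x. Adjacent polytopes differ in exactly one unit's sign.
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(8, 2)), rng.normal(size=8)   # sizes are illustrative

def pattern(x):
    return np.sign(W @ x + b).astype(int)    # +1 / -1 per ReLU unit

def local_polytope(x):
    """Half-space form A z <= c of the linear region containing x."""
    s = pattern(x)
    return -s[:, None] * W, s * b             # A = -s_i w_i, c = s_i b_i

x, y = rng.normal(size=2), rng.normal(size=2)
A, c = local_polytope(x)
assert np.all(A @ x <= c + 1e-9)               # x lies in its own polytope
print("adjacent:", np.sum(pattern(x) != pattern(y)) == 1)
```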
arXiv Detail & Related papers (2021-11-17T06:12:39Z)
- An Entropy-guided Reinforced Partial Convolutional Network for Zero-Shot Learning
We propose a novel Entropy-guided Reinforced Partial Convolutional Network (ERPCNet).
ERPCNet extracts and aggregates localities based on semantic relevance and visual correlations without human-annotated regions.
It not only discovers global-cooperative localities dynamically but also converges faster for policy gradient optimization.
arXiv Detail & Related papers (2021-11-03T11:13:13Z)
- Clustering-Based Interpretation of Deep ReLU Network
We recognize that the non-linear behavior of the ReLU function gives rise to a natural clustering.
We propose a method to increase the level of interpretability of a fully connected feedforward ReLU neural network.
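A minimal sketch of that clustering idea, as we read it from the abstract: group inputs by the binary on/off code of their ReLU units, so each cluster lives on one linear piece of the network (sizes below are illustrative).
```python
# Cluster inputs by their ReLU activation code: samples sharing a code are
# handled by the same affine piece of the network.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(2)
W, b = rng.normal(size=(6, 2)), rng.normal(size=6)

def code(x):
    return tuple((W @ x + b > 0).astype(int))   # binary ReLU code

clusters = defaultdict(list)
for x in rng.normal(size=(200, 2)):
    clusters[code(x)].append(x)                  # one cluster per linear piece
print(f"{len(clusters)} clusters over 200 samples")
```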
arXiv Detail & Related papers (2021-10-13T09:24:11Z)
- Towards Understanding Theoretical Advantages of Complex-Reaction Networks
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv Detail & Related papers (2021-08-15T10:13:49Z)
- Clustered Federated Learning via Generalized Total Variation Minimization
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
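A hedged sketch of what a GTV-minimization formulation looks like (notation ours; the paper's exact objective may differ): each node i holds a local loss and model, and an edge-weighted penalty couples the models of neighboring nodes.
```latex
% GTV-minimization form of federated learning (notation ours): node i has
% local loss L_i and model w_i; edges (i,j) of the network, with weights
% A_{ij}, pull the models of neighboring nodes together.
\min_{w_1,\dots,w_n} \;\sum_{i=1}^{n} L_i(w_i)
  \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} A_{ij}\,\| w_i - w_j \|_2
```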
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- A neural anisotropic view of underspecification in deep learning
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Topological obstructions in neural networks learning
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Learning Connectivity of Neural Networks from a Topological Perspective
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and offers adaptability to larger search spaces and different tasks.
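As a rough illustration of learnable connectivity (ours, not the paper's architecture), the sketch below gives every candidate edge in a small DAG a scalar gate, squashed through a sigmoid, so the wiring strength is trained by gradient descent along with the layer weights; the node count and widths are arbitrary.
```python
# Learnable connectivity sketch: each candidate edge j->i in a small DAG gets
# a scalar gate in (0, 1) that scales node j's contribution to node i, so the
# wiring itself is a differentiable, trainable quantity.
import torch

n_nodes, dim = 4, 8
gates = torch.nn.Parameter(torch.zeros(n_nodes, n_nodes))  # edge strengths
mixers = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(n_nodes))

def forward(x0):
    feats = [x0]
    for i in range(1, n_nodes):
        # Weighted sum over all earlier nodes; sigmoid keeps gates in (0, 1).
        agg = sum(torch.sigmoid(gates[i, j]) * feats[j] for j in range(i))
        feats.append(torch.relu(mixers[i](agg)))
    return feats[-1]

print(forward(torch.randn(2, dim)).shape)   # torch.Size([2, 8])
```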
arXiv Detail & Related papers (2020-08-19T04:53:31Z)