Origami in N dimensions: How feed-forward networks manufacture linear separability
- URL: http://arxiv.org/abs/2203.11355v1
- Date: Mon, 21 Mar 2022 21:33:55 GMT
- Title: Origami in N dimensions: How feed-forward networks manufacture linear separability
- Authors: Christian Keup, Moritz Helias
- Abstract summary: We show that a feed-forward architecture has one primary tool at hand to achieve separability: progressive folding of the data manifold in unoccupied higher dimensions.
We argue that an alternative method based on shear, requiring very deep architectures, plays only a small role in real-world networks.
Based on the mechanistic insight, we predict that the progressive generation of separability is necessarily accompanied by neurons showing mixed selectivity and bimodal tuning curves.
- Score: 1.7404865362620803
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural networks can implement arbitrary functions. But, mechanistically, what
are the tools at their disposal to construct the target? For classification
tasks, the network must transform the data classes into a linearly separable
representation in the final hidden layer. We show that a feed-forward
architecture has one primary tool at hand to achieve this separability:
progressive folding of the data manifold in unoccupied higher dimensions. The
operation of folding provides a useful intuition in low dimensions that
generalizes to high ones. We argue that an alternative method based on shear,
requiring very deep architectures, plays only a small role in real-world
networks. The folding operation, however, is powerful as long as layers are
wider than the data dimensionality, allowing efficient solutions by providing
access to arbitrary regions in the distribution, such as data points of one
class forming islands within the other classes. We argue that a link exists
between the universal approximation property in ReLU networks and the
fold-and-cut theorem (Demaine et al., 1998) dealing with physical paper
folding. Based on the mechanistic insight, we predict that the progressive
generation of separability is necessarily accompanied by neurons showing mixed
selectivity and bimodal tuning curves. This is validated in a network trained
on the poker hand task, showing the emergence of bimodal tuning curves during
training. We hope that our intuitive picture of the data transformation in deep
networks can help to provide interpretability, and discuss possible
applications to the theory of convolutional networks, loss landscapes, and
generalization.
TL;DR: Shows that the internal processing of deep networks can be thought of
as literal folding operations on the data distribution in the N-dimensional
activation space. A link to a well-known theorem in origami theory is provided.
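To make the folding picture concrete, the following minimal NumPy sketch (not taken from the paper; the weights are hand-chosen for illustration) shows how a 1-D class that forms an island inside another class, and hence cannot be separated by any threshold on the line, becomes linearly separable after a single ReLU layer folds the line into two unoccupied dimensions:

```python
import numpy as np

# 1-D toy data: class 0 occupies the "island" |x| < 1, class 1 the outside |x| > 1.
# No single threshold on the line separates the two classes.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = (np.abs(x) > 1.0).astype(int)

# One hidden ReLU layer, wider than the 1-D input, with hand-picked weights
# (hypothetical, for illustration only) that fold the line at x = +1 and x = -1
# into two previously unoccupied dimensions.
W = np.array([[1.0], [-1.0]])              # 2 hidden units x 1 input dimension
b = np.array([-1.0, -1.0])                 # fold locations at x = +1 and x = -1
h = np.maximum(0.0, x[:, None] @ W.T + b)  # hidden activations, shape (200, 2)

# In the folded representation a single linear readout separates the classes:
# class-0 points collapse onto the origin, class-1 points spread along the axes.
pred = (h.sum(axis=1) > 0).astype(int)
print("accuracy after folding:", (pred == y).mean())  # expected: 1.0
```

In this toy example each hidden unit is silent on one class and active on the other, so its activation distribution over the data has two modes, loosely illustrating the mixed selectivity and bimodal tuning curves the abstract predicts for trained networks.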
Related papers
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Exploring explicit coarse-grained structure in artificial neural
networks [0.0]
We propose to employ hierarchical coarse-grained structure in artificial neural networks explicitly to improve interpretability without degrading performance.
One approach is a neural network called TaylorNet, which aims to approximate the general mapping from input data to output directly in terms of a Taylor series.
The other is a new setup for data distillation, which can perform multi-level abstraction of the input dataset and generate new data.
arXiv Detail & Related papers (2022-11-03T13:06:37Z) - Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z) - A Theoretical View on Sparsely Activated Networks [21.156069843782017]
We present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures.
We then introduce a routing function based on locality sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions.
We prove that sparse networks can match the approximation power of dense networks on Lipschitz functions.
arXiv Detail & Related papers (2022-08-08T23:14:48Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z) - ReduNet: A White-box Deep Network from the Principle of Maximizing Rate
Reduction [32.489371527159236]
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets.
We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks.
arXiv Detail & Related papers (2021-05-21T16:29:57Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)