The Space Between: On Folding, Symmetries and Sampling
        - URL: http://arxiv.org/abs/2503.08502v1
 - Date: Tue, 11 Mar 2025 14:54:25 GMT
 - Title: The Space Between: On Folding, Symmetries and Sampling
 - Authors: Michal Lewandowski, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser
 - Abstract summary: We propose a space folding measure based on Hamming distance in the ReLU activation space. We show that space folding values increase with network depth when the generalization error is low, but decrease when the error increases. Inspired by these findings, we outline a novel regularization scheme that encourages the network to seek solutions characterized by higher folding values.
 - Score: 4.16445550760248
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   Recent findings suggest that consecutive layers of neural networks with the ReLU activation function \emph{fold} the input space during the learning process. While many works hint at this phenomenon, an approach to quantify the folding was only recently proposed by means of a space folding measure based on Hamming distance in the ReLU activation space. We generalize this measure to a wider class of activation functions through the introduction of equivalence classes of input data, analyse its mathematical and computational properties, and derive an efficient sampling strategy for its implementation. Moreover, it has been observed that space folding values increase with network depth when the generalization error is low, but decrease when the error increases. This supports the view that learned symmetries in the data manifold (e.g., invariance under reflection) become visible in terms of space folds, contributing to the network's generalization capacity. Inspired by these findings, we outline a novel regularization scheme that encourages the network to seek solutions characterized by higher folding values. 
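The measure referenced in the abstract is defined on ReLU activation patterns: each input is mapped to the binary on/off pattern of all ReLU units, inputs are compared via the Hamming distance between these patterns, and folding shows up when the distance to a reference pattern shrinks again along a straight path through input space. The sketch below is a minimal illustration of that idea only, not the authors' implementation: the helper names (`activation_pattern`, `folding_score`) and the backtracking-ratio proxy used as the score are assumptions made for this example.

```python
# Minimal sketch (not the paper's reference implementation): folding along a
# straight input-space path is summarized here by how much the Hamming distance
# to the starting activation pattern decreases again after having increased --
# a simplified proxy, assumed for illustration only.
import torch
import torch.nn as nn

def activation_pattern(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Concatenate the binary on/off pattern of every ReLU unit for input x."""
    bits = []
    h = x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            bits.append((h > 0).flatten().float())
    return torch.cat(bits)

def folding_score(model: nn.Sequential, x_a: torch.Tensor, x_b: torch.Tensor,
                  n_samples: int = 64) -> float:
    """Proxy folding value along the straight segment from x_a to x_b."""
    p0 = activation_pattern(model, x_a)
    dists = []
    for t in torch.linspace(0.0, 1.0, n_samples):
        x_t = (1 - t) * x_a + t * x_b
        dists.append(torch.sum(activation_pattern(model, x_t) != p0).item())
    # Accumulate every step where the Hamming distance to the start shrinks;
    # in an "unfolded" map it would grow monotonically along the segment.
    backtrack = sum(max(0, dists[i] - dists[i + 1]) for i in range(len(dists) - 1))
    total = max(dists) if max(dists) > 0 else 1
    return backtrack / total

# Usage: a small ReLU MLP and two random inputs.
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
print(folding_score(net, torch.randn(16), torch.randn(16)))
```

A regularizer in the spirit of the abstract's last sentence could, for instance, subtract a multiple of such a score from the training loss, although the scheme the paper actually outlines may differ.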
 
       
      
        Related papers
        - On Space Folds of ReLU Neural Networks [6.019268056469171]
Recent findings suggest that ReLU neural networks can be understood geometrically as space folding of the input space. We present the first quantitative analysis of this space folding phenomenon in ReLU models.
arXiv  Detail & Related papers  (2025-02-14T07:22:24Z) - Latent Point Collapse on a Low Dimensional Embedding in Deep Neural   Network Classifiers [0.0]
We propose a method to induce the collapse of latent representations belonging to the same class into a single point. The proposed approach is straightforward to implement and yields substantial improvements in discriminative feature embeddings.
arXiv  Detail & Related papers  (2023-10-12T11:16:57Z) - Binarizing Sparse Convolutional Networks for Efficient Point Cloud
  Analysis [93.55896765176414]
We propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis.
We employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution.
Our BSC-Net achieves significant improvement upon our strong baseline and outperforms the state-of-the-art network binarization methods.
arXiv  Detail & Related papers  (2023-03-27T13:47:06Z) - Regression as Classification: Influence of Task Formulation on Neural
  Network Features [16.239708754973865]
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.
However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross-entropy loss results in better performance.
By focusing on two-layer ReLU networks, we explore how the implicit bias induced by gradient-based optimization could partly explain the phenomenon (a minimal sketch of the reformulation appears after this list).
arXiv  Detail & Related papers  (2022-11-10T15:13:23Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
  Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
 Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv  Detail & Related papers  (2022-10-25T14:45:15Z) - Clustering-Based Interpretation of Deep ReLU Network [17.234442722611803]
We recognize that the non-linear behavior of the ReLU function gives rise to a natural clustering.
We propose a method to increase the level of interpretability of a fully connected feedforward ReLU neural network.
arXiv  Detail & Related papers  (2021-10-13T09:24:11Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
  Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv  Detail & Related papers  (2021-01-12T00:40:45Z) - Implicit Under-Parameterization Inhibits Data-Efficient Deep
  Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv  Detail & Related papers  (2020-10-27T17:55:16Z) - Extreme Memorization via Scale of Initialization [72.78162454173803]
We construct an experimental setup in which changing the scale of initialization strongly impacts the implicit regularization induced by SGD.
We find that the extent and manner in which generalization ability is affected depends on the activation and loss function used.
In the case of the homogeneous ReLU activation, we show that this behavior can be attributed to the loss function.
arXiv  Detail & Related papers  (2020-08-31T04:53:11Z) - Spatially Adaptive Inference with Stochastic Feature Sampling and
  Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv  Detail & Related papers  (2020-03-19T15:36:31Z) - AL2: Progressive Activation Loss for Learning General Representations in
  Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
arXiv  Detail & Related papers  (2020-03-07T18:38:46Z) 
        This list is automatically generated from the titles and abstracts of the papers on this site.
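The "Regression as Classification" entry above describes recasting a continuous regression target as a discrete classification problem trained with the cross-entropy loss. The sketch below shows one common way to set this up, uniform binning of the target, purely as an assumed illustration; the binning scheme, bin count, and architecture are not taken from that paper.

```python
# Illustrative sketch of the regression-as-classification reformulation.
# Uniform binning and the two-layer ReLU networks are assumptions for the
# example, not details from the cited paper.
import torch
import torch.nn as nn

n_bins = 20
x = torch.randn(512, 1)
y = torch.sin(3 * x).squeeze(1)                      # continuous target

# Regression head: minimize the square loss on y directly.
reg_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
reg_loss = nn.MSELoss()(reg_net(x).squeeze(1), y)

# Classification head: discretize y into equal-width bins, train with cross-entropy.
edges = torch.linspace(y.min().item(), y.max().item(), n_bins + 1)
labels = torch.clamp(torch.bucketize(y, edges) - 1, 0, n_bins - 1)
cls_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, n_bins))
cls_loss = nn.CrossEntropyLoss()(cls_net(x), labels)

print(reg_loss.item(), cls_loss.item())
```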
       
     