Path-metrics, pruning, and generalization
- URL: http://arxiv.org/abs/2405.15006v1
- Date: Thu, 23 May 2024 19:23:09 GMT
- Title: Path-metrics, pruning, and generalization
- Authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval
- Abstract summary: This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters.
It is the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more.
- Score: 13.894485461969772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing the behavior of ReLU neural networks often hinges on understanding the relationships between their parameters and the functions they implement. This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters. Since this bound is intrinsically invariant with respect to the rescaling symmetries of the networks, it sharpens previously known bounds. It is also, to the best of our knowledge, the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more. In contexts such as network pruning and quantization, the proposed path-metrics can be efficiently computed using only two forward passes. Besides its intrinsic theoretical interest, the bound yields not only novel theoretical generalization bounds, but also a promising proof of concept for rescaling-invariant pruning.
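The two-forward-pass claim echoes a classical trick: the ℓ1 path-norm of a bias-free ReLU network can be read off from a single forward pass with absolute-valued weights on an all-ones input. The PyTorch sketch below (the helper name l1_path_norm is ours) only illustrates that forward-pass idea, not the paper's exact path-metric between two parameter vectors.
```python
# Minimal sketch: l1 path-norm of a bias-free ReLU MLP via one forward pass
# of the absolute-valued weights on an all-ones input. Illustration only,
# not the rescaling-invariant path-metric defined in the paper.
import torch
import torch.nn as nn

def l1_path_norm(mlp: nn.Sequential, input_dim: int) -> float:
    """One forward pass of |weights| on an all-ones input (bias-free ReLU MLP)."""
    with torch.no_grad():
        x = torch.ones(1, input_dim)
        for layer in mlp:
            if isinstance(layer, nn.Linear):
                x = x @ layer.weight.abs().T  # replace W by |W|; ReLU is then the identity
    return x.sum().item()

mlp = nn.Sequential(nn.Linear(8, 16, bias=False), nn.ReLU(),
                    nn.Linear(16, 1, bias=False))
print(l1_path_norm(mlp, input_dim=8))
```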
Related papers
- Metric Convolutions: A Unifying Theory to Adaptive Convolutions [3.481985817302898]
Metric convolutions replace standard convolutions in image processing and deep learning.
They require fewer parameters and provide better generalisation.
Our approach shows competitive performance in standard denoising and classification tasks.
arXiv Detail & Related papers (2024-06-08T08:41:12Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Geometry-induced Implicit Regularization in Deep ReLU Neural Networks [0.0]
Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
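As a concrete reading of "activation patterns", the sketch below (PyTorch; the helper activation_patterns is ours, not from the paper) records the on/off mask of each hidden ReLU unit for a batch of inputs.
```python
# Minimal sketch: extract the binary activation pattern of each ReLU layer
# for a batch of inputs. Names and setup are illustrative assumptions.
import torch
import torch.nn as nn

def activation_patterns(net: nn.Sequential, x: torch.Tensor):
    """Return one {0,1} mask per hidden ReLU layer for the batch x."""
    patterns = []
    with torch.no_grad():
        for layer in net:
            x = layer(x)
            if isinstance(layer, nn.ReLU):
                patterns.append((x > 0).int())  # ReLU(z) > 0 iff the pre-activation z > 0
    return patterns

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    nn.Linear(8, 8), nn.ReLU(),
                    nn.Linear(8, 2))
masks = activation_patterns(net, torch.randn(5, 4))
print([tuple(m.shape) for m in masks])  # [(5, 8), (5, 8)]
```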
arXiv Detail & Related papers (2024-02-13T07:49:57Z) - Adaptive Log-Euclidean Metrics for SPD Matrix Learning [73.12655932115881]
We propose Adaptive Log-Euclidean Metrics (ALEMs), which extend the widely used Log-Euclidean Metric (LEM).
The experimental and theoretical results demonstrate the merit of the proposed metrics in improving the performance of SPD neural networks.
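For reference, the classical Log-Euclidean Metric that ALEMs build on is d(A, B) = ||logm(A) - logm(B)||_F for SPD matrices A and B. A minimal NumPy/SciPy sketch follows (the helper lem_distance is ours; the adaptive variants of the paper are not shown).
```python
# Sketch of the standard Log-Euclidean distance between SPD matrices.
import numpy as np
from scipy.linalg import logm

def lem_distance(A: np.ndarray, B: np.ndarray) -> float:
    """Log-Euclidean distance: ||logm(A) - logm(B)||_F."""
    return float(np.linalg.norm(logm(A) - logm(B), ord="fro"))

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 3, 3))
A = X @ X.T + 3.0 * np.eye(3)  # symmetric positive definite by construction
B = Y @ Y.T + 3.0 * np.eye(3)
print(lem_distance(A, B))
```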
arXiv Detail & Related papers (2023-03-26T18:31:52Z) - A Lifted Bregman Formulation for the Inversion of Deep Neural Networks [28.03724379169264]
We propose a novel framework for the regularised inversion of deep neural networks.
The framework lifts the parameter space into a higher dimensional space by introducing auxiliary variables.
We present theoretical results and support their practical application with numerical examples.
arXiv Detail & Related papers (2023-03-01T20:30:22Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
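Below is a minimal sketch of a sinusoidal layer with an explicit bandwidth-like scale, in the SIREN style; the class name SineLayer, the parameter omega0, and the initialization heuristic are ours and may differ from the paper's parametrization.
```python
# Hypothetical sketch of a sinusoidal layer: y = sin(omega0 * (W x + b)),
# where omega0 acts as a frequency/bandwidth scale.
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    def __init__(self, in_features, out_features, omega0=30.0):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(in_features, out_features)
        # Heuristic: shrink the uniform init by omega0 so pre-activations stay moderate.
        bound = 1.0 / (omega0 * in_features)
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.omega0 * self.linear(x))

net = nn.Sequential(SineLayer(2, 64, omega0=30.0), nn.Linear(64, 1))
print(net(torch.rand(16, 2)).shape)  # torch.Size([16, 1])
```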
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
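One crude way to make "local Lipschitz regularity in the data space" concrete is the norm of the input gradient at a data point; the PyTorch sketch below (the helper input_gradient_norm is ours, not the paper's estimator) computes that proxy per sample.
```python
# Sketch: per-sample input-gradient norm as a local-Lipschitz proxy
# for a scalar-output model. Illustration only.
import torch
import torch.nn as nn

def input_gradient_norm(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample norm of d f(x)/d x for a scalar-output model."""
    x = x.clone().requires_grad_(True)
    y = model(x).sum()                  # samples are independent, so summing is safe
    (grad,) = torch.autograd.grad(y, x)
    return grad.flatten(1).norm(dim=1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
print(input_gradient_norm(model, torch.randn(4, 10)))
```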
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
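For concreteness, the spectral norm of a hidden-layer weight matrix is its largest singular value; the short sketch below (the helper spectral_norm_power_iter is ours) computes it by power iteration and checks it against torch.linalg.matrix_norm.
```python
# Sketch: spectral norm (largest singular value) of a weight matrix
# via power iteration on W^T W, checked against the exact value.
import torch

def spectral_norm_power_iter(W: torch.Tensor, n_iter: int = 50) -> float:
    v = torch.randn(W.shape[1])
    for _ in range(n_iter):
        v = W.T @ (W @ v)   # one step of power iteration on W^T W
        v = v / v.norm()
    return (W @ v).norm().item()

W = torch.randn(64, 32)
print(spectral_norm_power_iter(W), torch.linalg.matrix_norm(W, ord=2).item())
```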
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - A Pairwise Connected Tensor Network Representation of Path Integrals [0.0]
It has been recently shown how the tensorial nature of real-time path integrals involving the Feynman-Vernon influence functional can be utilized.
Here, a generalized tensor network is derived and implemented specifically incorporating the pairwise interaction structure of the influence functional.
This pairwise connected tensor network path integral (PCTNPI) is illustrated through applications to typical spin-boson problems and explorations of the differences caused by the exact form of the spectral density.
arXiv Detail & Related papers (2021-06-28T18:30:17Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
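The "fixed activation pattern" idea can be made concrete: freezing the 0/1 mask of a ReLU layer turns it into a linear map that agrees with the ReLU layer wherever the mask matches the realized pattern. A minimal PyTorch illustration follows (ours, not the paper's construction).
```python
# Sketch: with a fixed 0/1 mask, ReLU(Wx + b) becomes mask * (Wx + b),
# which is linear in x and matches the ReLU layer where the mask is realized.
import torch
import torch.nn as nn

layer = nn.Linear(6, 6)
x = torch.randn(1, 6)

pre = layer(x)
mask = (pre > 0).float()      # activation pattern realized at x
relu_out = torch.relu(pre)    # ordinary ReLU layer
fixed_out = mask * pre        # surrogate layer with the pattern frozen

print(torch.allclose(relu_out, fixed_out))  # True
```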
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Deep connections between learning from limited labels & physical
parameter estimation -- inspiration for regularization [0.0]
We show that explicit regularization of model parameters in PDE constrained optimization translates to regularization of the network output.
A hyperspectral imaging example shows that minimum prior information together with cross-validation for optimal regularization parameters boosts the segmentation accuracy.
arXiv Detail & Related papers (2020-03-17T19:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.