Understanding Nonlinear Implicit Bias via Region Counts in Input Space
- URL: http://arxiv.org/abs/2505.11370v2
- Date: Sat, 07 Jun 2025 08:17:19 GMT
- Title: Understanding Nonlinear Implicit Bias via Region Counts in Input Space
- Authors: Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang
- Abstract summary: We characterize implicit bias by the count of connected regions in the input space with the same predicted label. We find that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance.
- Score: 33.269290703951455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One explanation for the strong generalization ability of neural networks is implicit bias. Yet, the definition and mechanism of implicit bias in nonlinear contexts remain poorly understood. In this work, we propose to characterize implicit bias by the count of connected regions in the input space that share the same predicted label. Compared with parameter-dependent metrics (e.g., norm or normalized margin), region count adapts better to nonlinear, overparameterized models, because it is determined by the function mapping and is invariant to reparametrization. Empirically, we find that small region counts align with geometrically simple decision boundaries and correlate well with good generalization performance. We also observe that good hyper-parameter choices, such as larger learning rates and smaller batch sizes, can induce small region counts. We further establish theoretical connections and explain how larger learning rates can induce small region counts in neural networks.
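Region counts in a high-dimensional input space are hard to compute exactly, so a common way to illustrate the quantity is to restrict the classifier to a low-dimensional slice of the input space and count connected components of constant predicted label there. The sketch below is a minimal illustration under that assumption and is not the paper's measurement procedure; the toy model, the grid resolution, and the use of scipy.ndimage.label for connected-component labeling are all illustrative choices.

```python
import numpy as np
import torch
from scipy import ndimage

def region_count_2d(model, anchor, d1, d2, extent=5.0, res=200):
    """Estimate the label-region count of `model` on a 2D slice of input space.

    anchor: a base input (1D tensor); d1, d2: direction vectors spanning the slice.
    Returns the number of connected same-label regions on the sampled grid.
    """
    ts = np.linspace(-extent, extent, res)
    a, b = np.meshgrid(ts, ts)  # grid coordinates on the 2D plane
    grid = (anchor[None, :]
            + torch.tensor(a.ravel(), dtype=torch.float32)[:, None] * d1
            + torch.tensor(b.ravel(), dtype=torch.float32)[:, None] * d2)
    with torch.no_grad():
        labels = model(grid).argmax(dim=1).numpy().reshape(res, res)

    # Count connected components of each predicted label separately and sum them.
    total = 0
    for c in np.unique(labels):
        _, n = ndimage.label(labels == c)  # 4-connectivity by default
        total += n
    return total

# Hypothetical usage with a small random ReLU classifier.
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 3))
x0 = torch.zeros(10)
u, v = torch.randn(10), torch.randn(10)
print("estimated region count on a random 2D slice:", region_count_2d(net, x0, u, v))
```

On such a slice, a geometrically simple decision boundary yields a count close to the number of classes, while a fragmented boundary inflates the count, which is the intuition behind using small region counts as a proxy for good generalization.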
Related papers
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- The Geometry of Neural Nets' Parameter Spaces Under Reparametrization [35.5848464226014]
We study the invariance of neural nets under reparametrization from the perspective of Riemannian geometry.
We discuss implications for measuring the flatness of minima, optimization, and probability-density maximization.
arXiv Detail & Related papers (2023-02-14T22:48:24Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Efficiently Computing Local Lipschitz Constants of Neural Networks via
Bound Propagation [79.13041340708395]
Lipschitz constants are connected to many properties of neural networks, such as robustness, fairness, and generalization.
Existing methods for computing Lipschitz constants either produce relatively loose upper bounds or are limited to small networks.
We develop an efficient framework for computing the $\ell_\infty$ local Lipschitz constant of a neural network by tightly upper bounding the norm of the Clarke Jacobian.
arXiv Detail & Related papers (2022-10-13T22:23:22Z) - On generalization bounds for deep networks based on loss surface
implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite having a large number of parameters, which contradicts classical statistical learning theory.
arXiv Detail & Related papers (2022-01-12T16:41:34Z) - Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z) - Nonperturbative renormalization for the neural network-QFT
correspondence [0.0]
We study the concepts of locality and power-counting in this context.
We provide an analysis in terms of the nonperturbative renormalization group using the Wetterich-Morris equation.
Our aim is to provide a useful formalism for investigating the behavior of neural networks beyond the large-width limit.
arXiv Detail & Related papers (2021-08-03T10:36:04Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - On the Number of Linear Functions Composing Deep Neural Network: Towards
a Refined Definition of Neural Networks Complexity [6.252236971703546]
We introduce an equivalence relation among the linear functions composing a piecewise linear function and then count those linear functions relative to that equivalence relation (a rough activation-pattern counting sketch appears after this list).
Our new complexity measure can clearly distinguish between the two models, is consistent with the classical measure, and increases exponentially with depth.
arXiv Detail & Related papers (2020-10-23T01:46:12Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
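Relating to the entry above on counting the linear functions that compose a deep network: that paper's refined complexity measure is built on an equivalence relation among affine pieces, which is not reproduced here. The sketch below only shows the cruder, commonly used proxy of counting distinct ReLU activation patterns along a segment in input space; the toy model, the segment endpoints, and the sampling density are illustrative assumptions.

```python
import torch

def activation_pattern_count(model, x_start, x_end, steps=2000):
    """Count distinct ReLU activation patterns along the segment [x_start, x_end].

    Each distinct pattern corresponds to (at most) one affine piece of the
    piecewise linear network restricted to this segment; this is only a crude
    proxy for linear-region complexity, not the equivalence-relation measure.
    """
    patterns = set()
    ts = torch.linspace(0.0, 1.0, steps)
    with torch.no_grad():
        for t in ts:
            h = (1 - t) * x_start + t * x_end
            pattern = []
            for layer in model:
                h = layer(h)
                if isinstance(layer, torch.nn.ReLU):
                    # Record which units are active after this ReLU layer.
                    pattern.append(tuple((h > 0).int().tolist()))
            patterns.add(tuple(pattern))
    return len(patterns)

# Hypothetical example: deeper nets tend to show more patterns along the same segment.
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 2))
a, b = torch.randn(10), torch.randn(10)
print("distinct activation patterns on the segment:", activation_pattern_count(net, a, b))
```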
This list is automatically generated from the titles and abstracts of the papers on this site.