Wormhole Dynamics in Deep Neural Networks
- URL: http://arxiv.org/abs/2508.15086v1
- Date: Wed, 20 Aug 2025 21:41:53 GMT
- Title: Wormhole Dynamics in Deep Neural Networks
- Authors: Yen-Lung Lai, Zhe Jin
- Abstract summary: This work investigates the generalization behavior of deep neural networks (DNNs). We focus on the phenomenon of "fooling examples," where DNNs confidently classify inputs that appear random or unstructured to humans.
- Score: 7.531126877550286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates the generalization behavior of deep neural networks (DNNs), focusing on the phenomenon of "fooling examples," where DNNs confidently classify inputs that appear random or unstructured to humans. To explore this phenomenon, we introduce an analytical framework based on maximum likelihood estimation, without adhering to conventional numerical approaches that rely on gradient-based optimization and explicit labels. Our analysis reveals that DNNs operating in an overparameterized regime exhibit a collapse in the output feature space. While this collapse improves network generalization, adding more layers eventually leads to a state of degeneracy, where the model learns trivial solutions by mapping distinct inputs to the same output, resulting in zero loss. Further investigation demonstrates that this degeneracy can be bypassed using our newly derived "wormhole" solution. The wormhole solution, when applied to arbitrary fooling examples, reconciles meaningful labels with random ones and provides a novel perspective on shortcut learning. These findings offer deeper insights into DNN generalization and highlight directions for future research on learning dynamics in unsupervised settings to bridge the gap between theory and practice.
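The degeneracy described above, where adding layers makes distinct inputs map to (nearly) the same output, can be given a quick empirical intuition. The sketch below is a minimal PyTorch illustration under assumed settings, not the paper's maximum-likelihood framework or its wormhole solution: it pushes a batch of distinct random inputs (of the kind used as fooling-example candidates) through randomly initialized tanh MLPs of increasing depth, with hypothetical widths and depths chosen for display, and reports the mean pairwise distance between outputs, which shrinks toward zero as depth grows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_mlp(depth, width=256, dim_in=64, dim_out=10):
    # Stack of Linear+Tanh blocks with a linear readout.
    layers, d = [], dim_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.Tanh()]
        d = width
    layers.append(nn.Linear(d, dim_out))
    return nn.Sequential(*layers)

x = torch.randn(32, 64)  # 32 distinct random inputs
for depth in [1, 4, 16, 64]:
    net = make_mlp(depth)
    with torch.no_grad():
        y = net(x)
    # Mean pairwise distance between the 32 outputs; at default
    # initialization this contracts with depth, so distinct inputs
    # collapse toward a single point in the output feature space.
    dist = torch.pdist(y).mean().item()
    print(f"depth={depth:3d}  mean pairwise output distance={dist:.4f}")
```

At random initialization this collapse is a signal-propagation effect rather than the learned degeneracy the paper analyzes, but it shows the same qualitative picture: past a certain depth the output feature space contracts and inputs become indistinguishable, which is the regime the wormhole solution is derived to escape.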
Related papers
- Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks [7.721672385781673]
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. We apply Random Matrix Theory to derive a closed-form expression for the generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance (a minimal empirical counterpart of this setup is sketched after this list).
arXiv Detail & Related papers (2025-11-04T09:30:31Z)
- Information-Theoretic Generalization Bounds for Deep Neural Networks [20.015357820733406]
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds.
arXiv Detail & Related papers (2024-04-04T03:20:35Z)
- Graph Out-of-Distribution Generalization via Causal Intervention [69.70137479660113]
We introduce a conceptually simple yet principled approach for training robust graph neural networks (GNNs) under node-level distribution shifts.
Our method builds on a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-experts GNN predictor.
Our model effectively enhances generalization under various types of distribution shifts and yields up to a 27.4% accuracy improvement over state-of-the-art methods on graph OOD generalization benchmarks.
arXiv Detail & Related papers (2024-02-18T07:49:22Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Negative Flux Aggregation to Estimate Feature Attributions [15.411534490483495]
Demand for understanding the behavior of deep neural networks (DNNs) is growing, spurred by security and transparency concerns.
To enhance the explainability of DNNs, we estimate input features' attributions to the prediction task using divergence and flux.
Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution maps.
arXiv Detail & Related papers (2023-01-17T16:19:41Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Neuron Coverage-Guided Domain Generalization [37.77033512313927]
This paper focuses on the domain generalization task, where domain knowledge is unavailable and, even worse, only samples from a single domain are available during training.
Our motivation originates from recent progress in deep neural network (DNN) testing, which has shown that maximizing the neuron coverage of a DNN can help expose its potential defects.
arXiv Detail & Related papers (2021-02-27T14:26:53Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
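As flagged in the first related paper above, the frozen-representation-plus-readout setting is easy to probe empirically. The sketch below is an illustrative stand-in, not that paper's Random Matrix Theory derivation: it fixes a random tanh feature map (playing the role of frozen intermediate layers), trains a ridge-regression readout, and measures the generalization error on held-out data. The linear teacher, dimensions, noise level, and ridge penalty are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear teacher with label noise.
d, p, n_train, n_test = 50, 200, 100, 2000
w_star = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

# Frozen random feature map: stand-in for frozen intermediate layers.
W = rng.normal(size=(d, p)) / np.sqrt(d)
phi = lambda X: np.tanh(X @ W)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)
F_tr, F_te = phi(X_tr), phi(X_te)

# Trainable ridge-regression readout on the frozen features.
lam = 1e-2
a = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(p), F_tr.T @ y_tr)

# Empirical generalization error: the quantity a closed-form
# Random Matrix Theory analysis would predict directly.
test_mse = np.mean((F_te @ a - y_te) ** 2)
print(f"empirical test MSE: {test_mse:.4f}")
```

Sweeping n_train or p in this sketch traces out the error curves that such a closed-form expression characterizes analytically.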