Robust Generalization of Quadratic Neural Networks via Function
Identification
- URL: http://arxiv.org/abs/2109.10935v1
- Date: Wed, 22 Sep 2021 18:02:00 GMT
- Title: Robust Generalization of Quadratic Neural Networks via Function
Identification
- Authors: Kan Xu, Hamsa Bastani, Osbert Bastani
- Abstract summary: Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
- Score: 19.87036824512198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge facing deep learning is that neural networks are often not
robust to shifts in the underlying data distribution. We study this problem
from the perspective of the statistical concept of parameter identification.
Generalization bounds from learning theory often assume that the test
distribution is close to the training distribution. In contrast, if we can
identify the "true" parameters, then the model generalizes to arbitrary
distribution shifts. However, neural networks are typically overparameterized,
making parameter identification impossible. We show that for quadratic neural
networks, we can identify the function represented by the model even though we
cannot identify its parameters. Thus, we can obtain robust generalization
bounds even in the overparameterized setting. We leverage this result to obtain
new bounds for contextual bandits and transfer learning with quadratic neural
networks. Overall, our results suggest that we can improve robustness of neural
networks by designing models that can represent the true data generating
process. In practice, the true data generating process is often very complex;
thus, we study how our framework might connect to neural module networks, which
are designed to break down complex tasks into compositions of simpler ones. We
prove robust generalization bounds when individual neural modules are
identifiable.
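To make the identification argument concrete, here is a minimal NumPy sketch (not from the paper; the model form and sizes are illustrative). A quadratic network of the form f(x) = Σ_j (w_j · x)² depends on its weights only through the Gram matrix WᵀW, so two parameter matrices related by an orthogonal transform are different parameters realizing exactly the same function:

```python
import numpy as np

# A "quadratic neural network" here means f(x) = sum_j (w_j . x)^2, i.e. a
# one-hidden-layer network with the square activation. This is a simplified
# stand-in for the model class in the paper, not its exact setup.
rng = np.random.default_rng(0)
d, m = 5, 8                      # input dimension, number of hidden units
W = rng.normal(size=(m, d))      # one parameterization

def quad_net(W, x):
    """f(x) = ||W x||^2 = x^T (W^T W) x."""
    return np.sum((W @ x) ** 2)

# Any orthogonal Q gives a *different* parameter matrix QW that represents
# the *same* function, since (QW)^T (QW) = W^T Q^T Q W = W^T W.
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))   # random orthogonal matrix
W2 = Q @ W

x = rng.normal(size=d)
print(quad_net(W, x), quad_net(W2, x))   # identical up to float error
print(np.allclose(W, W2))                # False: parameters differ
print(np.allclose(W.T @ W, W2.T @ W2))   # True: the function is identified
```

The identifiable object is thus WᵀW rather than W itself, which is how function identification can survive overparameterization.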
Related papers
- Residual Random Neural Networks [0.0]
A single-layer feedforward neural network with random weights is a recurring motif in the neural networks literature.
We show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples (a minimal random-features sketch follows this entry).
arXiv Detail & Related papers (2024-10-25T22:00:11Z)
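A minimal sketch of the idea above, assuming an ELM-style setup (random, untrained hidden weights with a closed-form ridge readout); the data, sizes, and tanh nonlinearity are illustrative choices, not details from the paper:

```python
import numpy as np

# Single-hidden-layer network with *random, untrained* hidden weights;
# only the linear readout is fit, in closed form via ridge regression.
rng = np.random.default_rng(1)
n, d = 200, 20          # samples, input dimension
m = d                   # hidden width of the same order as the data dimension
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n))  # toy labels in {-1, +1}

W = rng.normal(size=(d, m))          # random hidden weights, never trained
H = np.tanh(X @ W)                   # random features
lam = 1e-2                           # ridge regularization strength
beta = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ y)  # readout weights

train_acc = np.mean(np.sign(H @ beta) == y)
print(f"train accuracy: {train_acc:.2f}")
```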
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations (a minimal parameter-graph sketch follows this entry).
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
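A minimal sketch of the general idea, assuming networkx is available: nodes stand for neurons and edges carry the weights. The paper's actual encoding (node/edge features, equivariance to neuron permutations, handling of diverse architectures) is more elaborate:

```python
import numpy as np
import networkx as nx

# Encode an MLP's parameters as a graph: one node per neuron, one edge per
# weight, with biases stored as node attributes and weights as edge attributes.
rng = np.random.default_rng(2)
layer_sizes = [4, 8, 3]                   # a toy MLP: 4 -> 8 -> 3
offsets = np.cumsum([0] + layer_sizes)    # global node ids per layer

G = nx.DiGraph()
for l, size in enumerate(layer_sizes):
    for i in range(size):
        G.add_node(offsets[l] + i, layer=l)

for l in range(len(layer_sizes) - 1):
    W = rng.normal(size=(layer_sizes[l + 1], layer_sizes[l]))
    b = rng.normal(size=layer_sizes[l + 1])
    for j in range(layer_sizes[l + 1]):
        G.nodes[offsets[l + 1] + j]["bias"] = b[j]
        for i in range(layer_sizes[l]):
            G.add_edge(offsets[l] + i, offsets[l + 1] + j, weight=W[j, i])

print(G.number_of_nodes(), G.number_of_edges())   # 15 nodes, 4*8 + 8*3 = 56 edges
```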
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later in training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Correlation between entropy and generalizability in a neural network [9.223853439465582]
We use the Wang-Landau Monte Carlo algorithm to calculate the entropy at a given test accuracy.
Our results show that entropic forces aid generalizability (a minimal Wang-Landau sketch follows this entry).
arXiv Detail & Related papers (2022-07-05T12:28:13Z)
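A minimal Wang-Landau sketch on a toy system (a 1D Ising chain), to illustrate the algorithm itself; applying it in a neural network's weight space, as the paper does, is far more involved:

```python
import numpy as np

# Wang-Landau estimation of the density of states g(E) for an open Ising chain.
rng = np.random.default_rng(3)
N = 12                                   # spins; an open chain has N - 1 bonds
spins = rng.choice([-1, 1], size=N)

def energy(s):
    return int(-np.sum(s[:-1] * s[1:]))  # E in {-(N-1), ..., N-1}, step 2

def bin_of(E):
    return (E + N - 1) // 2              # map an energy level to a histogram bin

n_bins = N                               # N - 1 bonds -> N energy levels
log_g = np.zeros(n_bins)                 # running estimate of log g(E)
hist = np.zeros(n_bins)                  # visit histogram for the flatness test
log_f = 1.0                              # modification factor, shrinks over time

E = energy(spins)
while log_f > 1e-4:
    for _ in range(10_000):
        i = rng.integers(N)
        spins[i] *= -1                   # propose a single spin flip
        E_new = energy(spins)
        # accept with probability min(1, g(E) / g(E_new))
        if rng.random() < np.exp(min(0.0, log_g[bin_of(E)] - log_g[bin_of(E_new)])):
            E = E_new
        else:
            spins[i] *= -1               # reject: undo the flip
        log_g[bin_of(E)] += log_f
        hist[bin_of(E)] += 1
    if hist.min() > 0.8 * hist.mean():   # histogram is "flat enough"
        log_f /= 2.0
        hist[:] = 0

print(log_g - log_g.min())               # entropy S(E) = log g(E), up to a constant
```

The entropy at a given energy (or, in the paper's setting, a given test accuracy) is then read off as S(E) = log g(E) up to an additive constant.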
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that a likelihood-ratio loss under interarrival-time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Persistent Homology Captures the Generalization of Neural Networks Without A Validation Set [0.0]
We suggest studying the training of neural networks with Algebraic Topology, specifically Persistent Homology.
Using simplicial complex representations of neural networks, we study how the PH diagram distance evolves over the course of training.
Results show that the PH diagram distance between consecutive neural network states correlates with the validation accuracy (a toy distance computation is sketched after this entry).
arXiv Detail & Related papers (2021-05-31T09:17:31Z)
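A toy version of the distance computation, assuming the ripser and persim packages are available; treating a layer's weight rows as a point cloud is a simplification here, whereas the paper builds simplicial complexes from the network itself:

```python
import numpy as np
from ripser import ripser       # pip install ripser persim (assumed)
from persim import bottleneck

rng = np.random.default_rng(4)
W_t = rng.normal(size=(32, 16))                  # weights at step t (toy stand-in)
W_t1 = W_t + 0.05 * rng.normal(size=W_t.shape)   # weights after one more update

def finite_diagram(X, dim=0):
    """Vietoris-Rips persistence diagram of a point cloud, infinite bars dropped."""
    dgm = ripser(X)["dgms"][dim]
    return dgm[np.isfinite(dgm[:, 1])]

# Bottleneck distance between PH diagrams of consecutive training states;
# per the paper, tracking this quantity correlates with validation accuracy.
print("bottleneck distance:", bottleneck(finite_diagram(W_t), finite_diagram(W_t1)))
```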
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)