Generalization and Estimation Error Bounds for Model-based Neural
Networks
- URL: http://arxiv.org/abs/2304.09802v1
- Date: Wed, 19 Apr 2023 16:39:44 GMT
- Title: Generalization and Estimation Error Bounds for Model-based Neural
Networks
- Authors: Avner Shultzman, Eyar Azar, Miguel R. D. Rodrigues, Yonina C. Eldar
- Abstract summary: We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization.
- Score: 78.88759757988761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based neural networks provide unparalleled performance for various
tasks, such as sparse coding and compressed sensing problems. Due to the strong
connection with the sensing model, these networks are interpretable and inherit
prior structure of the problem. In practice, model-based neural networks
exhibit higher generalization capability than ReLU neural networks.
However, this phenomenon has not been addressed theoretically. Here, we leverage
complexity measures including the global and local Rademacher complexities, in
order to provide upper bounds on the generalization and estimation errors of
model-based networks. We show that the generalization abilities of model-based
networks for sparse recovery outperform those of regular ReLU networks, and
derive practical design rules that allow the construction of model-based networks with
guaranteed high generalization. We demonstrate through a series of experiments
that our theoretical insights shed light on several behaviours observed in
practice, including the fact that ISTA and ADMM networks exhibit higher
generalization abilities (especially for a small number of training samples),
compared to ReLU networks.
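For context, the following is a minimal sketch of the kind of unrolled ISTA ("model-based") network the abstract refers to, next to a plain ReLU layer, written in Python with NumPy. The LISTA-style tying of the weights W1, W2 and the threshold theta to the sensing matrix A is a standard construction assumed here for illustration; it is not necessarily the exact architecture analyzed in the paper.

```python
# Illustrative sketch: an unrolled ISTA network for sparse recovery vs. a ReLU layer.
# The weight initialization from the sensing matrix A is the standard LISTA
# construction, assumed for illustration only.
import numpy as np

def soft_threshold(x, theta):
    """Proximal operator of the l1 norm (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista_layer(x, y, W1, W2, theta):
    """One unrolled ISTA iteration: a gradient step followed by soft-thresholding.
    W1 and W2 play the roles of (1/L) A^T and I - (1/L) A^T A in classical ISTA."""
    return soft_threshold(W2 @ x + W1 @ y, theta)

def relu_layer(x, W, b):
    """A standard fully connected ReLU layer, shown for comparison."""
    return np.maximum(W @ x + b, 0.0)

# Toy sparse-recovery setup: y = A x_true with a sparse x_true.
rng = np.random.default_rng(0)
m, n, num_layers = 20, 50, 10
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.normal(size=5)
y = A @ x_true

# Model-based initialization: weights are tied to the sensing matrix A.
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the data-fidelity gradient
W1 = A.T / L                           # (1/L) A^T
W2 = np.eye(n) - (A.T @ A) / L         # I - (1/L) A^T A
theta = 0.1 / L                        # threshold lambda / L

x = np.zeros(n)
for _ in range(num_layers):            # each loop iteration corresponds to one network layer
    x = ista_layer(x, y, W1, W2, theta)

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

In a trained model-based network, W1, W2 and theta would be learned per layer; the point of the construction is that the layer structure inherits the sparse-recovery model, which is what the paper's generalization bounds exploit.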
Related papers
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z) - A method for quantifying the generalization capabilities of generative models for solving Ising models [5.699467840225041]
We use a Hamming distance regularizer to quantify the generalization capabilities of various network architectures combined with VAN.
We conduct numerical experiments on several network architectures combined with VAN, including feed-forward neural networks, recurrent neural networks, and graph neural networks.
Our method is of great significance for assisting Neural Architecture Search in finding optimal network architectures for solving large-scale Ising models.
arXiv Detail & Related papers (2024-05-06T12:58:48Z) - From NeurODEs to AutoencODEs: a mean-field control framework for
width-varying Neural Networks [68.8204255655161]
We propose a new type of continuous-time control system, called AutoencODE, based on a controlled field that drives the dynamics.
We show that many architectures can be recovered in regions where the loss function is locally convex.
arXiv Detail & Related papers (2023-07-05T13:26:17Z) - Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Formalizing Generalization and Robustness of Neural Networks to Weight
Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Compressive Sensing and Neural Networks from a Statistical Learning
Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers, and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)