A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks
- URL: http://arxiv.org/abs/2411.02767v1
- Date: Tue, 05 Nov 2024 03:24:34 GMT
- Title: A Convex Relaxation Approach to Generalization Analysis for Parallel Positively Homogeneous Networks
- Authors: Uday Kiran Reddy Tadipatri, Benjamin D. Haeffele, Joshua Agterberg, René Vidal
- Abstract summary: A class of neural networks whose input-output map is the sum of positively homogeneous maps is studied.
We propose a general framework for deriving generalization bounds for such networks.
- Score: 35.85188662524247
- License:
- Abstract: We propose a general framework for deriving generalization bounds for parallel positively homogeneous neural networks--a class of neural networks whose input-output map decomposes as the sum of positively homogeneous maps. Examples of such networks include matrix factorization and sensing, single-layer multi-head attention mechanisms, tensor factorization, deep linear and ReLU networks, and more. Our general framework is based on linking the non-convex empirical risk minimization (ERM) problem to a closely related convex optimization problem over prediction functions, which provides a global, achievable lower-bound to the ERM problem. We exploit this convex lower-bound to perform generalization analysis in the convex space while controlling the discrepancy between the convex model and its non-convex counterpart. We apply our general framework to a wide variety of models ranging from low-rank matrix sensing, to structured matrix sensing, two-layer linear networks, two-layer ReLU networks, and single-layer multi-head attention mechanisms, achieving generalization bounds with a sample complexity that scales almost linearly with the network width.
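For intuition, here is the matrix-factorization special case of such a convex lower bound, written as a minimal sketch (the symbols $A_i$, $y_i$, $\ell$, $\lambda$, $N$, and the width $r$ are illustrative and not taken from the paper):
\[
\min_{U \in \mathbb{R}^{d_1 \times r},\, V \in \mathbb{R}^{d_2 \times r}}
\frac{1}{N} \sum_{i=1}^{N} \ell\big(\langle A_i, U V^\top \rangle, y_i\big)
+ \lambda \big(\|U\|_F^2 + \|V\|_F^2\big)
\;\ge\;
\min_{X \in \mathbb{R}^{d_1 \times d_2}}
\frac{1}{N} \sum_{i=1}^{N} \ell\big(\langle A_i, X \rangle, y_i\big)
+ 2\lambda \|X\|_* ,
\]
which follows from the variational form $\|X\|_* = \min_{U V^\top = X} \tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$. The bound is attained whenever the width $r$ is at least the rank of a minimizer of the convex problem, which is what allows generalization bounds proved in the convex space to transfer back to the non-convex model.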
Related papers
- Exploring the loss landscape of regularized neural networks via convex duality [42.48510370193192]
We discuss several aspects of the loss landscape of regularized neural networks.
We first characterize the solution set of the convex problem using its dual and further characterize all stationary points.
We show that the solution set characterization and connectivity results can be extended to different architectures.
arXiv Detail & Related papers (2024-11-12T11:41:38Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks [26.58848653965855]
We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations.
We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally.
arXiv Detail & Related papers (2022-10-07T21:14:09Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z)
- Decentralized Feature-Distributed Optimization for Generalized Linear Models [19.800898945436384]
We consider the "all-for-one" decentralized learning problem for generalized linear models.
The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables.
We apply the Chambolle-Pock primal-dual algorithm to an equivalent saddle-point formulation of the problem.
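For reference, a minimal sketch of the basic Chambolle-Pock iteration on a toy (centralized) Lasso saddle point is given below; the function name, problem instance, and step-size choices are our own illustrative assumptions, not the paper's decentralized scheme.
```python
import numpy as np

def chambolle_pock_lasso(A, b, lam, n_iter=500):
    """Generic Chambolle-Pock primal-dual iteration, sketched on the toy
    saddle point  min_x max_y <Ax, y> - (0.5*||y||^2 + <b, y>) + lam*||x||_1,
    which is equivalent to the Lasso  min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Illustrative only; not the paper's feature-partitioned, multi-agent method."""
    m, n = A.shape
    L = np.linalg.norm(A, 2)        # spectral norm of the linear operator
    tau = sigma = 0.99 / L          # step sizes with tau*sigma*||A||^2 < 1
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(n_iter):
        # Dual step: prox of F*(y) = 0.5*||y||^2 + <b, y>.
        y = (y + sigma * (A @ x_bar) - sigma * b) / (1.0 + sigma)
        # Primal step: prox of G(x) = lam*||x||_1 (soft-thresholding).
        x_new = x - tau * (A.T @ y)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau * lam, 0.0)
        # Extrapolation with theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```
The sketch only shows the centralized update structure; in the setting summarized above, the features (and hence parts of the linear operator) are partitioned across collaborating agents.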
arXiv Detail & Related papers (2021-10-28T16:42:47Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Eliminating Multicollinearity Issues in Neural Network Ensembles: Incremental, Negatively Correlated, Optimal Convex Blending [0.2294014185517203]
We introduce an incremental algorithm that constructs an aggregate regressor, using an ensemble of neural networks.
We optimally blend the aggregate regressor with a newly trained neural network under a convexity constraint.
Under this framework, collinearity issues do not arise at all, rendering the method both accurate and robust.
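As a rough illustration of a single blending step (a sketch under our own assumptions, not the authors' implementation), the convexity constraint reduces each incremental update to a one-dimensional choice of mixing weight:
```python
import numpy as np

def optimal_convex_blend(pred_aggregate, pred_new, targets):
    """Sketch of one incremental blending step: pick alpha in [0, 1] minimizing
    the squared error of alpha*aggregate + (1-alpha)*new_model on held-out data.
    Names and the closed form are illustrative, not the paper's exact procedure."""
    r_agg = pred_aggregate - targets     # residuals of the aggregate regressor
    r_new = pred_new - targets           # residuals of the newly trained model
    d = r_agg - r_new
    denom = np.dot(d, d)
    if denom == 0.0:                     # identical predictions: keep either one
        alpha = 1.0
    else:
        # Unconstrained minimizer of ||alpha*r_agg + (1-alpha)*r_new||^2,
        # clipped to [0, 1] to enforce the convexity constraint.
        alpha = np.clip(-np.dot(r_new, d) / denom, 0.0, 1.0)
    blended = alpha * pred_aggregate + (1.0 - alpha) * pred_new
    return alpha, blended
```
In an incremental scheme of the kind summarized above, the blended predictor would then serve as the aggregate regressor before the next network is trained.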
arXiv Detail & Related papers (2021-04-30T01:32:08Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.