Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics
- URL: http://arxiv.org/abs/2412.01619v1
- Date: Mon, 02 Dec 2024 15:40:29 GMT
- Title: Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics
- Authors: Kang Liu, Enrique Zuazua
- Abstract summary: We address three non-convex optimization problems associated with the training of shallow neural networks (NNs).
Through a mean-field approach, we convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps.
We establish generalization bounds for the resulting NN solutions, analyze the impact of key hyperparameters on these bounds, and propose optimal choices.
For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent, yields effective solutions.
- Score: 5.915970073098098
- Abstract: In this work, we address three non-convex optimization problems associated with the training of shallow neural networks (NNs) for exact and approximate representation, as well as for regression tasks. Through a mean-field approach, we convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps. We establish generalization bounds for the resulting NN solutions, assessing their predictive performance on test datasets and, analyzing the impact of key hyperparameters on these bounds, propose optimal choices. On the computational side, we examine the discretization of the convexified problems and derive convergence rates. For low-dimensional datasets, these discretized problems are efficiently solvable using the simplex method. For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent for over-parameterized shallow NNs, yields effective solutions to the primal problems.
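To make the low-dimensional numerics concrete, here is a minimal, hypothetical sketch of the route described in the abstract: discretize the neuron parameters, then solve the resulting exact-representation problem as an $\ell_1$-minimization linear program, which is amenable to simplex-type solvers. The dataset, the random discretization of the neuron parameters, and all sizes below are illustrative choices, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy 1-D dataset to be represented exactly by a shallow ReLU network.
X = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)      # n x d samples
y = np.sin(np.pi * X[:, 0])                         # n targets

# Discretize the neuron parameters (w_j, b_j) by random sampling (illustrative grid).
M = 200
W = rng.normal(size=(M, 1))
W /= np.linalg.norm(W, axis=1, keepdims=True)
b = rng.uniform(-1.0, 1.0, size=M)
Phi = np.maximum(X @ W.T + b, 0.0)                  # n x M ReLU feature matrix

# Exact representation: min ||c||_1 subject to Phi c = y, as an LP with c = c_plus - c_minus.
A_eq = np.hstack([Phi, -Phi])
cost = np.ones(2 * M)
res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
coef = res.x[:M] - res.x[M:]
print(res.message, "| active neurons:", int(np.sum(np.abs(coef) > 1e-8)))
```

The $\ell_1$ objective promotes sparse outer-layer weights, so the LP solution typically activates far fewer than $M$ of the candidate neurons, mirroring the sparsity predicted by the representer theorem.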
Related papers
- Towards graph neural networks for provably solving convex optimization problems [5.966097889241178]
We propose an iterative MPNN framework to solve convex optimization problems with provable feasibility guarantees.
Experimental results show that our approach outperforms existing neural baselines in solution quality and feasibility.
arXiv Detail & Related papers (2025-02-04T16:11:41Z)
- On improving generalization in a class of learning problems with the method of small parameters for weakly-controlled optimal gradient systems [0.0]
We consider a variational problem for a weakly-controlled gradient system, whose control input enters into the system dynamics as a coefficient to a nonlinear term.
Using the perturbation theory, we provide results that will allow us to solve a sequence of optimization problems.
We also provide an estimate for the rate of convergence for such approximate optimal solutions.
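For intuition, a generic small-parameter (perturbation) setup of this type might look as follows; this is a schematic illustration in our own notation, not the paper's exact equations:

```latex
% Weakly-controlled gradient system: the control u enters with a small coefficient \epsilon.
\dot{x}^{\epsilon}(t) = -\nabla V\big(x^{\epsilon}(t)\big) + \epsilon\, u(t)\, g\big(x^{\epsilon}(t)\big),
\qquad x^{\epsilon}(0) = x_0 .
% Expand the state in powers of \epsilon and solve the variational problem order by order:
x^{\epsilon}(t) = x^{(0)}(t) + \epsilon\, x^{(1)}(t) + \mathcal{O}(\epsilon^{2}).
```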
arXiv Detail & Related papers (2024-12-11T20:50:29Z)
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
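As a rough illustration of the mechanism (not the paper's algorithm), a normalized error-feedback step combines a contractive compressor, a local error buffer, and a normalized update; the `top_k` compressor and all hyperparameters below are hypothetical choices:

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries (a common contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_ef_step(x, grad, e, lr, k):
    """One normalized error-feedback step (sketch): compress the error-corrected
    gradient, keep the compression residual, and apply a normalized step."""
    msg = top_k(grad + e, k)          # compressed message communicated to the server
    e_new = grad + e - msg            # residual kept locally for the next round
    norm = np.linalg.norm(msg)
    if norm > 0:
        x = x - lr * msg / norm       # normalized direction: step length set by lr alone
    return x, e_new

# Tiny usage on the quadratic f(x) = 0.5*||x||^2, whose gradient is x itself.
x, e = np.ones(10), np.zeros(10)
for _ in range(100):
    x, e = normalized_ef_step(x, grad=x, e=e, lr=0.05, k=3)
```

Normalizing the applied direction decouples the step length from the gradient magnitude, which is what permits the larger stepsizes mentioned in the summary.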
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- A Primal-dual algorithm for image reconstruction with ICNNs [3.4797100095791706]
We address the optimization problem in a data-driven variational framework, where the regularizer is parameterized by an input-convex neural network (ICNN).
While gradient-based methods are commonly used to solve such problems, they struggle to effectively handle nonsmoothness.
We show that the proposed approach outperforms subgradient methods in terms of both speed and stability.
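For context, the standard primal-dual hybrid gradient (Chambolle-Pock) template that such methods build on is sketched below for a simple 1-D total-variation denoising problem; this is a generic PDHG example, not the paper's ICNN-regularized scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# min_x 0.5*||x - b||^2 + lam*||D x||_1 with a 1-D finite-difference operator D.
n, lam = 100, 0.5
b = np.sign(np.sin(np.linspace(0, 3 * np.pi, n))) + 0.1 * rng.normal(size=n)
D = np.diff(np.eye(n), axis=0)                      # (n-1) x n finite differences

L = np.linalg.norm(D, 2)                            # operator norm of D
tau = sigma = 0.9 / L                               # step sizes with tau*sigma*L^2 < 1
x, x_bar, p = np.zeros(n), np.zeros(n), np.zeros(n - 1)

for _ in range(500):
    p = np.clip(p + sigma * (D @ x_bar), -lam, lam)          # prox of (lam*||.||_1)^*
    x_new = (x - tau * (D.T @ p) + tau * b) / (1 + tau)       # prox of 0.5*||. - b||^2
    x_bar = 2 * x_new - x                                     # extrapolation
    x = x_new
```

The dual update handles the nonsmooth term through its proximal map, which is the usual way primal-dual schemes avoid the instability of subgradient steps.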
arXiv Detail & Related papers (2024-10-16T10:36:29Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
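A minimal sketch of the Deep Equilibrium idea (forward pass only, with a plain fixed-point iteration rather than the accelerated solvers and implicit differentiation used in practice):

```python
import torch

def deq_fixed_point(f, x, z0, max_iter=50, tol=1e-5):
    """Fixed-point iteration z_{k+1} = f(z_k, x), as used at DEQ inference time (sketch)."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if torch.norm(z_next - z) / (torch.norm(z) + 1e-8) < tol:
            return z_next
        z = z_next
    return z

# Toy contraction: the equilibrium of z = 0.5*z + x is z* = 2*x.
f = lambda z, x: 0.5 * z + x
z_star = deq_fixed_point(f, torch.tensor([1.0]), torch.zeros(1))
print(z_star)   # approximately tensor([2.])
```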
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- NTopo: Mesh-free Topology Optimization using Implicit Neural Representations [35.07884509198916]
We present a novel machine learning approach to tackle topology optimization problems.
We use multilayer perceptrons (MLPs) to parameterize both density and displacement fields.
As we show through our experiments, a major benefit of our approach is that it enables self-supervised learning of continuous solution spaces.
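A minimal sketch of such an implicit, coordinate-based representation, with an MLP mapping a 2-D point to a material density; layer sizes and activations below are placeholders, not NTopo's architecture:

```python
import torch
import torch.nn as nn

class DensityField(nn.Module):
    """Coordinate MLP mapping a spatial point (x, y) to a material density in (0, 1)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy):
        return torch.sigmoid(self.net(xy))   # squash outputs to valid densities

# Query the field on a batch of sample points inside the design domain.
pts = torch.rand(128, 2)
rho = DensityField()(pts)                     # 128 densities in (0, 1)
```

Because the field is defined at every continuous point rather than on a fixed mesh, the same network can be queried at arbitrary resolution, which is the "mesh-free" aspect mentioned in the title.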
arXiv Detail & Related papers (2021-02-22T05:25:22Z)
- Consistency analysis of bilevel data-driven learning in inverse problems [1.0705399532413618]
We consider the adaptive learning of the regularization parameter from data by means of optimization.
We demonstrate how to implement our framework on linear inverse problems.
Online numerical schemes are derived using the gradient descent method.
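A toy version of this bilevel setup on a linear inverse problem, where the lower level is a Tikhonov reconstruction with a closed form and the upper level adapts the regularization parameter by gradient descent; the problem sizes and the finite-difference gradient are illustrative choices, not the paper's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear inverse problem y = A x_true + noise (all names are illustrative).
A = rng.normal(size=(30, 20))
x_true = rng.normal(size=20)
y = A @ x_true + 0.05 * rng.normal(size=30)

def lower_level(lam):
    """Tikhonov reconstruction x(lam) = argmin ||A x - y||^2 + lam ||x||^2 (closed form)."""
    return np.linalg.solve(A.T @ A + lam * np.eye(20), A.T @ y)

def upper_loss(lam):
    """Supervised upper-level loss against the known ground truth."""
    return np.linalg.norm(lower_level(lam) - x_true) ** 2

# Gradient descent on lam with a central finite-difference gradient.
lam, step, h = 1.0, 1e-2, 1e-4
for _ in range(200):
    g = (upper_loss(lam + h) - upper_loss(lam - h)) / (2 * h)
    lam = max(lam - step * g, 1e-8)
print("learned regularization parameter:", lam)
```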
arXiv Detail & Related papers (2020-07-06T12:23:29Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
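A crude sketch of such a min-max training loop with two neural networks and alternating gradient steps; the moment-style objective, the quadratic penalty, the instrument/regressor setup, and the architectures below are our own illustrative stand-ins, not the paper's estimator:

```python
import torch
import torch.nn as nn

# Two small networks playing the min-max game (architectures are placeholders).
primal = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))     # structural function f
adversary = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # dual test function g

opt_min = torch.optim.SGD(primal.parameters(), lr=1e-3)
opt_max = torch.optim.SGD(adversary.parameters(), lr=1e-3)

z = torch.randn(256, 1)                 # instrument
x = z + 0.1 * torch.randn(256, 1)       # regressor
y = torch.sin(x) + 0.1 * torch.randn(256, 1)

def game_value():
    """Moment condition E[(y - f(x)) g(z)] with a quadratic penalty on g."""
    residual = y - primal(x)
    return (residual * adversary(z)).mean() - 0.5 * (adversary(z) ** 2).mean()

for _ in range(200):
    opt_max.zero_grad(); (-game_value()).backward(); opt_max.step()   # ascent on g
    opt_min.zero_grad(); game_value().backward(); opt_min.step()      # descent on f
```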
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite-width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
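For flavor, convex reformulations in this line of work take roughly the following shape, where the matrices $D_i$ enumerate the ReLU activation patterns induced by the data matrix $X$; this is our paraphrase of the general form, not the paper's exact statement:

```latex
\min_{\{v_i,\, w_i\}} \;
\frac{1}{2}\Big\| \sum_{i} D_i X (v_i - w_i) - y \Big\|_2^2
\;+\; \beta \sum_{i} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad
(2D_i - I)\, X v_i \ge 0,\;\; (2D_i - I)\, X w_i \ge 0 .
```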
arXiv Detail & Related papers (2020-02-25T23:05:33Z)