Algebraic Approach to Ridge-Regularized Mean Squared Error Minimization in Minimal ReLU Neural Network
- URL: http://arxiv.org/abs/2508.17783v1
- Date: Mon, 25 Aug 2025 08:24:20 GMT
- Title: Algebraic Approach to Ridge-Regularized Mean Squared Error Minimization in Minimal ReLU Neural Network
- Authors: Ryoya Fukasaku, Yutaro Kabata, Akifumi Okuno,
- Abstract summary: We develop a Divide-Enumerate-Merge strategy that enumerates all local minima of the RR-MSE.<n>Although computational algebraic methods are computationally very intensive for perceptrons of practical size, as a proof of concept, we apply the proposed approach in practice to minimal perceptrons with a few hidden units.
- Score: 0.509780930114934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates a perceptron, a simple neural network model, with ReLU activation and a ridge-regularized mean squared error (RR-MSE). Our approach leverages the fact that the RR-MSE for ReLU perceptron is piecewise polynomial, enabling a systematic analysis using tools from computational algebra. In particular, we develop a Divide-Enumerate-Merge strategy that exhaustively enumerates all local minima of the RR-MSE. By virtue of the algebraic formulation, our approach can identify not only the typical zero-dimensional minima (i.e., isolated points) obtained by numerical optimization, but also higher-dimensional minima (i.e., connected sets such as curves, surfaces, or hypersurfaces). Although computational algebraic methods are computationally very intensive for perceptrons of practical size, as a proof of concept, we apply the proposed approach in practice to minimal perceptrons with a few hidden units.
Related papers
- Deep Unfolding for MIMO Signal Detection [1.6881346757176976]
We propose a deep unfolding neural network-based MIMO detector that incorporates complex-valued computations using Wirtinger calculus.<n>The proposed algorithm requires only a small number of trainable parameters, allowing for simplified training.<n> Numerical results demonstrate that the proposed method achieves superior detection performance with fewer iterations and lower computational complexity.
arXiv Detail & Related papers (2025-07-24T00:48:04Z) - Restarted contractive operators to learn at equilibrium [0.0]
We introduce an algorithm that combines a restart strategy with JFB computed by AD and we show that the learned steps can be made arbitrarily close to the optimal DEQ framework.<n>We show that this method is effective for training weights in weighted norms; stepsizes and regularization levels of Plug-and-Play schemes; and a DRUNet denoiser embedded in Forward-Backward iterates.
arXiv Detail & Related papers (2025-06-16T08:38:56Z) - Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - The Dynamics of Riemannian Robbins-Monro Algorithms [101.29301565229265]
We propose a family of Riemannian algorithms generalizing and extending the seminal approximation framework of Robbins and Monro.
Compared to their Euclidean counterparts, Riemannian algorithms are much less understood due to lack of a global linear structure on the manifold.
We provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes.
arXiv Detail & Related papers (2022-06-14T12:30:11Z) - Generalized Orthogonal Procrustes Problem under Arbitrary Adversaries [1.0152838128195467]
We study the semidefinite relaxation (SDR) and an iterative method named generalized power method (GPM) to find the least squares estimator.<n>In addition, we analyze the low-rank factorization algorithm and show the corresponding optimization landscape is free of spurious local minimizers.
arXiv Detail & Related papers (2021-06-29T15:19:25Z) - Solving PDEs on Unknown Manifolds with Machine Learning [8.220217498103315]
This paper presents a mesh-free computational framework and machine learning theory for solving elliptic PDEs on unknown manifold.
We show that the proposed NN solver can robustly generalize the PDE on new data points with errors that are almost identical to generalizations on new data points.
arXiv Detail & Related papers (2021-06-12T03:55:15Z) - Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector
Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principle component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z) - Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Neural Control Variates [71.42768823631918]
We show that a set of neural networks can face the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.