Related papers: Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data

Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data

URL: http://arxiv.org/abs/2509.00924v1
Date: Sun, 31 Aug 2025 16:20:27 GMT
Title: Beyond Universal Approximation Theorems: Algorithmic Uniform Approximation by Neural Networks Trained with Noisy Data
Authors: Anastasis Kratsios, Tin Sum Cheng, Daniel Roy,
Abstract summary: This paper introduces an architecture-specific randomized training algorithm that constructs a uniform approximator from $N$ noisy training samples.<n>Our trained neural networks attain the minimax-optimal quantity of textittrainable (non-random) parameters, subject to logarithmic factors which vanish under the idealized noiseless sampling assumed in classical UATs.
Score: 12.815892583089443
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: At its core, machine learning seeks to train models that reliably generalize beyond noisy observations; however, the theoretical vacuum in which state-of-the-art universal approximation theorems (UATs) operate isolates them from this goal, as they assume noiseless data and allow network parameters to be chosen freely, independent of algorithmic realism. This paper bridges that gap by introducing an architecture-specific randomized training algorithm that constructs a uniform approximator from $N$ noisy training samples on the $d$-dimensional cube $[0,1]^d$. Our trained neural networks attain the minimax-optimal quantity of \textit{trainable} (non-random) parameters, subject to logarithmic factors which vanish under the idealized noiseless sampling assumed in classical UATs. Additionally, our trained models replicate key behaviours of real-world neural networks, absent in standard UAT constructions, by: (1) exhibiting sub-linear parametric complexity when fine-tuning on structurally related and favourable out-of-distribution tasks, (2) exactly interpolating the training data, and (3) maintaining reasonable Lipschitz regularity (after the initial clustering attention layer). These properties bring state-of-the-art UATs closer to practical machine learning, shifting the central open question from algorithmic implementability with noisy samples to whether stochastic gradient descent can achieve comparable guarantees.

Related papers

Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high dimensional dynamical systems driven by random matrices.<n>We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z)
Regularized Random Fourier Features and Finite Element Reconstruction for Operator Learning in Sobolev Space [0.7734726150561088]
Kernel-based operator learning can offer accurate, theoretically justified approximations that require less training than standard methods.<n>We propose a regularized random Fourier feature (RRFF) approach, coupled with a finite element reconstruction map (RRFF-FEM)<n>We show that RRFF and RRFF-FEM are robust to noise and achieve improved performance with reduced training time compared to the unregularized random feature model.
arXiv Detail & Related papers (2025-12-19T18:36:24Z)
Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks.<n>NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets.<n>We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z)
Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.<n>We introduce a novel principled method for DP synthetic image embedding generation.<n> Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.<n> diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.<n>We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z)
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes. In this paper we examine the use of convex neural recovery models. We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program. We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolleds and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network. We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection. We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z)
A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods. Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z)
Multi-Sample Online Learning for Spiking Neural Networks based on Generalized Expectation Maximization [42.125394498649015]
Spiking Neural Networks (SNNs) capture some of the efficiency of biological brains by processing through binary neural dynamic activations. This paper proposes to leverage multiple compartments that sample independent spiking signals while sharing synaptic weights. The key idea is to use these signals to obtain more accurate statistical estimates of the log-likelihood training criterion, as well as of its gradient.
arXiv Detail & Related papers (2021-02-05T16:39:42Z)
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z)
Relative gradient optimization of the Jacobian term in unsupervised deep learning [9.385902422987677]
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian. We propose a new approach for exact training of such neural networks.
arXiv Detail & Related papers (2020-06-26T16:41:08Z)
The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.