Expressive Power of Deep Networks on Manifolds: Simultaneous Approximation
- URL: http://arxiv.org/abs/2509.09362v3
- Date: Thu, 25 Sep 2025 04:40:47 GMT
- Title: Expressive Power of Deep Networks on Manifolds: Simultaneous Approximation
- Authors: Hanfei Zhou, Lei Shi
- Abstract summary: We show that a constant-depth $\mathrm{ReLU}^{k-1}$ network with bounded weights can approximate any function in the Sobolev space. We also prove that our construction is nearly optimal by showing that the required number of parameters matches the lower bound up to a logarithmic factor.
- Score: 2.815765641180636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in scientific machine learning is solving partial differential equations (PDEs) on complex domains, where the curved geometry complicates the approximation of functions and their derivatives required by differential operators. This paper establishes the first simultaneous approximation theory for deep neural networks on manifolds. We prove that a constant-depth $\mathrm{ReLU}^{k-1}$ network with bounded weights, a property that plays a crucial role in controlling generalization error, can approximate any function in the Sobolev space $\mathcal{W}_p^{k}(\mathcal{M}^d)$ to an error of $\varepsilon$ in the $\mathcal{W}_p^{s}(\mathcal{M}^d)$ norm, for $k\geq 3$ and $s<k$, using $\mathcal{O}(\varepsilon^{-d/(k-s)})$ nonzero parameters, a rate that overcomes the curse of dimensionality by depending only on the intrinsic dimension $d$. These results readily extend to functions in Hölder-Zygmund spaces. We complement this result with a matching lower bound, proving our construction is nearly optimal by showing the required number of parameters matches up to a logarithmic factor. Our proof of the lower bound introduces novel estimates for the Vapnik-Chervonenkis dimension and pseudo-dimension of the network's high-order derivative classes. These complexity bounds provide a theoretical cornerstone for learning PDEs on manifolds involving derivatives. Our analysis reveals that the network architecture leverages a sparse structure to efficiently exploit the manifold's low-dimensional geometry. Finally, we corroborate our theoretical findings with numerical experiments.
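The rate in the abstract is concrete enough to illustrate. The JAX sketch below is our own toy illustration, not the paper's construction: a one-hidden-layer $\mathrm{ReLU}^{k-1}$ network (the paper's constant-depth sparse architecture is not reproduced here), the exact gradient that approximation in the $\mathcal{W}_p^{s}$ norm concerns, and the parameter budget $\varepsilon^{-d/(k-s)}$ evaluated at illustrative values of $d$, $k$, $s$; all widths and constants are arbitrary choices.

```python
# Illustrative sketch only; the paper's actual construction is not reproduced.
import jax
import jax.numpy as jnp

def relu_pow(x, k):
    """ReLU^{k-1} activation, max(0, x)^{k-1}; the theorem takes k >= 3,
    so the activation is at least C^1 and the gradient below is well defined."""
    return jnp.maximum(x, 0.0) ** (k - 1)

def net(params, x, k=3):
    """One hidden layer as a stand-in for the constant-depth construction."""
    W, b, a = params
    return a @ relu_pow(W @ x + b, k)

# Simultaneous approximation means the network is close to f in the W^s_p
# norm, so its derivatives matter too; autodiff gives them exactly:
key = jax.random.PRNGKey(0)
d, width = 2, 16
params = (jax.random.normal(key, (width, d)),
          jnp.zeros(width),
          jnp.ones(width) / width)
print(jax.grad(lambda x: net(params, x))(jnp.ones(d)))  # shape (d,)

# Parameter budget from the theorem, eps^{-d/(k-s)} with constants dropped:
def budget(eps, d, k, s):
    return eps ** (-d / (k - s))

# At eps = 1e-2, k = 3, s = 1: intrinsic dimension d = 2 needs ~1e2
# parameters, while the same rate in ambient dimension 20 would need ~1e20.
print(budget(1e-2, d=2, k=3, s=1), budget(1e-2, d=20, k=3, s=1))
```

The two printed budgets show what "depending only on the intrinsic dimension" buys: the exponent scales with $d$, not with the dimension of the space the manifold is embedded in.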
Related papers
- On finite-dimensional encoding/decoding theorems for neural operators [0.0]
We show that a continuous mapping $f$ between function spaces $E$ and $F$ can be approximated in the topology of uniform convergence on compacta.
We point out that the result needs no assumptions on $E, F$ whatsoever and remains true not only for all normed spaces but for arbitrary locally convex spaces as well.
This may already be useful because non-normable locally convex function spaces are common in the theory of differential equations.
arXiv Detail & Related papers (2026-01-20T15:15:51Z) - Quantum Circuit Encodings of Polynomial Chaos Expansions [5.63729124086755]
We study countably-parametric holomorphic maps $u: U \to \mathbb{R}$, where the parameter domain is $U = [-1,1]^{\mathbb{N}}$.
We establish dimension-independent quantum circuit approximation rates via the best $n$-term truncations of generalized polynomial chaos (gPC) expansions of these parametric maps.
Our results have implications for quantum-enhanced algorithms for a wide range of maps in applications.
arXiv Detail & Related papers (2025-06-02T15:53:36Z) - A Quasilinear Algorithm for Computing Higher-Order Derivatives of Deep Feed-Forward Neural Networks [0.0]
$n$-TangentProp computes the exact derivative $d^{n}f(x)/dx^{n}$ in quasilinear, rather than exponential, time.
We demonstrate that our method is particularly beneficial in the context of physics-informed neural networks.
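The quasilinear claim concerns avoiding nested first-order autodiff, whose cost grows rapidly with the derivative order. The paper's $n$-TangentProp implementation is not shown here; as a hedged stand-in, Taylor-mode automatic differentiation via `jax.experimental.jet` computes the same exact $d^{n}f(x)/dx^{n}$ in a single forward pass (`f` below is a toy scalar stand-in for a network):

```python
# Taylor-mode AD sketch; one standard route to exact higher-order
# derivatives, not the paper's n-TangentProp implementation.
import math
import jax.numpy as jnp
from jax.experimental import jet

def f(x):
    return jnp.sin(x)  # toy stand-in for a deep feed-forward network

n, x0 = 4, 0.5
# Push the Taylor series of x(t) = x0 + t through f in one pass.
_, coeffs = jet.jet(f, (x0,), ((1.0,) + (0.0,) * (n - 1),))
# coeffs[j-1] = f^{(j)}(x0) / j!, so the n-th derivative is:
dn_f = math.factorial(n) * coeffs[n - 1]
print(dn_f, jnp.sin(x0))  # d^4/dx^4 sin = sin, so the two agree
```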
arXiv Detail & Related papers (2024-12-12T22:57:28Z) - Neural Networks and (Virtual) Extended Formulations [5.762677915745415]
We prove lower bounds on the size of neural networks that optimize over $P$.
We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$.
arXiv Detail & Related papers (2024-11-05T11:12:11Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
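A minimal sketch of this setting, not the paper's algorithm or its rates: a single-index target with $\sigma_* = \tanh$ (our arbitrary choice) and a two-layer network trained by plain online SGD; the width, learning rate, and step count below are likewise arbitrary.

```python
# Toy single-index learning setup; illustrative only.
import jax
import jax.numpy as jnp

d, width, n_steps, lr = 8, 32, 2000, 0.05
theta = jnp.ones(d) / jnp.sqrt(d)        # hidden index direction
target = lambda x: jnp.tanh(x @ theta)   # f_*(x) = sigma_*(<x, theta>)

def model(params, x):
    W, a = params
    return jnp.tanh(x @ W.T) @ a         # two-layer network, batched

def loss(params, x, y):
    return jnp.mean((model(params, x) - y) ** 2)

key = jax.random.PRNGKey(0)
params = (0.1 * jax.random.normal(key, (width, d)), jnp.zeros(width))

@jax.jit
def step(params, key):
    x = jax.random.normal(key, (64, d))  # fresh Gaussian batch (online SGD)
    g = jax.grad(loss)(params, x, target(x))
    return tuple(p - lr * gp for p, gp in zip(params, g))

for _ in range(n_steps):
    key, sub = jax.random.split(key)
    params = step(params, sub)

x_test = jax.random.normal(jax.random.PRNGKey(1), (512, d))
print(loss(params, x_test, target(x_test)))  # test MSE, typically far below its initial value
```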
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion, the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Neural Network Approximation of Continuous Functions in High Dimensions with Applications to Inverse Problems [6.84380898679299]
Current theory predicts that networks should scale exponentially in the dimension of the problem.
We provide a general method for bounding the complexity required for a neural network to approximate a Hölder (or uniformly) continuous function.
arXiv Detail & Related papers (2022-08-28T22:44:07Z) - Solving parametric partial differential equations with deep rectified quadratic unit neural networks [38.16617079681564]
In this study, we investigate the expressive power of deep rectified quadratic unit (ReQU) neural networks for approximating the solution maps of parametric PDEs.
We derive an upper bound $\mathcal{O}\left(d^{3}\log_{2}q\,\log_{2}(1/\epsilon)\right)$ on the size of the deep ReQU neural network required to achieve accuracy $\epsilon$.
arXiv Detail & Related papers (2022-03-14T10:15:29Z) - Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations [0.0]
We construct deep operator networks (ONets) between infinite-dimensional spaces that emulate the coefficient-to-solution map of elliptic second-order PDEs with an exponential rate of convergence.
In particular, we consider problems set in $d$-dimensional periodic domains, $d = 1, 2, \dots$, with analytic right-hand sides and coefficients.
We prove that the neural networks in the ONet have size $\mathcal{O}\left(\left|\log(\varepsilon)\right|^{\kappa}\right)$ for some $\kappa > 0$.
arXiv Detail & Related papers (2021-12-15T13:56:28Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
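As a toy instance of the second choice of $\phi$ (distance to a collection of low-dimensional sets), our own illustration rather than the paper's network construction:

```python
# Toy composition f(x) = g(phi(x)); the anchor set and outer function g
# are arbitrary choices for illustration.
import jax
import jax.numpy as jnp

anchors = jax.random.normal(jax.random.PRNGKey(0), (5, 3))  # a finite "low-dimensional set" in R^3

def phi(x):
    """Distance from x to the nearest anchor point."""
    return jnp.min(jnp.linalg.norm(anchors - x, axis=1))

def g(t):
    return jnp.exp(-t ** 2)  # an arbitrary smooth outer function

f = lambda x: g(phi(x))
print(f(jnp.zeros(3)))
```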
arXiv Detail & Related papers (2020-08-06T09:50:29Z)