Related papers: The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network

URL: http://arxiv.org/abs/2602.07219v1
Date: Fri, 06 Feb 2026 22:00:31 GMT
Title: The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network
Authors: Abhigyan Dutta, Itay Safran, Paul Valiant
Abstract summary: We study the approximation of the median of $d$ inputs using ReLU neural networks. We present a constant-depth, linear-width construction that achieves exponentially small approximation error with respect to the uniform distribution over the unit hypercube.
Score: 4.88752466747693
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the approximation of the median of $d$ inputs using ReLU neural networks. We present depth-width tradeoffs under several settings, culminating in a constant-depth, linear-width construction that achieves exponentially small approximation error with respect to the uniform distribution over the unit hypercube. By further establishing a general reduction from the maximum to the median, our results break a barrier suggested by prior work on the maximum function, which indicated that linear width should require depth growing at least as $\log\log d$ to achieve comparable accuracy. Our construction relies on a multi-stage procedure that iteratively eliminates non-central elements while preserving a candidate set around the median. We overcome obstacles that do not arise for the maximum to yield approximation results that are strictly stronger than those previously known for the maximum itself.
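For orientation, here is a minimal sketch of the standard ReLU identities for the maximum, the minimum, and the median of three numbers. It is not the construction from the paper: the function names (`relu_max`, `relu_min`, `relu_median3`) are hypothetical, and the example only shows the kind of exact small-input building block that ReLU networks can express.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return max(x, 0.0)


def relu_max(a: float, b: float) -> float:
    # Identity: max(a, b) = b + relu(a - b)
    return b + relu(a - b)


def relu_min(a: float, b: float) -> float:
    # Identity: min(a, b) = a - relu(a - b)
    return a - relu(a - b)


def relu_median3(a: float, b: float, c: float) -> float:
    # For three inputs, median = sum - max - min.
    # Illustrative only; NOT the paper's constant-depth, linear-width network.
    hi = relu_max(relu_max(a, b), c)
    lo = relu_min(relu_min(a, b), c)
    return a + b + c - hi - lo
```

Nesting such pairwise comparisons over $d$ inputs makes depth grow with $d$, which is the cost a constant-depth construction has to avoid.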

Related papers

Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transfer the discrete submodular problem into a continuous optimization. We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing a uniform distribution. Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z)
High-accuracy sampling from constrained spaces with the Metropolis-adjusted Preconditioned Langevin Algorithm [12.405427902037971]
We propose a first-order sampling method for approximate sampling from a target distribution whose support is a proper convex subset of $\mathbb{R}^d$. Our proposed method is the result of applying a Metropolis-Hastings filter to the Markov chain formed by a single step of the preconditioned Langevin algorithm.
arXiv Detail & Related papers (2024-12-24T23:21:23Z)
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds. We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z)
A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we present a robust phase retrieval problem where the task is to recover an unknown signal. Our proposed oracle avoids the need for computationally spectral descent, using a simple gradient step and outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
Optimal Approximation of Zonoids and Uniform Approximation by Shallow Neural Networks [2.7195102129095003]
We study the following two related problems. The first is to determine to what error an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments. The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces.
arXiv Detail & Related papers (2023-07-28T03:43:17Z)
How Many Neurons Does it Take to Approximate the Maximum? [10.995895410470279]
We study the size of a neural network needed to approximate the maximum function over $d$ inputs. We provide new lower and upper bounds on the width required for approximation across various depths.
arXiv Detail & Related papers (2023-07-18T12:47:35Z)
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss. We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
Faster high-accuracy log-concave sampling via algorithmic warm starts [6.117084972237769]
In practice, high-accuracy samplers such as the classical Metropolis-adjusted Langevin algorithm (MALA) remain the de facto gold standard. We improve the dimension dependence of this sampling problem to $\tilde{O}(d^{1/2})$, whereas the previous best result for MALA was $\tilde{O}(d)$. Our main technical contribution settles this problem by establishing the first $\tilde{O}(d^{1/2})$ Rényi mixing rates for the discretized underdamped Langevin diffusion.
arXiv Detail & Related papers (2023-02-20T19:27:21Z)
Nyström Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nyström method. It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
arXiv Detail & Related papers (2022-01-31T08:26:06Z)
Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation [154.2195491708548]
We study the prediction discriminability and diversity by studying the structure of the classification output matrix of a randomly selected data batch. We propose Batch Nuclear-norm Maximization and Minimization, which performs nuclear-norm on the target output matrix to enhance the target prediction ability. Experiments show that our method could boost the adaptation accuracy and robustness under three typical domain adaptation scenarios.
arXiv Detail & Related papers (2021-07-13T15:08:32Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)