Agnostic Active Learning of Single Index Models with Linear Sample Complexity
- URL: http://arxiv.org/abs/2405.09312v3
- Date: Tue, 9 Jul 2024 19:20:57 GMT
- Title: Agnostic Active Learning of Single Index Models with Linear Sample Complexity
- Authors: Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Yi Li, Christopher Musco,
- Abstract summary: We study active learning methods for single index models of the form $F(mathbf x) = f(langle mathbf w, mathbf xrangle)$.
- Score: 27.065175036001246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise. I.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via {statistical leverage score sampling} are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent ${O}(d^{2})$ bound of \cite{gajjar2023active}. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is \emph{unknown}. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters $boldsymbolW$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(mathcalA,mathcalT)$.
We show that if the distribution on examples in the data passes the tester $mathcalT$ then one can safely trust the output of the agnostic $mathcalA$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs
with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features.
We show that a model-based approach (resp.$$Q-learning) provably learns an $varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(mathbfx)$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $mathbfx$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large then the estimation errors will be sacrificed by too much.
To the best of our knowledge, this is the first work that and guarantees for the lineartext+Stochasticity model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $tilde O(|S||B|(gamma)-3epsilon-2)$ for finding the Nash equilibrium (NE) value up to some $epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.