An Algorithm for Learning Smaller Representations of Models With Scarce
Data
- URL: http://arxiv.org/abs/2010.07990v1
- Date: Thu, 15 Oct 2020 19:17:51 GMT
- Title: An Algorithm for Learning Smaller Representations of Models With Scarce
Data
- Authors: Adrian de Wynter
- Abstract summary: We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative.
It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a greedy algorithm for solving binary classification problems in
situations where the dataset is either too small or not fully representative of
the problem being solved, and obtaining more data is not possible. This
algorithm is of particular interest when training small models that have
trouble generalizing. It relies on a trained model with loose accuracy
constraints, an iterative hyperparameter pruning procedure, and a function used
to generate new data. Analysis on correctness and runtime complexity under
ideal conditions and an extension to deep neural networks is provided. In the
former case we obtain an asymptotic bound of
$O\left(|\Theta^2|\left(\log{|\Theta|} + |\theta^2| + T_f\left(|
D|\right)\right) + \bar{S}|\Theta||{E}|\right)$, where $|{\Theta}|$ is the
cardinality of the set of hyperparameters $\theta$ to be searched; $|{E}|$ and
$|{D}|$ are the sizes of the evaluation and training datasets, respectively;
$\bar{S}$ and $\bar{f}$ are the inference times for the trained model and the
candidate model; and $T_f({|{D}|})$ is a polynomial on $|{D}|$ and $\bar{f}$.
Under these conditions, this algorithm returns a solution that is $1 \leq r
\leq 2(1 - {2^{-|{\Theta}|}})$ times better than simply enumerating and
training with any $\theta \in \Theta$. As part of our analysis of the
generating function we also prove that, under certain assumptions, if an open
cover of $D$ has the same homology as the manifold where the support of the
underlying probability distribution lies, then $D$ is learnable, and viceversa.
Related papers
- On Agnostic PAC Learning in the Small Error Regime [4.422219522591412]
Empirical Risk Minimization learners are suboptimal in the realizable case yet optimal in the agnostic case.
Work of Hanneke, Larsen, and Zhivotovskiy addresses this shortcoming by including $tau$ as a parameter in the error term.
We show that our learner achieves tightness of the error lower bound $tau + Omega left(sqrtfractau))m + fracd + log (1 / delta)m right)$ for a constant $c leq 2.1$
arXiv Detail & Related papers (2025-02-13T17:03:03Z) - Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $sigma(mathbfwast cdot mathbfx)$, where $sigma: mathbbR to mathbbR$ is a known link function and $mathbfwast$ is a hidden unit vector.
We show that a proper learner attains $L2$-error of $O(mathrmOPT)+epsilon$, where $
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory [0.0]
We find a basis $mathcalB$ that describes $xi_rm lin(r)$ near the baryon acoustic oscillation feature in a wide class of cosmological models.
Using our basis functions in model-agnostic BAO analyses can potentially lead to significant statistical gains.
arXiv Detail & Related papers (2024-10-28T18:00:01Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
conditional distributions is a central problem in machine learning.<n>We propose a new paradigm that integrates both paired and unpaired data.<n>We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Data Debugging is NP-hard for Classifiers Trained with SGD [8.449687092741604]
We investigate the computational complexity of the problem named Debuggable.
If the loss function and the dimension of the model are not fixed, Debuggable is NP-complete regardless of the training order.
arXiv Detail & Related papers (2024-08-02T16:17:59Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic [99.3682210827572]
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets.
Data curation strategies are typically developed agnostic of the available compute for training.
We introduce neural scaling laws that account for the non-homogeneous nature of web data.
arXiv Detail & Related papers (2024-04-10T17:27:54Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $tildeTheta(sd2 + sd/epsilon)$ bits.
For $p in (1,2)$,we obtain an $tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $mathbfA$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Easy Differentially Private Linear Regression [16.325734286930764]
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $|theta_*|,delta,d,n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n=Omegaleft(min(slog d,1/gamma4)right) samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z) - Distributed Reconstruction of Noisy Pooled Data [2.0559497209595823]
We consider two noise models for the pooled data problem.
We present and analyze for both error models a simple and efficient distributed algorithm.
Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability.
arXiv Detail & Related papers (2022-04-14T06:48:40Z) - Efficient and robust high-dimensional sparse logistic regression via
nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(min(m2n,mn2)log (1/epsilon))$ for first-order optimization methods.
arXiv Detail & Related papers (2021-11-30T14:16:48Z) - Model Selection with Near Optimal Rates for Reinforcement Learning with
General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem.
In the model selection framework, instead of $mathcalP*$, we are given $M$ nested families of transition kernels.
We show that textttARL-GEN obtains a regret of $TildemathcalO(d_mathcalE*H2+sqrtd_mathcalE* mathbbM* H2 T)$
arXiv Detail & Related papers (2021-07-13T05:00:38Z) - Partial Wasserstein Covering [10.52782170493037]
We consider a general task called partial Wasserstein covering with the goal of emulating a large dataset.
We model this problem as a discrete optimization problem with partial Wasserstein divergence as an objective function.
We show that we can efficiently make two datasets similar in terms of partial Wasserstein divergence, including driving scene datasets.
arXiv Detail & Related papers (2021-06-02T01:48:41Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $tildeO(L5/6 T2/3)$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.