An Algorithm for Learning Smaller Representations of Models With Scarce
Data
- URL: http://arxiv.org/abs/2010.07990v1
- Date: Thu, 15 Oct 2020 19:17:51 GMT
- Title: An Algorithm for Learning Smaller Representations of Models With Scarce
Data
- Authors: Adrian de Wynter
- Abstract summary: We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative.
It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a greedy algorithm for solving binary classification problems in
situations where the dataset is either too small or not fully representative of
the problem being solved, and obtaining more data is not possible. This
algorithm is of particular interest when training small models that have
trouble generalizing. It relies on a trained model with loose accuracy
constraints, an iterative hyperparameter pruning procedure, and a function used
to generate new data. Analysis on correctness and runtime complexity under
ideal conditions and an extension to deep neural networks is provided. In the
former case we obtain an asymptotic bound of
$O\left(|\Theta|^2\left(\log{|\Theta|} + |\theta|^2 + T_f(|D|)\right) + \bar{S}|\Theta||E|\right)$,
where $|\Theta|$ is the cardinality of the set of hyperparameters $\theta$ to be
searched; $|E|$ and $|D|$ are the sizes of the evaluation and training datasets,
respectively; $\bar{S}$ and $\bar{f}$ are the inference times for the trained
model and the candidate model; and $T_f(|D|)$ is a polynomial in $|D|$ and
$\bar{f}$. Under these conditions, this algorithm returns a solution that is
$1 \leq r \leq 2(1 - 2^{-|\Theta|})$ times better than simply enumerating and
training with any $\theta \in \Theta$. As part of our analysis of the
generating function we also prove that, under certain assumptions, if an open
cover of $D$ has the same homology as the manifold on which the support of the
underlying probability distribution lies, then $D$ is learnable, and vice versa.
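To make the abstract's description concrete, the following is a minimal sketch, in Python, of how the three ingredients (a trained reference model with a loose accuracy constraint, an iterative hyperparameter-pruning loop, and a data-generating function) could fit together. The callables `generate_data` and `train_and_score`, the `accuracy_slack` parameter, and the halving schedule are illustrative assumptions, not the paper's actual algorithm.

```python
def greedy_small_model_search(theta_space, dataset, eval_set,
                              reference_model, generate_data,
                              train_and_score, accuracy_slack=0.05,
                              max_rounds=10):
    """Illustrative greedy search over hyperparameter configurations.

    Assumes `dataset` is a list of labelled examples, `reference_model` exposes
    a `score(eval_set)` method, `generate_data(model, data)` returns new
    synthetic examples, and `train_and_score(theta, data, eval_set)` trains a
    candidate model with hyperparameters `theta` and returns its accuracy.
    """
    # Loose accuracy target derived from the already-trained reference model.
    target = reference_model.score(eval_set) - accuracy_slack
    candidates = list(theta_space)
    best_theta, best_score = None, float("-inf")

    for _ in range(max_rounds):
        if not candidates:
            break
        # Generating function: augment the scarce dataset with new points
        # (e.g. labelled by the reference model) before each round.
        augmented = dataset + generate_data(reference_model, dataset)

        scored = [(theta, train_and_score(theta, augmented, eval_set))
                  for theta in candidates]
        for theta, score in scored:
            if score > best_score:
                best_theta, best_score = theta, score

        # Keep only configurations meeting the loose accuracy constraint,
        # then greedily halve the search space for the next round.
        survivors = sorted((p for p in scored if p[1] >= target),
                           key=lambda p: p[1], reverse=True)
        candidates = [theta for theta, _ in
                      survivors[: max(1, len(survivors) // 2)]]

    return best_theta, best_score
```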
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT}) + \epsilon$.
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory [0.0]
We find a basis $\mathcal{B}$ that describes $\xi_{\rm lin}(r)$ near the baryon acoustic oscillation feature in a wide class of cosmological models.
Using our basis functions in model-agnostic BAO analyses can potentially lead to significant statistical gains.
arXiv Detail & Related papers (2024-10-28T18:00:01Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise that can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_*\|, \delta, d, n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n = \Omega\left(\min(s\log d, 1/\gamma^4)\right)$ samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z) - Model Selection with Near Optimal Rates for Reinforcement Learning with
General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem.
In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels.
We show that \texttt{ARL-GEN} obtains a regret of $\tilde{\mathcal{O}}(d_{\mathcal{E}^*}H^2 + \sqrt{d_{\mathcal{E}^*}\mathbb{M}^{*}H^2 T})$.
arXiv Detail & Related papers (2021-07-13T05:00:38Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.