Related papers: An Algorithm for Learning Smaller Representations of Models With Scarce Data

An Algorithm for Learning Smaller Representations of Models With Scarce Data

URL: http://arxiv.org/abs/2010.07990v1
Date: Thu, 15 Oct 2020 19:17:51 GMT
Title: An Algorithm for Learning Smaller Representations of Models With Scarce Data
Authors: Adrian de Wynter
Abstract summary: We present a greedy algorithm for solving binary classification problems in situations where the dataset is too small or not fully representative. It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a greedy algorithm for solving binary classification problems in situations where the dataset is either too small or not fully representative of the problem being solved, and obtaining more data is not possible. This algorithm is of particular interest when training small models that have trouble generalizing. It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data. Analysis on correctness and runtime complexity under ideal conditions and an extension to deep neural networks is provided. In the former case we obtain an asymptotic bound of $O\left(|\Theta^2|\left(\log{|\Theta|} + |\theta^2| + T_f\left(| D|\right)\right) + \bar{S}|\Theta||{E}|\right)$, where $|{\Theta}|$ is the cardinality of the set of hyperparameters $\theta$ to be searched; $|{E}|$ and $|{D}|$ are the sizes of the evaluation and training datasets, respectively; $\bar{S}$ and $\bar{f}$ are the inference times for the trained model and the candidate model; and $T_f({|{D}|})$ is a polynomial on $|{D}|$ and $\bar{f}$. Under these conditions, this algorithm returns a solution that is $1 \leq r \leq 2(1 - {2^{-|{\Theta}|}})$ times better than simply enumerating and training with any $\theta \in \Theta$. As part of our analysis of the generating function we also prove that, under certain assumptions, if an open cover of $D$ has the same homology as the manifold where the support of the underlying probability distribution lies, then $D$ is learnable, and viceversa.

Related papers

On Agnostic PAC Learning in the Small Error Regime [4.422219522591412]
Empirical Risk Minimization learners are suboptimal in the realizable case yet optimal in the agnostic case. Work of Hanneke, Larsen, and Zhivotovskiy addresses this shortcoming by including $tau$ as a parameter in the error term. We show that our learner achieves tightness of the error lower bound $tau + Omega left(sqrtfractau))m + fracd + log (1 / delta)m right)$ for a constant $c leq 2.1$
arXiv Detail & Related papers (2025-02-13T17:03:03Z)
Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $sigma(mathbfwast cdot mathbfx)$, where $sigma: mathbbR to mathbbR$ is a known link function and $mathbfwast$ is a hidden unit vector. We show that a proper learner attains $L2$-error of $O(mathrmOPT)+epsilon$, where $
arXiv Detail & Related papers (2024-11-08T17:10:38Z)
Model-agnostic basis functions for the 2-point correlation function of dark matter in linear theory [0.0]
We find a basis $mathcalB$ that describes $xi_rm lin(r)$ near the baryon acoustic oscillation feature in a wide class of cosmological models. Using our basis functions in model-agnostic BAO analyses can potentially lead to significant statistical gains.
arXiv Detail & Related papers (2024-10-28T18:00:01Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We present an algorithm that tackles newthis problem in its most general distribution-independent setting. This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z)
$\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $ell_p$-regression problem in the coordinator model. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $tildeTheta(sd2 + sd/epsilon)$ bits. For $p in (1,2)$,we obtain an $tildeO(sd2/epsilon + sd/mathrmpoly(epsilon)$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $mathbfA$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model. We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $|theta_*|,delta,d,n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z)
Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models. We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n=Omegaleft(min(slog d,1/gamma4)right) samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z)
Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem. In the model selection framework, instead of $mathcalP*$, we are given $M$ nested families of transition kernels. We show that textttARL-GEN obtains a regret of $TildemathcalO(d_mathcalE*H2+sqrtd_mathcalE* mathbbM* H2 T)$
arXiv Detail & Related papers (2021-07-13T05:00:38Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy. For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems. We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.