Fast Differentiable Clipping-Aware Normalization and Rescaling
- URL: http://arxiv.org/abs/2007.07677v1
- Date: Wed, 15 Jul 2020 13:43:22 GMT
- Title: Fast Differentiable Clipping-Aware Normalization and Rescaling
- Authors: Jonas Rauber, Matthias Bethge
- Abstract summary: We show that the optimal rescaling can be found analytically using a fast and differentiable algorithm.
Our algorithm works for any p-norm and can be used to train neural networks on inputs with normalized perturbations.
- Score: 22.320256458354137
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rescaling a vector $\vec{\delta} \in \mathbb{R}^n$ to a desired length is a
common operation in many areas such as data science and machine learning. When
the rescaled perturbation $\eta \vec{\delta}$ is added to a starting point
$\vec{x} \in D$ (where $D$ is the data domain, e.g. $D = [0, 1]^n$), the
resulting vector $\vec{v} = \vec{x} + \eta \vec{\delta}$ will in general not be
in $D$. To enforce that the perturbed vector $\vec{v}$ is in $D$, the values of
$\vec{v}$ can be clipped to $D$. This subsequent element-wise clipping to the
data domain does however reduce the effective perturbation size and thus
interferes with the rescaling of $\vec{\delta}$. The optimal rescaling $\eta$
to obtain a perturbation with the desired norm after the clipping can be
iteratively approximated using a binary search. However, such an iterative
approach is slow and non-differentiable. Here we show that the optimal
rescaling can be found analytically using a fast and differentiable algorithm.
Our algorithm works for any p-norm and can be used to train neural networks on
inputs with normalized perturbations. We provide native implementations for
PyTorch, TensorFlow, JAX, and NumPy based on EagerPy.
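The iterative baseline the abstract describes can be sketched as follows: binary-search for the scale $\eta$ such that the *clipped* perturbation $\mathrm{clip}(\vec{x} + \eta\vec{\delta}) - \vec{x}$ has the desired $p$-norm. This is a minimal NumPy sketch of the slow, non-differentiable approach the paper replaces with an analytical algorithm; the function name and defaults are illustrative, not the paper's API.

```python
import numpy as np

def clipping_aware_rescale_binary_search(x, delta, eps, p=2, iters=60):
    """Approximate eta so that ||clip(x + eta*delta, 0, 1) - x||_p == eps.

    The clipped norm is monotonically nondecreasing in eta, so binary
    search converges; note it saturates if eps exceeds the largest
    norm achievable inside the data domain D = [0, 1]^n.
    """
    def clipped_norm(eta):
        v = np.clip(x + eta * delta, 0.0, 1.0)
        return np.linalg.norm((v - x).ravel(), ord=p)

    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the clipped norm reaches eps.
    while clipped_norm(hi) < eps and hi < 1e12:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clipped_norm(mid) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each of the `iters` norm evaluations costs $O(n)$, and the branching makes the result non-differentiable in `eps` and `delta`, which is exactly the drawback the paper's analytical solution avoids.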
Related papers
- Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
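Hutchinson's estimator, referenced in the lower bound above, is a standard matrix-vector-query trace estimator: $\mathrm{tr}(A) \approx \frac{1}{m}\sum_{i=1}^{m} \vec{v}_i^\top A \vec{v}_i$ for random Rademacher vectors $\vec{v}_i$. A minimal sketch (not the paper's dynamic variant):

```python
import numpy as np

def hutchinson_trace(matvec, n, num_queries, seed=None):
    # Estimate tr(A) using only matrix-vector products with A:
    # E[v^T A v] = tr(A) when v has i.i.d. +/-1 (Rademacher) entries.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_queries):
        v = rng.choice([-1.0, 1.0], size=n)
        total += v @ matvec(v)
    return total / num_queries
```

For a diagonal matrix the estimator is exact ($v_i^2 = 1$ for every entry); in general the error decays with the number of queries, which is the quantity the paper's bounds control.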
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
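For reference, the exact solution that sketching-based methods approximate is the regularized normal-equations solve $\vec{x}^* = (A^\top A + \lambda I)^{-1} A^\top \vec{b}$. A minimal sketch of this baseline (not the paper's algorithm):

```python
import numpy as np

def ridge_solve(A, b, lam):
    # Exact ridge regression: argmin_x ||Ax - b||^2 + lam * ||x||^2,
    # solved via the regularized normal equations. Sketching-based
    # methods trade this O(n d^2) cost for a faster 1+eps approximation.
    n, d = A.shape
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
```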
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1, p/2\}} \log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Tractability from overparametrization: The example of the negative perceptron [9.077560551700163]
We analyze a linear programming algorithm to characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$.
We observe a gap relative to the threshold $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
arXiv Detail & Related papers (2021-10-28T01:00:13Z) - Coresets for Decision Trees of Signals [19.537354146654845]
We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for every such matrix $D$.
This is done by forging a link between decision trees from machine learning and partition trees from computational geometry.
arXiv Detail & Related papers (2021-10-07T05:49:55Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation [23.06440095688755]
A common technique for compressing a neural network is to compute the rank-$k$ $\ell_2$ approximation $A_{k,2}$ of the matrix $A \in \mathbb{R}^{n \times d}$ that corresponds to a fully connected layer (or embedding layer).
Here, $d$ is the number of neurons in the layer, $n$ is the number in the next one, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$.
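The standard rank-$k$ compression the summary above describes can be sketched with a truncated SVD; the two thin factors realize the $O((n+d)k)$ storage. This is the classical SVD baseline (which the paper argues against in favor of a robust alternative), not the paper's own method:

```python
import numpy as np

def rank_k_factors(A, k):
    # Truncated SVD: keep the top-k singular triplets so the layer's
    # weight matrix A (n x d) is stored as two thin factors,
    # O((n+d)k) memory instead of the dense O(n*d).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    left = U[:, :k] * s[:k]   # n x k
    right = Vt[:k, :]         # k x d
    return left, right        # A_k = left @ right
```

By the Eckart-Young theorem, `left @ right` is the best rank-$k$ approximation of `A` in Frobenius and spectral norm.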
arXiv Detail & Related papers (2020-09-11T20:21:42Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ (for a universal constant $\rho>1$) and outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M times N$ sensing matrices guarantees efficient reconstruction of sparse vectors.
We investigate the exact average-case time complexity of certifying the RIP for $M \times N$ matrices with i.i.d. $\mathcal{N}(0, 1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.