Teach me how to Interpolate a Myriad of Embeddings
- URL: http://arxiv.org/abs/2206.14868v1
- Date: Wed, 29 Jun 2022 19:16:48 GMT
- Title: Teach me how to Interpolate a Myriad of Embeddings
- Authors: Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis
- Abstract summary: Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization.
We introduce MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple.
Our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks.
- Score: 18.711509039868655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup refers to interpolation-based data augmentation, originally motivated
as a way to go beyond empirical risk minimization (ERM). Yet, its extensions
focus on the definition of interpolation and the space where it takes place,
while the augmentation itself is less studied: For a mini-batch of size $m$,
most methods interpolate between $m$ pairs with a single scalar interpolation
factor $\lambda$.
In this work, we make progress in this direction by introducing MultiMix,
which interpolates an arbitrary number $n$ of tuples, each of length $m$, with
one vector $\lambda$ per tuple. On sequence data, we further extend to dense
interpolation and loss computation over all spatial positions. Overall, we
increase the number of tuples per mini-batch by orders of magnitude at little
additional cost. This is possible by interpolating at the very last layer
before the classifier. Finally, to address inconsistencies due to linear target
interpolation, we introduce a self-distillation approach to generate and
interpolate synthetic targets.
We empirically show that our contributions result in significant improvement
over state-of-the-art mixup methods on four benchmarks. By analyzing the
embedding space, we observe that the classes are more tightly clustered and
uniformly spread over the embedding space, thereby explaining the improved
behavior.
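The interpolation itself is simple to sketch: the encoder runs once on the $m$ clean examples, the $n$ tuples are formed as convex combinations of the resulting last-layer embeddings and of their one-hot targets, and only the classifier and the loss see the mixed vectors, which is why the extra tuples come at little additional cost. The following is a minimal sketch, not the authors' released code: the `multimix_loss` name, the tensor shapes, the linear head, and the Dirichlet sampler over the mini-batch are illustrative assumptions, and the self-distillation of targets described above is omitted.

```python
# Minimal sketch of last-layer interpolation with one weight vector per tuple.
# Assumptions (not from the paper's code): Dirichlet sampling of the weights,
# a plain linear classifier head, and a soft-target cross-entropy loss.
import torch
import torch.nn.functional as F

def multimix_loss(z, y, classifier, num_classes, n_tuples=1000, alpha=1.0):
    """z: (m, d) last-layer embeddings; y: (m,) integer labels;
    classifier: a linear head mapping d -> num_classes."""
    m = z.shape[0]
    y_onehot = F.one_hot(y, num_classes).float()              # (m, C) one-hot targets
    # One interpolation vector per tuple: each row of lam is a convex
    # combination over the m mini-batch examples.
    lam = torch.distributions.Dirichlet(
        torch.full((m,), alpha)).sample((n_tuples,))           # (n, m)
    z_mix = lam @ z                                            # (n, d) mixed embeddings
    y_mix = lam @ y_onehot                                     # (n, C) mixed soft targets
    logits = classifier(z_mix)                                 # only the head sees the n tuples
    return -(y_mix * F.log_softmax(logits, -1)).sum(-1).mean()  # soft-target cross-entropy

# Toy usage with hypothetical dimensions:
m, d, num_classes = 32, 128, 10
z = torch.randn(m, d, requires_grad=True)      # stand-in for encoder outputs
y = torch.randint(0, num_classes, (m,))
head = torch.nn.Linear(d, num_classes)
multimix_loss(z, y, head, num_classes).backward()
```

Because the encoder is evaluated only on the $m$ clean examples, $n$ can be orders of magnitude larger than $m$, as the abstract claims; gradients still flow back through the mixed embeddings into the encoder.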
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Interplay between depth and width for interpolation in neural ODEs [0.0]
We examine the interplay between their width $p$ and number of layer transitions $L$.
In the high-dimensional setting, we demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
arXiv Detail & Related papers (2024-01-18T11:32:50Z) - Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond
Examples [20.76232390972057]
We introduce MultiMix, which generates an arbitrarily large number of examples beyond the mini-batch size.
We also densely interpolate features and target labels at each spatial location and apply the loss densely.
Our solutions yield significant improvement over state-of-the-art mixup methods on four different benchmarks, despite interpolation being only linear.
arXiv Detail & Related papers (2023-11-09T17:34:53Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Learning in High Dimension Always Amounts to Extrapolation [22.220076291384686]
Extrapolation occurs when $x$ falls outside the convex hull of the training set.
Many intuitions and theories rely on that assumption.
We argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens.
arXiv Detail & Related papers (2021-10-18T17:32:25Z) - Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function.
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z) - The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z) - Escaping Saddle-Points Faster under Interpolation-like Conditions [19.9471360853892]
We show that under over-parametrization several standard optimization algorithms escape saddle-points and converge to local-minimizers much faster.
We discuss the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$ local-minimizer.
We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that the oracle complexity to reach an $\epsilon$ local-minimizer is $\tilde{\mathcal{O}}(1/\epsilon^{2.5})$.
arXiv Detail & Related papers (2020-09-28T02:15:18Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.