Teach me how to Interpolate a Myriad of Embeddings
- URL: http://arxiv.org/abs/2206.14868v1
- Date: Wed, 29 Jun 2022 19:16:48 GMT
- Title: Teach me how to Interpolate a Myriad of Embeddings
- Authors: Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis
- Abstract summary: Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization.
We introduce MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple.
Our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks.
- Score: 18.711509039868655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup refers to interpolation-based data augmentation, originally motivated
as a way to go beyond empirical risk minimization (ERM). Yet, its extensions
focus on the definition of interpolation and the space where it takes place,
while the augmentation itself is less studied: For a mini-batch of size $m$,
most methods interpolate between $m$ pairs with a single scalar interpolation
factor $\lambda$.
In this work, we make progress in this direction by introducing MultiMix,
which interpolates an arbitrary number $n$ of tuples, each of length $m$, with
one vector $\lambda$ per tuple. On sequence data, we further extend to dense
interpolation and loss computation over all spatial positions. Overall, we
increase the number of tuples per mini-batch by orders of magnitude at little
additional cost. This is possible by interpolating at the very last layer
before the classifier. Finally, to address inconsistencies due to linear target
interpolation, we introduce a self-distillation approach to generate and
interpolate synthetic targets.
We empirically show that our contributions result in significant improvement
over state-of-the-art mixup methods on four benchmarks. By analyzing the
embedding space, we observe that the classes are more tightly clustered and
uniformly spread over the embedding space, thereby explaining the improved
behavior.
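The interpolation itself is simple to sketch: the encoder runs once on the $m$ clean examples, the $n$ tuples are formed as convex combinations of the resulting last-layer embeddings and of their one-hot targets, and only the classifier and the loss see the mixed vectors, which is why the extra tuples come at little additional cost. The following is a minimal sketch, not the authors' released code: the `multimix_loss` name, the tensor shapes, the linear head, and the Dirichlet sampler over the mini-batch are illustrative assumptions, and the self-distillation of targets described above is omitted.

```python
# Minimal sketch of last-layer interpolation with one weight vector per tuple.
# Assumptions (not from the paper's code): Dirichlet sampling of the weights,
# a plain linear classifier head, and a soft-target cross-entropy loss.
import torch
import torch.nn.functional as F

def multimix_loss(z, y, classifier, num_classes, n_tuples=1000, alpha=1.0):
    """z: (m, d) last-layer embeddings; y: (m,) integer labels;
    classifier: a linear head mapping d -> num_classes."""
    m = z.shape[0]
    y_onehot = F.one_hot(y, num_classes).float()              # (m, C) one-hot targets
    # One interpolation vector per tuple: each row of lam is a convex
    # combination over the m mini-batch examples.
    lam = torch.distributions.Dirichlet(
        torch.full((m,), alpha)).sample((n_tuples,))           # (n, m)
    z_mix = lam @ z                                            # (n, d) mixed embeddings
    y_mix = lam @ y_onehot                                     # (n, C) mixed soft targets
    logits = classifier(z_mix)                                 # only the head sees the n tuples
    return -(y_mix * F.log_softmax(logits, -1)).sum(-1).mean()  # soft-target cross-entropy

# Toy usage with hypothetical dimensions:
m, d, num_classes = 32, 128, 10
z = torch.randn(m, d, requires_grad=True)      # stand-in for encoder outputs
y = torch.randint(0, num_classes, (m,))
head = torch.nn.Linear(d, num_classes)
multimix_loss(z, y, head, num_classes).backward()
```

Because the encoder is evaluated only on the $m$ clean examples, $n$ can be orders of magnitude larger than $m$, as the abstract claims; gradients still flow back through the mixed embeddings into the encoder.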
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Interplay between depth and width for interpolation in neural ODEs [0.0]
We examine the interplay between their width $p$ and number of layer transitions $L$.
In the high-dimensional setting, we demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
arXiv Detail & Related papers (2024-01-18T11:32:50Z) - Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond
Examples [20.76232390972057]
We introduce MultiMix, which generates an arbitrarily large number of examples beyond the mini-batch size.
We also densely interpolate features and target labels at each spatial location and apply the loss densely.
Our solutions yield significant improvement over state-of-the-art mixup methods on four different benchmarks, despite interpolation being only linear.
arXiv Detail & Related papers (2023-11-09T17:34:53Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Learning in High Dimension Always Amounts to Extrapolation [22.220076291384686]
Extrapolation occurs when $x$ falls outside the convex hull of the training set.
Many intuitions and theories rely on that assumption.
We argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens.
arXiv Detail & Related papers (2021-10-18T17:32:25Z) - Joint Majorization-Minimization for Nonnegative Matrix Factorization
with the $\beta$-divergence [4.468952886990851]
This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function.
We report experimental results using diverse datasets: face images, an audio spectrogram, hyperspectral data and song play counts.
arXiv Detail & Related papers (2021-06-29T09:58:21Z) - The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z) - Escaping Saddle-Points Faster under Interpolation-like Conditions [19.9471360853892]
We show that under over-parametrization several standard optimization algorithms escape saddle-points and converge to local-minimizers much faster.
We discuss the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$ local-minimizer.
We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that the oracle complexity to reach an $\epsilon$ local-minimizer is $\tilde{\mathcal{O}}(1/\epsilon^{2.5})$.
arXiv Detail & Related papers (2020-09-28T02:15:18Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.