Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond Examples
- URL: http://arxiv.org/abs/2311.05538v1
- Date: Thu, 9 Nov 2023 17:34:53 GMT
- Title: Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond Examples
- Authors: Shashanka Venkataramanan, Ewa Kijak, Laurent Amsaleg, Yannis Avrithis
- Abstract summary: We introduce MultiMix, which generates an arbitrarily large number of examples beyond the mini-batch size.
We also densely interpolate features and target labels at each spatial location and apply the loss densely.
Our solutions yield significant improvement over state-of-the-art mixup methods on four different benchmarks, despite interpolation being only linear.
- Score: 20.76232390972057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup refers to interpolation-based data augmentation, originally motivated
as a way to go beyond empirical risk minimization (ERM). Its extensions mostly
focus on the definition of interpolation and the space (input or feature) where
it takes place, while the augmentation process itself is less studied. In most
methods, the number of generated examples is limited to the mini-batch size and
the number of examples being interpolated is limited to two (pairs), in the
input space.
We make progress in this direction by introducing MultiMix, which generates
an arbitrarily large number of interpolated examples beyond the mini-batch size
and interpolates the entire mini-batch in the embedding space. Effectively, we
sample on the entire convex hull of the mini-batch rather than along linear
segments between pairs of examples.
On sequence data, we further extend to Dense MultiMix. We densely interpolate
features and target labels at each spatial location and also apply the loss
densely. To mitigate the lack of dense labels, we inherit labels from examples
and weight interpolation factors by attention as a measure of confidence.
Overall, we increase the number of loss terms per mini-batch by orders of
magnitude at little additional cost. This is only possible because of
interpolating in the embedding space. We empirically show that our solutions
yield significant improvement over state-of-the-art mixup methods on four
different benchmarks, despite interpolation being only linear. By analyzing the
embedding space, we show that the classes are more tightly clustered and
uniformly spread over the embedding space, thereby explaining the improved
behavior.
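To make the MultiMix idea above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: Dirichlet weights drawn over the whole mini-batch place every generated point inside the convex hull of the batch embeddings, and the number of generated examples can exceed the mini-batch size. The encoder output, the classifier, and names such as multimix, n_tilde and alpha are illustrative assumptions.

```python
# Minimal sketch of MultiMix-style interpolation (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def multimix(z, y_onehot, n_tilde=1024, alpha=1.0):
    """Interpolate an entire mini-batch in embedding space.

    z:        (b, d) mini-batch embeddings
    y_onehot: (b, c) one-hot targets
    n_tilde:  number of generated examples; may exceed the mini-batch size b
    alpha:    Dirichlet concentration; each weight vector lies on the simplex,
              so every mixed embedding lies in the convex hull of the mini-batch
    """
    b = z.size(0)
    lam = torch.distributions.Dirichlet(torch.full((b,), alpha)).sample((n_tilde,))  # (n_tilde, b)
    z_mix = lam @ z          # (n_tilde, d): convex combinations of all b embeddings
    y_mix = lam @ y_onehot   # (n_tilde, c): identically interpolated soft targets
    return z_mix, y_mix


# Toy usage with random embeddings standing in for an encoder output.
b, d, c = 32, 128, 10
z = torch.randn(b, d)
y = F.one_hot(torch.randint(0, c, (b,)), c).float()
classifier = torch.nn.Linear(d, c)

z_mix, y_mix = multimix(z, y, n_tilde=1024)
logits = classifier(z_mix)
# Cross-entropy against the soft, interpolated targets.
loss = -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
print(loss.item())
```

With alpha = 1 the interpolation weights are uniform over the simplex; smaller values of alpha concentrate the weights near single examples, so generated points stay closer to the original embeddings.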
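Dense MultiMix can be sketched in the same spirit (again a hypothetical illustration under assumed shapes and names): image-level labels are inherited at every spatial location, the interpolation weights are rescaled by an attention map as a per-location confidence and renormalized, and the loss is applied at every location, which is what multiplies the number of loss terms per mini-batch.

```python
# Minimal sketch of Dense MultiMix-style interpolation (illustrative, not the paper's code).
import torch
import torch.nn.functional as F


def dense_multimix(f, y_onehot, attn, n_tilde=256, alpha=1.0):
    """f: (b, d, h, w) dense features, y_onehot: (b, c) labels, attn: (b, h, w) confidences."""
    b, d, h, w = f.shape
    c = y_onehot.size(1)
    lam = torch.distributions.Dirichlet(torch.full((b,), alpha)).sample((n_tilde,))  # (n_tilde, b)

    # Weight interpolation factors by attention (confidence) at each location,
    # then renormalize so every location still mixes on the simplex.
    w_lam = lam.view(n_tilde, b, 1, 1) * attn.view(1, b, h, w)        # (n_tilde, b, h, w)
    w_lam = w_lam / w_lam.sum(dim=1, keepdim=True).clamp_min(1e-8)

    # Densely interpolate features and the labels inherited at each location.
    f_mix = torch.einsum('nbhw,bdhw->ndhw', w_lam, f)                 # (n_tilde, d, h, w)
    y_dense = y_onehot.view(b, c, 1, 1).expand(b, c, h, w)            # label inherited per location
    y_mix = torch.einsum('nbhw,bchw->nchw', w_lam, y_dense)           # (n_tilde, c, h, w)
    return f_mix, y_mix


# Toy usage with a 1x1-convolution classifier so the loss is applied densely.
b, d, c, h, w = 8, 64, 10, 7, 7
f = torch.randn(b, d, h, w)
y = F.one_hot(torch.randint(0, c, (b,)), c).float()
attn = torch.rand(b, h, w)
cls = torch.nn.Conv2d(d, c, kernel_size=1)

f_mix, y_mix = dense_multimix(f, y, attn)
logits = cls(f_mix)                                                    # (n_tilde, c, h, w)
loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()       # one loss term per location
print(loss.item())
```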
Related papers
- Grad Queue : A probabilistic framework to reinforce sparse gradients [0.0]
We propose a robust mechanism to reinforce the sparse components within a random batch of data points.
A strong intuitive criterion to squeeze out redundant information from each cluster is the backbone of the system.
Our method shows superior performance on CIFAR10, MNIST, and the Reuters news-category dataset compared to mini-batch gradient descent.
arXiv Detail & Related papers (2024-04-25T16:07:01Z) - AID: Attention Interpolation of Text-to-Image Diffusion [64.87754163416241]
We introduce a training-free technique named Attention Interpolation via Diffusion (AID).
AID fuses the interpolated attention with self-attention to boost fidelity.
We also present a variant, Conditional-guided Attention Interpolation via Diffusion, that treats interpolation as a condition-dependent generative process.
arXiv Detail & Related papers (2024-03-26T17:57:05Z) - Cooperative Minibatching in Graph Neural Networks [2.9904113489777826]
Training Graph Neural Networks (GNNs) requires significant computational resources, and the process is highly data-intensive.
One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling.
We show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches.
arXiv Detail & Related papers (2023-10-19T01:15:24Z) - Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls [35.327709607897944]
We develop distributed versions of convex SVM classifiers for Poincaré discs.
We compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage.
We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories.
arXiv Detail & Related papers (2023-08-14T02:25:48Z) - Growing Instance Mask on Leaf [12.312639923806548]
We present a single-shot method, called VeinMask, that achieves competitive performance with low design complexity.
Considering the advantages above, we propose VeinMask to formulate the instance segmentation problem.
VeinMask performs much better than other contour-based methods at low design complexity.
arXiv Detail & Related papers (2022-11-30T04:50:56Z) - Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z) - Teach me how to Interpolate a Myriad of Embeddings [18.711509039868655]
Mixup refers to data-based augmentation, originally motivated as a way to go beyond empirical risk minimization.
We introduce MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one interpolation vector $\lambda$ per tuple.
Our contributions result in significant improvement over state-of-the-art mixup methods on four benchmarks.
arXiv Detail & Related papers (2022-06-29T19:16:48Z) - Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z) - Suppressing Mislabeled Data via Grouping and Self-Attention [60.14212694011875]
Deep networks achieve excellent results on large-scale clean data but degrade significantly when learning from noisy labels.
This paper proposes a conceptually simple yet efficient training block, termed Attentive Feature Mixup (AFM).
It allows paying more attention to clean samples and less to mislabeled ones via sample interactions in small groups.
arXiv Detail & Related papers (2020-10-29T13:54:16Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - Interpolation and Learning with Scale Dependent Kernels [91.41836461193488]
We study the learning properties of nonparametric ridge-less least squares.
We consider the common case of estimators defined by scale dependent kernels.
arXiv Detail & Related papers (2020-06-17T16:43:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.