Hyperspherically Regularized Networks for BYOL Improves Feature Uniformity and Separability
- URL: http://arxiv.org/abs/2105.00925v1
- Date: Thu, 29 Apr 2021 18:57:27 GMT
- Title: Hyperspherically Regularized Networks for BYOL Improves Feature Uniformity and Separability
- Authors: Aiden Durrant and Georgios Leontidis
- Abstract summary: Bootstrap Your Own Latent (BYOL) introduced an approach to self-supervised learning avoiding the contrastive paradigm.
This work empirically demonstrates that feature diversity enforced by contrastive losses is beneficial when employed in BYOL.
- Score: 4.822598110892847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bootstrap Your Own Latent (BYOL) introduced an approach to self-supervised
learning avoiding the contrastive paradigm and subsequently removing the
computational burden of negative sampling. However, feature representations
under this paradigm are poorly distributed on the surface of the
unit-hypersphere representation space compared to contrastive methods. This
work empirically demonstrates that feature diversity enforced by contrastive
losses is beneficial when employed in BYOL, and as such, provides greater
inter-class feature separability. Therefore, to achieve a more uniform
distribution of features, we advocate the minimization of hyperspherical energy
(i.e. maximization of entropy) in BYOL network weights. We show that directly
optimizing a measure of uniformity alongside the standard loss, or regularizing
the networks of the BYOL architecture to minimize the hyperspherical energy of
neurons, can produce more uniformly distributed and better-performing
representations for downstream tasks.
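
The two strategies named in the abstract, adding a uniformity term next to the BYOL loss and regularizing network weights to minimize hyperspherical energy, can be sketched in a few lines. Below is a minimal PyTorch-style example of a Riesz s-energy regularizer over the neuron directions of a linear layer; the function name, the default s = 1, and the way it is combined with the BYOL regression loss are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hyperspherical_energy(weight: torch.Tensor, s: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Riesz s-energy of a layer's neurons after projection onto the unit hypersphere.

    weight: (num_neurons, fan_in) weight matrix of a linear layer.
    Lower energy corresponds to neuron directions spread more uniformly.
    """
    w = F.normalize(weight, dim=1)                     # each neuron as a point on the sphere
    dist = torch.cdist(w, w)                           # pairwise Euclidean distances
    mask = ~torch.eye(w.shape[0], dtype=torch.bool, device=w.device)
    d = dist[mask].clamp_min(eps)                      # drop self-distances, avoid division by zero
    return (-d.log()).mean() if s == 0 else d.pow(-s).mean()

# Hypothetical use inside a BYOL training step: regularize the online network's
# linear layers alongside the usual BYOL regression loss.
# mhe = sum(hyperspherical_energy(m.weight)
#           for m in online_network.modules() if isinstance(m, nn.Linear))
# total_loss = byol_loss + lambda_mhe * mhe
```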
Related papers
- Generalizable Person Re-identification via Balancing Alignment and Uniformity [22.328800139066914]
Domain generalizable person re-identification (DG re-ID) aims to learn discriminative representations that are robust to distributional shifts.
Certain augmentations exhibit a polarized effect in this task, enhancing in-distribution performance while degrading out-of-distribution performance.
We propose a novel framework, Balancing Alignment and Uniformity (BAU), which effectively mitigates this effect by maintaining a balance between alignment and uniformity.
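
The alignment and uniformity terms this entry balances are usually written, following Wang and Isola's formulation, as a mean positive-pair distance and a log Gaussian-potential over all pairs; a minimal sketch on L2-normalized features is shown below. BAU's specific weighting and sampling details are not given in the summary, so only the generic form appears here.

```python
import torch

def alignment_loss(x: torch.Tensor, y: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    # x, y: L2-normalized features of positive pairs, shape (batch, dim)
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity_loss(x: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    # Log of the mean Gaussian potential over all pairs of features;
    # lower values mean the features are spread more uniformly on the sphere.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```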
arXiv Detail & Related papers (2024-11-18T11:13:30Z)
- Negative-Free Self-Supervised Gaussian Embedding of Graphs [29.26519601854811]
Graph Contrastive Learning (GCL) has emerged as a promising graph self-supervised learning framework.
We propose a negative-free objective to achieve uniformity, inspired by the fact that points distributed according to a normalized isotropic Gaussian are uniformly spread across the unit hypersphere.
Our proposal achieves competitive performance with fewer parameters, shorter training times, and lower memory consumption compared to existing GCL methods.
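
The geometric fact this objective builds on is easy to check: normalizing samples from an isotropic Gaussian gives points that are uniform on the unit hypersphere. The penalty below, which pushes the empirical mean and covariance of the embeddings toward 0 and the identity, is only one illustrative way a negative-free objective in this spirit could be written; it is an assumption, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

# Normalizing isotropic Gaussian samples yields a uniform distribution on the sphere.
z = torch.randn(4096, 128)        # isotropic Gaussian samples
u = F.normalize(z, dim=1)         # uniformly spread over S^127

def gaussian_uniformity_penalty(emb: torch.Tensor) -> torch.Tensor:
    """Illustrative negative-free penalty: match embedding moments to N(0, I)."""
    mu = emb.mean(dim=0)
    cov = torch.cov(emb.T)
    eye = torch.eye(emb.shape[1], device=emb.device)
    return mu.pow(2).sum() + (cov - eye).pow(2).sum()
```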
arXiv Detail & Related papers (2024-11-02T07:04:40Z)
- Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution in the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z)
- Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning [13.474603286270836]
We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning.
We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss.
arXiv Detail & Related papers (2021-11-04T21:25:24Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE [51.09507030387935]
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
We propose to use the contrastive learning framework, which has been shown to be effective for self-supervised representation learning, as a means to resolve this problem.
We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE.
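
The WAE principle described here amounts to a reconstruction loss plus a penalized divergence between the aggregate latent distribution and the prior. The sketch below keeps that divergence as a pluggable placeholder: the original WAE instantiates it with MMD or an adversarial discriminator, while the summarized paper estimates the matching term with a momentum contrastive objective whose details are not reproduced here.

```python
import torch
import torch.nn.functional as F

def wae_objective(x, encoder, decoder, divergence, lam: float = 10.0):
    """Generic WAE-style objective: reconstruction + lam * D(q(z), prior).

    `divergence(z, z_prior)` is a placeholder estimator of the distance between
    the encoded latents and samples from the prior (e.g. a standard Gaussian).
    """
    z = encoder(x)
    x_hat = decoder(z)
    recon = F.mse_loss(x_hat, x)
    z_prior = torch.randn_like(z)                # samples from the pre-specified prior
    return recon + lam * divergence(z, z_prior)
```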
arXiv Detail & Related papers (2021-10-19T22:55:47Z)
- Eccentric Regularization: Minimizing Hyperspherical Energy without explicit projection [0.913755431537592]
We introduce a novel regularizing loss function which simulates a pairwise repulsive force between items.
We show that minimizing this loss function in isolation achieves a hyperspherical distribution.
We apply this method of Eccentric Regularization to an autoencoder, and demonstrate its effectiveness in image generation, representation learning and downstream classification tasks.
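
The summary describes a pairwise repulsive force that does not require an explicit projection onto the hypersphere. The short-range potential below is one illustrative way to write such a repulsion; the exact form of the Eccentric Regularization loss is not given in the summary, and in practice the repulsion would be balanced by a term that keeps the embeddings bounded, such as the autoencoder's reconstruction loss.

```python
import torch

def pairwise_repulsion(z: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Illustrative repulsive penalty: nearby items contribute a large potential,
    so minimizing it pushes items apart without projecting them onto a sphere."""
    sq_dists = torch.pdist(z).pow(2)
    return torch.exp(-sq_dists / (2 * sigma ** 2)).mean()
```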
arXiv Detail & Related papers (2021-04-23T13:55:17Z)
- Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)
- Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
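
For reference, the Zwanzig identity underlying FEP is ΔF = -kT log E_A[exp(-(U_B - U_A)/kT)], where the expectation is taken over configurations sampled from state A. The small sketch below implements only this plain estimator, whose variance grows quickly when the two distributions overlap poorly; the learned high-dimensional mapping of Targeted FEP is not shown.

```python
import numpy as np

def zwanzig_fep(delta_u: np.ndarray, kT: float = 1.0) -> float:
    """Free energy difference F_B - F_A from the plain Zwanzig estimator.

    delta_u: values of U_B(x) - U_A(x) on configurations x sampled from state A.
    """
    n = delta_u.shape[0]
    # log-mean-exp of -delta_u / kT for numerical stability
    log_mean_exp = np.logaddexp.reduce(-delta_u / kT) - np.log(n)
    return float(-kT * log_mean_exp)
```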
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.