An Embedding-Dynamic Approach to Self-supervised Learning
- URL: http://arxiv.org/abs/2207.03552v1
- Date: Thu, 7 Jul 2022 19:56:20 GMT
- Title: An Embedding-Dynamic Approach to Self-supervised Learning
- Authors: Suhong Moon, Domas Buracas, Seunghyun Park, Jinkyu Kim, John Canny
- Abstract summary: We treat the embeddings of images as point particles and consider model optimization as a dynamic process on this system of particles.
Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a global dispersive force to achieve a globally-homogeneous distribution of particles.
- Score: 8.714677279673738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of recent self-supervised learning methods have shown impressive
performance on image classification and other tasks. A somewhat bewildering
variety of techniques have been used, not always with a clear understanding of
the reasons for their benefits, especially when used in combination. Here we
treat the embeddings of images as point particles and consider model
optimization as a dynamic process on this system of particles. Our dynamic
model combines an attractive force for similar images, a locally dispersive
force to avoid local collapse, and a global dispersive force to achieve a
globally-homogeneous distribution of particles. The dynamic perspective
highlights the advantage of using a delayed-parameter image embedding (a la
BYOL) together with multiple views of the same image. It also uses a
purely-dynamic local dispersive force (Brownian motion) that shows improved
performance over other methods and does not require knowledge of other particle
coordinates. The method is called MSBReg, which stands for (i) a Multiview
centroid loss, which applies an attractive force to pull different image-view
embeddings toward their centroid, (ii) a Singular value loss, which pushes the
particle system toward a spatially homogeneous density, and (iii) a Brownian
diffusive loss. We evaluate the downstream classification performance of MSBReg on
ImageNet as well as transfer learning tasks including fine-grained
classification, multi-class object classification, object detection, and
instance segmentation. In addition, we show that applying our regularization
term to other methods further improves their performance and stabilizes
training by preventing mode collapse.
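To make the three forces concrete, the following is a minimal PyTorch-style sketch of how such terms could be combined; the function name, loss weights, noise scale, and normalization choices are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def msbreg_style_loss(views, lambda_sv=1.0, sigma_brownian=0.01):
    """Illustrative sketch of MSBReg-style regularization (not the official code).

    views: tensor of shape (V, B, D) -- V augmented views of each of the
    B images in the batch, embedded in D dimensions.
    """
    # (iii) Brownian diffusive term: perturb embeddings with isotropic Gaussian
    # noise, a purely dynamic local dispersive force that needs no knowledge
    # of other particles' coordinates.
    views = views + sigma_brownian * torch.randn_like(views)
    views = F.normalize(views, dim=-1)

    # (i) Multiview centroid loss: attract every view embedding of an image
    # toward the centroid of that image's views.
    centroid = views.mean(dim=0, keepdim=True)                 # (1, B, D)
    attract = ((views - centroid) ** 2).sum(dim=-1).mean()

    # (ii) Singular value loss: push the batch toward a spatially homogeneous
    # density by encouraging uniform singular values of the centered embedding
    # matrix (an assumed formulation of the global dispersive force).
    flat = views.reshape(-1, views.shape[-1])                  # (V*B, D)
    flat = flat - flat.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(flat)
    disperse = ((s / s.sum() - 1.0 / s.numel()) ** 2).sum()

    return attract + lambda_sv * disperse

# Example usage with random embeddings: 4 views, 256 images, 128 dimensions.
loss = msbreg_style_loss(torch.randn(4, 256, 128))
```

In an actual training loop, one of the views would typically come from a delayed-parameter (exponential-moving-average) copy of the encoder, in the BYOL spirit the abstract describes.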
Related papers
- Unsupervised Representation Learning by Balanced Self Attention Matching [2.3020018305241337]
We present a self-supervised method for embedding image features called BAM.
We obtain rich representations and avoid feature collapse by minimizing a loss that matches these distributions to their globally balanced and entropy regularized version.
We show competitive performance with leading methods on both semi-supervised and transfer-learning benchmarks.
arXiv Detail & Related papers (2024-08-04T12:52:44Z) - Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models [55.99654128127689]
Visual Foundation Models (VFMs) are used to enhance 3D representation learning.
VFMs generate semantic labels for weakly-supervised pixel-to-point contrastive distillation.
We adapt sampling probabilities of points to address imbalances in spatial distribution and category frequency.
arXiv Detail & Related papers (2024-05-23T07:48:19Z) - FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth Estimation [8.78717459496649]
We propose FlowDepth, where a Dynamic Motion Flow Module (DMFM) decouples the optical flow by a mechanism-based approach and warps the dynamic regions, thus solving the mismatch problem.
To address the unfairness of photometric errors caused by high-frequency and low-texture regions, we use Depth-Cue-Aware Blur (DCABlur) at the input level and a Cost-Volume sparsity loss at the loss level.
arXiv Detail & Related papers (2024-03-28T10:31:23Z) - Curricular Contrastive Regularization for Physics-aware Single Image Dehazing [56.392696439577165]
We propose a novel curricular contrastive regularization targeted at a consensual contrastive space as opposed to a non-consensual one.
Our negatives, which provide better lower-bound constraints, can be assembled from 1) the hazy image, and 2) corresponding restorations by other existing methods.
With the unit, as well as curricular contrastive regularization, we establish our dehazing network, named C2PNet.
arXiv Detail & Related papers (2023-03-24T18:18:25Z) - CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose [13.581694284209885]
Photometric differences are used to train neural networks for estimating depth and camera pose from unlabeled monocular videos.
In this paper, we deal with moving objects and occlusions by utilizing the differences between the flow fields and depth structures generated by affine transformation and view synthesis.
We mitigate the effect of textureless regions on model optimization by measuring differences between features with more semantic and contextual information without adding networks.
arXiv Detail & Related papers (2022-12-12T12:18:24Z) - Unsupervised Feature Clustering Improves Contrastive Representation Learning for Medical Image Segmentation [18.75543045234889]
Self-supervised instance discrimination is an effective contrastive pretext task to learn feature representations and address limited medical image annotations.
We propose a new self-supervised contrastive learning method that uses unsupervised feature clustering to better select positive and negative image samples.
Our method outperforms state-of-the-art self-supervised contrastive techniques on these tasks.
arXiv Detail & Related papers (2022-11-15T22:54:29Z) - A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z) - Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations [79.71184760864507]
This article introduces a new physics-based method for rigid point set alignment called Fast Gravitational Approach (FGA).
In FGA, the source and target point sets are interpreted as rigid particle swarms with masses interacting in a globally multiply-linked manner while moving in a simulated gravitational force field.
We show that the new method class has characteristics not found in previous alignment methods; a simplified sketch of the gravitational update appears after this list.
arXiv Detail & Related papers (2020-09-28T15:05:39Z) - Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration is a common yet challenging problem in computer vision.
We propose a novel blind image restoration method, aiming to integrate the advantages of both.
Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-arts.
arXiv Detail & Related papers (2020-08-25T03:30:53Z) - Blur-Attention: A boosting mechanism for non-uniform blurred image restoration [27.075713246257596]
We propose a blur-attention module to dynamically capture the spatially varying features of non-uniform blurred images.
By introducing the blur-attention network into a conditional generative adversarial framework, we propose an end-to-end blind motion deblurring method.
Experimental results show that our method achieves outstanding objective performance in terms of PSNR and SSIM, as well as high subjective visual quality.
arXiv Detail & Related papers (2020-08-19T16:07:06Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
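The Fast Gravitational Approach summarized above treats registration as particle dynamics, much like the main paper's embedding view. The sketch below is a minimal illustration rather than the FGA algorithm itself: it shows one assumed Euler step in which unit-mass target points pull source points through a softened inverse-square attraction; the step size, softening constant, and the omission of rigidity constraints and second-order ODE integration are all simplifying assumptions.

```python
import numpy as np

def gravitational_step(source, target, dt=0.05, eps=1e-3):
    """One illustrative gravity-like update of a moving point set.

    source: (N, 3) points being displaced; target: (M, 3) fixed points.
    Unit masses, the softening constant eps, and the step size dt are
    assumptions; FGA itself additionally enforces a rigid transform and
    integrates second-order ODEs with momentum.
    """
    diff = target[None, :, :] - source[:, None, :]          # (N, M, 3) offsets
    dist2 = (diff ** 2).sum(axis=-1) + eps                   # softened squared distances
    forces = (diff / dist2[..., None] ** 1.5).sum(axis=1)    # (N, 3) net pull per point
    return source + dt * forces

# Example: pull a jittered copy of a point cloud back toward the original.
target = np.random.rand(100, 3)
source = target + 0.1 * np.random.randn(100, 3)
for _ in range(50):
    source = gravitational_step(source, target)
```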
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.