Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
- URL: http://arxiv.org/abs/2110.14805v1
- Date: Wed, 27 Oct 2021 22:40:41 GMT
- Title: Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
- Authors: Aakash Kaku, Sahana Upadhya, Narges Razavian
- Abstract summary: We show that bringing intermediate layers' representations of two augmented versions of an image closer together in self-supervised learning helps to improve the momentum contrastive (MoCo) method.
We analyze the models trained using our novel approach via feature similarity analysis and layer-wise probing.
- Score: 1.933681537640272
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We show that bringing intermediate layers' representations of two augmented
versions of an image closer together in self-supervised learning helps to
improve the momentum contrastive (MoCo) method. To this end, in addition to the
contrastive loss, we minimize the mean squared error between the intermediate
layer representations or make their cross-correlation matrix closer to an
identity matrix. Both loss objectives either outperform standard MoCo or
achieve comparable performance on three diverse medical imaging datasets:
NIH Chest X-rays, Breast Cancer Histopathology, and Diabetic Retinopathy. The
gains of the improved MoCo are especially large in a low-labeled data regime
(e.g. 1% labeled data) with an average gain of 5% across three datasets. We
analyze the models trained using our novel approach via feature similarity
analysis and layer-wise probing. Our analysis reveals that models trained via
our approach have higher feature reuse compared to a standard MoCo and learn
informative features earlier in the network. Finally, by comparing the output
probability distribution of models fine-tuned on small versus large labeled
data, we conclude that our proposed method of pre-training leads to lower
Kolmogorov-Smirnov distance, as compared to a standard MoCo. This provides
additional evidence that our proposed method learns more informative features
in the pre-training phase which could be leveraged in a low-labeled data
regime.
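The two auxiliary objectives described in the abstract (a mean-squared-error term between the intermediate-layer features of the two augmented views, and a term pushing their cross-correlation matrix toward the identity, in the style of Barlow Twins) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation; the off-diagonal weight `lam` is an assumed hyperparameter.

```python
import numpy as np

def intermediate_mse_loss(z1, z2):
    """Mean squared error between intermediate-layer features of two
    augmented views; z1, z2 have shape [batch, dim]."""
    return np.mean((z1 - z2) ** 2)

def cross_correlation_loss(z1, z2, lam=0.005):
    """Push the cross-correlation matrix of batch-standardized features
    toward the identity: diagonal entries toward 1, off-diagonal toward 0."""
    # standardize each feature dimension over the batch
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n                      # cross-correlation matrix [dim, dim]
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + lam * off_diag
```

Either term would be added to the MoCo contrastive loss at one or more intermediate layers; features from two views of the same image drive the loss toward zero, while unrelated features leave a large on-diagonal penalty.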
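The final analysis compares output probability distributions via the two-sample Kolmogorov-Smirnov distance, i.e. the maximum gap between the two empirical CDFs. A minimal NumPy sketch (the same statistic is available as `scipy.stats.ks_2samp`):

```python
import numpy as np

def ks_distance(p, q):
    """Two-sample Kolmogorov-Smirnov distance: the maximum absolute
    difference between the empirical CDFs of samples p and q."""
    grid = np.sort(np.concatenate([p, q]))
    cdf_p = np.searchsorted(np.sort(p), grid, side="right") / len(p)
    cdf_q = np.searchsorted(np.sort(q), grid, side="right") / len(q)
    return np.max(np.abs(cdf_p - cdf_q))
```

A lower KS distance between the small-label and large-label fine-tuned models' output distributions is the paper's evidence that pre-training already captured the informative features.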
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Cross-model Mutual Learning for Exemplar-based Medical Image Segmentation [25.874281336821685]
We introduce a novel Cross-model Mutual learning framework for Exemplar-based Medical image (CMEMS)
arXiv Detail & Related papers (2024-04-18T00:18:07Z)
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
- Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation [14.536384387956527]
We develop a novel Multi-Scale Cross Supervised Contrastive Learning framework to segment structures in medical images.
Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels, in order to extract robust feature representations.
It outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice.
arXiv Detail & Related papers (2023-06-25T16:55:32Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- Successive Subspace Learning for Cardiac Disease Classification with Two-phase Deformation Fields from Cine MRI [36.044984400761535]
This work proposes a lightweight successive subspace learning framework for CVD classification.
It is based on an interpretable feedforward design, in conjunction with a cardiac atlas.
Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140× fewer parameters.
arXiv Detail & Related papers (2023-01-21T15:00:59Z)
- Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels [54.58539616385138]
We introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA)
First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features.
Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features.
arXiv Detail & Related papers (2022-09-27T15:50:31Z)
- Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification [61.656149405657246]
Domain adaptation is effective in image classification tasks where obtaining sufficient label data is challenging.
We propose a novel method, named SELDA, for stacking ensemble learning via extending three domain adaptation methods.
The experimental results using Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-09-27T14:19:00Z)
- Learning Multi-Modal Volumetric Prostate Registration with Weak Inter-Subject Spatial Correspondence [2.6894568533991543]
We introduce an auxiliary input to the neural network for the prior information about the prostate location in the MR sequence.
With weakly labelled MR-TRUS prostate data, we showed registration quality comparable to the state-of-the-art deep learning-based method.
arXiv Detail & Related papers (2021-02-09T16:48:59Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.