Consistent View Alignment Improves Foundation Models for 3D Medical Image Segmentation
- URL: http://arxiv.org/abs/2509.13846v1
- Date: Wed, 17 Sep 2025 09:23:52 GMT
- Title: Consistent View Alignment Improves Foundation Models for 3D Medical Image Segmentation
- Authors: Puru Vaish, Felix Meister, Tobias Heimann, Christoph Brune, Jelmer M. Wolterink
- Abstract summary: We show that meaningful structure in the latent space does not emerge naturally. We propose a method that aligns representations from different views of the data, capturing complementary information without inducing false positives. Our experiments show that our proposed self-supervised learning method, Consistent View Alignment, improves performance on downstream tasks.
- Score: 2.8281887612574153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many recent approaches in representation learning implicitly assume that uncorrelated views of a data point are sufficient to learn meaningful representations for various downstream tasks. In this work, we challenge this assumption and demonstrate that meaningful structure in the latent space does not emerge naturally; instead, it must be explicitly induced. We propose a method that aligns representations from different views of the data, capturing complementary information without inducing false positives. Our experiments show that our proposed self-supervised learning method, Consistent View Alignment, improves performance on downstream tasks, highlighting the critical role of structured view alignment in learning effective representations. Our method achieved first and second place in the MICCAI 2025 SSL3D challenge when using a Primus vision transformer and a ResEnc convolutional neural network, respectively. The code and pretrained model weights are released at https://github.com/Tenbatsu24/LatentCampus.
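The abstract does not spell out the alignment objective, but a minimal PyTorch sketch of the general idea, aligning embeddings of two views of the same volume with a stop-gradient on one branch, might look as follows. All names and the SimSiam-style stop-gradient are assumptions, not the authors' implementation (their released code is at the GitHub link above).

```python
# Minimal sketch of a view-alignment objective. This illustrates the general
# idea in the abstract, NOT the authors' implementation; see
# https://github.com/Tenbatsu24/LatentCampus for the released code.
import torch
import torch.nn.functional as F

def view_alignment_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Align representations of two views of the same sample.

    z_a, z_b: (batch, dim) embeddings of two views (e.g., overlapping crops).
    The stop-gradient on one branch is a common collapse-avoidance trick
    (an assumption here; the paper may use a different mechanism).
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b.detach(), dim=-1)  # stop-gradient on the target view
    # Negative cosine similarity: minimised when paired views agree.
    return -(z_a * z_b).sum(dim=-1).mean()

# Usage: encode two views with a shared backbone f, then symmetrise:
# loss = 0.5 * (view_alignment_loss(f(x1), f(x2)) + view_alignment_loss(f(x2), f(x1)))
```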
Related papers
- Learning Without Augmenting: Unsupervised Time Series Representation Learning via Frame Projections [35.715609556178165]
Self-supervised learning has emerged as a powerful paradigm for learning representations without labeled data.
Most SSL approaches rely on strong, well-established, handcrafted data augmentations to generate diverse views for representation learning.
We propose an unsupervised representation learning method that replaces augmentations by generating views using orthonormal bases and overcomplete frames.
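As a hedged illustration of generating views from bases rather than augmentations, one could project a series onto two different orthonormal bases; the random QR-derived bases below are an assumption for illustration only, not the paper's actual frames.

```python
# Sketch: two "views" of a time series via projection onto different
# orthonormal bases instead of handcrafted augmentations (illustrative only).
import torch

def random_orthonormal_basis(dim: int, seed: int) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, dim, generator=g))
    return q  # columns form an orthonormal basis of R^dim

def make_views(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """x: (batch, length) time series. Returns two basis-projected views."""
    b1 = random_orthonormal_basis(x.shape[-1], seed=0)
    b2 = random_orthonormal_basis(x.shape[-1], seed=1)
    # An orthonormal projection is invertible, so no information is lost;
    # each view is simply a different coordinate representation of x.
    return x @ b1, x @ b2
```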
arXiv Detail & Related papers (2025-10-26T12:36:29Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate Vision Transformer networks' appetite for very large annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
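Reading the abstract literally, the pretext target could be sketched as nearest-codebook assignments of teacher token features, predicted by the student for masked tokens. Everything below (names, the nearest-neighbour assignment, the codebook-update remark) is an assumption, not MOCA's released code.

```python
# Schematic reading of "predict masked online codebook assignments".
import torch
import torch.nn.functional as F

def masked_assignment_loss(student_logits, teacher_feats, codebook, mask):
    """student_logits: (B, N, K) per-token predictions over K codes.
    teacher_feats: (B, N, D) teacher token features (no gradient).
    codebook: (K, D) online codebook (e.g., updated by EMA or k-means).
    mask: (B, N) bool, True where the student's input token was masked."""
    with torch.no_grad():
        # Nearest codebook entry serves as the prediction target.
        book = codebook.unsqueeze(0).expand(teacher_feats.size(0), -1, -1)
        targets = torch.cdist(teacher_feats, book).argmin(dim=-1)  # (B, N)
    # Cross-entropy only on the masked positions.
    return F.cross_entropy(student_logits[mask], targets[mask])
```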
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task: reconstructing the surface from which the 3D points were sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
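A rough sketch of such an occupancy-style pretext head, classifying query points as occupied or free, is shown below; the architecture and query-sampling scheme are illustrative assumptions rather than the ALSO implementation.

```python
# Illustrative occupancy-prediction pretext head for point-cloud backbones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyHead(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, scene_feat: torch.Tensor, queries: torch.Tensor):
        """scene_feat: (B, feat_dim) pooled backbone features.
        queries: (B, Q, 3) query coordinates. Returns (B, Q) occupancy logits."""
        feat = scene_feat.unsqueeze(1).expand(-1, queries.size(1), -1)
        return self.mlp(torch.cat([feat, queries], dim=-1)).squeeze(-1)

# Pretext loss: binary cross-entropy against occupancy labels derived from
# the lidar rays (free space along each ray, occupied at/behind the hit):
# loss = F.binary_cross_entropy_with_logits(head(feat, queries), occ_labels)
```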
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- DILEMMA: Self-Supervised Shape and Texture Learning with Transformers [33.296154476701055]
We propose a pseudo-task to explicitly boost both shape and texture discriminability in models trained via self-supervised learning.
We call our method DILEMMA, which stands for Detection of Incorrect Location EMbeddings with MAsked inputs.
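One plausible reading of this pseudo-task is sketched below: corrupt the position embeddings of a random token subset and train a per-token binary head to detect them. The corruption scheme is an assumption for illustration, not the paper's exact recipe.

```python
# Sketch of a DILEMMA-style pseudo-task: flag tokens whose position
# embedding has been swapped (corruption scheme is an assumption).
import torch

def corrupt_positions(pos_emb: torch.Tensor, p: float = 0.2):
    """pos_emb: (B, N, D) position embeddings. Returns corrupted embeddings
    and a (B, N) label tensor (1.0 where the position is wrong)."""
    B, N, _ = pos_emb.shape
    wrong = torch.rand(B, N) < p
    perm = torch.randperm(N)
    corrupted = torch.where(wrong.unsqueeze(-1), pos_emb[:, perm], pos_emb)
    return corrupted, wrong.float()

# Training: tokens = patch_emb + corrupted_pos; the loss would be
# F.binary_cross_entropy_with_logits(head(encoder(tokens)), labels).
```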
arXiv Detail & Related papers (2022-04-10T22:58:02Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across (i) object- versus scene-centric, (ii) uniform versus long-tailed, and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module that can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for many tasks.
We will release our code and pre-trained models for further research.
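A hedged sketch of the canonicalization step follows, using SVD projection onto SO(3) as one common way to turn a raw 3x3 prediction into a valid rotation; ART's actual parameterization may differ.

```python
# Sketch: rotate input shapes into a learned canonical frame.
import torch

def orthogonalize(m: torch.Tensor) -> torch.Tensor:
    """Project a (B, 3, 3) matrix onto SO(3) via SVD."""
    u, _, vh = torch.linalg.svd(m)
    det = torch.det(u @ vh)  # fix reflections so det = +1
    d = torch.diag_embed(
        torch.stack([torch.ones_like(det), torch.ones_like(det), det], dim=-1)
    )
    return u @ d @ vh

def canonicalize(points: torch.Tensor, predicted: torch.Tensor) -> torch.Tensor:
    """points: (B, N, 3) shapes; predicted: (B, 3, 3) raw network output."""
    rot = orthogonalize(predicted)
    return points @ rot.transpose(1, 2)  # rotate each shape into canonical pose
```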
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework faithfully preserves the relations between samples.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [40.49380547487908]
We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art in self-supervised monocular depth prediction over all pixels, on fine-grained details, and per semantic category.
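As a hedged illustration of semantic guidance (not the paper's actual mechanism), a frozen segmenter's labels could supervise a feature-similarity loss on the depth network, as in the sketch below.

```python
# Sketch: let a frozen semantic segmenter guide depth features by pulling
# together features of pixels it assigns to the same class (illustrative).
import torch
import torch.nn.functional as F

def semantic_guidance_loss(depth_feat, sem_logits, num_pairs: int = 1024):
    """depth_feat: (B, C, H, W) features from the depth network.
    sem_logits: (B, K, H, W) frozen segmentation logits (no gradient)."""
    B, C, H, W = depth_feat.shape
    labels = sem_logits.argmax(dim=1).reshape(B, -1)          # (B, H*W)
    feats = F.normalize(depth_feat.reshape(B, C, -1), dim=1)  # (B, C, H*W)
    idx_a = torch.randint(H * W, (num_pairs,))
    idx_b = torch.randint(H * W, (num_pairs,))
    same = (labels[:, idx_a] == labels[:, idx_b]).float()     # (B, num_pairs)
    sim = (feats[:, :, idx_a] * feats[:, :, idx_b]).sum(dim=1)
    # Pull same-class pairs together, push different-class pairs apart.
    return ((same - sim) ** 2).mean()
```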
arXiv Detail & Related papers (2020-02-27T18:40:10Z)