Related papers: DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision

DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision

URL: http://arxiv.org/abs/2509.18765v1
Date: Tue, 23 Sep 2025 07:58:21 GMT
Title: DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision
Authors: Azad Singh, Deepak Mishra,
Abstract summary: DiSSECT is a framework that integrates multi-scale vector quantization into the SSL pipeline to impose a discrete representational bottleneck.<n>It achieves strong performance on both classification and segmentation tasks, requiring minimal or no fine-tuning.<n>We validate DiSSECT across multiple public medical imaging datasets, demonstrating its robustness and generalizability.
Score: 9.254163621425727
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-supervised learning (SSL) has emerged as a powerful paradigm for medical image representation learning, particularly in settings with limited labeled data. However, existing SSL methods often rely on complex architectures, anatomy-specific priors, or heavily tuned augmentations, which limit their scalability and generalizability. More critically, these models are prone to shortcut learning, especially in modalities like chest X-rays, where anatomical similarity is high and pathology is subtle. In this work, we introduce DiSSECT -- Discrete Self-Supervision for Efficient Clinical Transferable Representations, a framework that integrates multi-scale vector quantization into the SSL pipeline to impose a discrete representational bottleneck. This constrains the model to learn repeatable, structure-aware features while suppressing view-specific or low-utility patterns, improving representation transfer across tasks and domains. DiSSECT achieves strong performance on both classification and segmentation tasks, requiring minimal or no fine-tuning, and shows particularly high label efficiency in low-label regimes. We validate DiSSECT across multiple public medical imaging datasets, demonstrating its robustness and generalizability compared to existing state-of-the-art approaches.

Related papers

SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation [18.723160085156717]
We propose SegMoTE, an efficient and adaptive framework for medical image segmentation.<n>SegMoTE preserves SAM's original prompt interface, efficient inference, and zero-shot generalization.<n>SegMoTE achieves SOTA performance across diverse imaging modalities and anatomical tasks.
arXiv Detail & Related papers (2026-02-22T14:48:42Z)
Vision Foundry: A System for Training Foundational Vision AI Models [0.0]
Vision Foundry is a code-free, HIPAA-compliant platform that democratizes pre-training, adaptation, and deployment of vision models.<n>By bridging the gap between advanced representation learning and practical application, Vision Foundry enables domain experts to develop state-of-the-art clinical AI tools.
arXiv Detail & Related papers (2025-12-03T14:02:22Z)
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning.<n>We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z)
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images.<n>We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions.<n>SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition [49.91188543847175]
Multi-Skeleton Contrastive Learning (MS-CLR) is a framework that aligns pose representations across multiple skeleton conventions extracted from the same sequence.<n>MS-CLR consistently improves performance over strong single-skeleton contrastive learning baselines.<n>A multi-skeleton ensemble further boosts performance, setting new state-of-the-art results on both datasets.
arXiv Detail & Related papers (2025-08-20T17:58:03Z)
Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.<n>MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation [2.964206587462833]
A novel semi-supervised segmentation framework, called HDC, is proposed incorporating adaptive consistency learning with a single-teacher architecture.<n>The framework introduces a hierarchical distillation mechanism with two objectives: Correlation Guidance Loss for aligning feature representations and Mutual Information Loss for stabilizing noisy student learning.
arXiv Detail & Related papers (2025-04-14T04:52:24Z)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images [7.048241543461529]
We propose a novel framework called Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE) to address these challenges in zero-shot histopathology image classification.<n>We introduce a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings.<n>A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings.
arXiv Detail & Related papers (2025-03-13T12:18:37Z)
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation [30.524999223901645]
We propose an enhanced Segment Anything Model (SAM) framework that utilizes annotation-efficient prompts generated in a fully unsupervised fashion.<n>We adopt the direct preference optimization technique to design an optimal policy that enables the model to generate high-fidelity segmentations.<n>State-of-the-art performance of our framework in tasks such as lung segmentation, breast tumor segmentation, and organ segmentation across various modalities, including X-ray, ultrasound, and abdominal CT, justifies its effectiveness in low-annotation data scenarios.
arXiv Detail & Related papers (2025-03-06T17:28:48Z)
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation. Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process. Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
OTCXR: Rethinking Self-supervised Alignment using Optimal Transport for Chest X-ray Analysis [6.4136876268620115]
Self-supervised learning (SSL) has emerged as a promising technique for analyzing medical modalities such as X-rays.<n>We propose OTCXR, a novel SSL framework that leverages optimal transport (OT) to learn dense semantic invariance.<n>We validate OTCXR's efficacy through comprehensive experiments on three publicly available chest X-ray datasets.
arXiv Detail & Related papers (2024-04-18T02:59:48Z)
Learning Multiscale Consistency for Self-supervised Electron Microscopy Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes. Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations. It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
arXiv Detail & Related papers (2023-08-19T05:49:13Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.