US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound
- URL: http://arxiv.org/abs/2602.19322v1
- Date: Sun, 22 Feb 2026 19:56:56 GMT
- Title: US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound
- Authors: Ashwath Radhachandran, Vedrana Ivezić, Shreeram Athreya, Ronit Anilkumar, Corey W. Arnold, William Speier
- Abstract summary: Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. Joint-Embedding Predictive Architectures (JEPAs) predict masked latent representations rather than raw pixels. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective.
- Score: 5.216814358105614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ultrasound (US) imaging poses unique challenges for representation learning due to its inherently noisy acquisition process. The low signal-to-noise ratio and stochastic speckle patterns hinder standard self-supervised learning methods relying on a pixel-level reconstruction objective. Joint-Embedding Predictive Architectures (JEPAs) address this drawback by predicting masked latent representations rather than raw pixels. However, standard approaches depend on hyperparameter-brittle and computationally expensive online teachers updated via exponential moving average. We propose US-JEPA, a self-supervised framework that adopts the Static-teacher Asymmetric Latent Training (SALT) objective. By using a frozen, domain-specific teacher to provide stable latent targets, US-JEPA decouples student-teacher optimization and pushes the student to expand upon the semantic priors of the teacher. In addition, we provide the first rigorous comparison of all publicly available state-of-the-art ultrasound foundation models on UltraBench, a public dataset benchmark spanning multiple organs and pathological conditions. Under linear probing for diverse classification tasks, US-JEPA achieves performance competitive with or superior to domain-specific and universal vision foundation model baselines. Our results demonstrate that masked latent prediction provides a stable and efficient path toward robust ultrasound representations.
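The abstract describes the training recipe at a high level; the following minimal PyTorch-style sketch shows how one masked-latent-prediction step with a frozen teacher could look. The module names (student, predictor, teacher), the patch-masking interface, and the choice of MSE loss are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def salt_step(student, predictor, teacher, images, mask, optimizer):
    """One masked-latent-prediction step with a frozen teacher.

    A sketch of a SALT-style objective: the teacher's weights never
    change (no EMA update), so its patch-level latents serve as
    stable regression targets for the student.
    """
    # Frozen, domain-specific teacher provides latent targets.
    with torch.no_grad():
        targets = teacher(images)            # (B, N, D) patch latents

    # Student encodes only the visible (unmasked) context ...
    context = student(images, mask=mask)
    # ... and a light predictor regresses latents at masked positions.
    preds = predictor(context, mask=mask)    # (K, D), K = total masked patches

    # Latent-space loss: no pixel reconstruction, so speckle noise in
    # the raw ultrasound signal is never a regression target.
    loss = F.mse_loss(preds, targets[mask])

    optimizer.zero_grad()
    loss.backward()                          # only student/predictor get gradients
    optimizer.step()
    return loss.item()
```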
Related papers
- TIP: Resisting Gradient Inversion via Targeted Interpretable Perturbation in Federated Learning [8.156452885913108]
Federated Learning (FL) facilitates collaborative model training while preserving data locality. The exchange of gradients renders the system vulnerable to Gradient Inversion Attacks (GIAs). We propose Targeted Interpretable Perturbation (TIP), a novel defense framework that integrates model interpretability with frequency domain analysis.
arXiv Detail & Related papers (2026-02-12T06:32:49Z)
- Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models [130.8912476550625]
We propose a positive-unlabeled (PU) reinforcement learning distillation method for on-premise small-model deployment. Our method distills the teacher's preference-optimization capability from black-box generations into a locally trainable student. Experiments demonstrate that our method achieves consistently strong performance under a low-cost setting.
arXiv Detail & Related papers (2026-01-28T15:14:50Z)
- DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture [34.31498256147088]
DSeq-JEPA bridges predictive and autoregressive self-supervised learning, integrating JEPA-style latent prediction with GPT-style sequential reasoning. DSeq-JEPA consistently learns more discriminative and generalizable representations than I-JEPA variants.
arXiv Detail & Related papers (2025-11-21T16:18:50Z)
- Joint Embeddings Go Temporal [5.2741154046624255]
Joint Embedding Predictive Architectures (JEPA) were introduced with the aim of performing self-supervised learning in the latent space. Time Series JEPA (TS-JEPA) is an architecture specifically adapted for time series representation learning. We show that TS-JEPA can match or surpass current state-of-the-art baselines on different standard datasets.
arXiv Detail & Related papers (2025-09-29T19:57:37Z)
- Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers [37.91964614027141]
Video Joint Embedding Predictive Architectures (V-JEPA) learn generalizable off-the-shelf video representations by predicting masked regions in latent space with an exponential moving average (EMA)-updated teacher. We revisit masked-latent prediction and show that a frozen teacher suffices. We find that student quality is remarkably robust to teacher quality: high-performing students emerge even with small, sub-optimal teachers.
arXiv Detail & Related papers (2025-09-29T05:55:17Z)
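The contrast between the EMA-updated teacher used by standard JEPA methods and the frozen teacher adopted by US-JEPA (and by the frozen-teacher V-JEPA variant above) is easy to state in code. A minimal sketch, assuming PyTorch modules with matching parameter lists; the momentum value shown is illustrative:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Online-teacher update used by standard JEPA-style methods:
    after every student step, the teacher drifts toward the student.
    The momentum schedule is a known source of hyperparameter
    brittleness and adds per-step compute."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def freeze_teacher(teacher):
    """Static-teacher alternative: freeze once before training and
    never update, so the latent targets stay fixed throughout."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
```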
- VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. However, large, homogeneous, and uninformative oceanic backgrounds can dilute the feature responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state of the art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
- Directional Gradient Projection for Robust Fine-Tuning of Foundation Models [25.04763038570959]
Directional Gradient Projection (DiGraP) is a layer-wise trainable method that incorporates directional information from gradients to bridge regularization and multi-objective optimization. We first bridge the uni-modal and multi-modal gap by analyzing Image Classification tasks reformulated as Visual Question Answering (VQA) benchmarks. Experimental results show that DiGraP consistently outperforms existing baselines across Image Classification and VQA tasks with discriminative and generative backbones.
arXiv Detail & Related papers (2025-02-21T19:31:55Z)
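The abstract does not give DiGraP's exact update rule; as a rough illustration of directional gradient projection in general, the sketch below removes the component of a task gradient that conflicts with a regularization direction (a PCGrad-style projection). All names and the projection rule itself are assumptions for illustration, not DiGraP's actual algorithm.

```python
import torch

def project_conflicting(task_grad, reg_grad, eps=1e-12):
    """Illustrative directional projection (not DiGraP's exact rule):
    if the task gradient points against the regularization direction,
    drop its conflicting component so the two objectives stop fighting."""
    dot = torch.dot(task_grad.flatten(), reg_grad.flatten())
    if dot < 0:  # gradients conflict
        coef = dot / (reg_grad.norm() ** 2 + eps)
        task_grad = task_grad - coef * reg_grad
    return task_grad
```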
- ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning. We train a policy to predict action sequences and abstract observation sequences. Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
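For readers unfamiliar with the mean-teacher family that PMT builds on, here is a minimal sketch of the classic semi-supervised consistency setup: an EMA teacher produces pseudo-targets for unlabeled inputs, and the student is trained for consistency against them. This is the generic recipe, not PMT's progressive variant; the function names and the MSE consistency loss are illustrative.

```python
import torch
import torch.nn.functional as F

def mean_teacher_step(student, teacher, labeled, labels, unlabeled,
                      optimizer, consistency_weight=1.0):
    """Generic mean-teacher step (not PMT's exact progressive scheme):
    supervised loss on labeled data plus a consistency loss that pulls
    student predictions toward the EMA teacher's pseudo-targets."""
    sup_loss = F.cross_entropy(student(labeled), labels)

    with torch.no_grad():
        pseudo = teacher(unlabeled).softmax(dim=1)   # soft pseudo labels
    cons_loss = F.mse_loss(student(unlabeled).softmax(dim=1), pseudo)

    loss = sup_loss + consistency_weight * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # The teacher then tracks the student via EMA (see ema_update above).
    return loss.item()
```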
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Joint Embedding Predictive Architectures Focus on Slow Features [56.393060086442006]
Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative to pixel-level self-supervised learning.
We analyze the performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to rewards.
We find that JEPA methods perform on par with or better than reconstruction-based methods when distractor noise changes every time step, but fail when the noise is fixed.
arXiv Detail & Related papers (2022-11-20T00:50:11Z)
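The VICReg objective mentioned in the last entry is a standard published loss (Bardes et al.) and is worth spelling out, since it is one way to train a JEPA without an EMA teacher. A minimal sketch; the coefficients shown are the common defaults, not necessarily what the paper above used:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg loss: invariance + variance + covariance terms.
    z_a, z_b: (B, D) embeddings of two views of the same batch."""
    B, D = z_a.shape

    # Invariance: pull the two views' embeddings together.
    inv = F.mse_loss(z_a, z_b)

    # Variance: hinge loss keeping each dimension's std above 1,
    # which guards against representational collapse.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance: decorrelate dimensions by penalizing off-diagonal
    # entries of each batch's covariance matrix.
    za = z_a - z_a.mean(dim=0)
    zb = z_b - z_b.mean(dim=0)
    cov_a = (za.T @ za) / (B - 1)
    cov_b = (zb.T @ zb) / (B - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov = (off_diag(cov_a) ** 2).sum() / D + (off_diag(cov_b) ** 2).sum() / D

    return sim_w * inv + var_w * var + cov_w * cov
```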