Related papers: Masked Autoencoders for Ultrasound Signals: Robust Representation Learning for Downstream Applications

URL: http://arxiv.org/abs/2508.20622v1
Date: Thu, 28 Aug 2025 10:13:33 GMT
Title: Masked Autoencoders for Ultrasound Signals: Robust Representation Learning for Downstream Applications
Authors: Immanuel Roßteutscher, Klaus S. Drese, Thorsten Uphues
Abstract summary: We investigated the adaptation and performance of Masked Autoencoders (MAEs) with Vision Transformer (ViT) architectures for self-supervised representation learning on one-dimensional (1D) ultrasound signals. Our results show that pre-trained models significantly outperform models trained from scratch and strong convolutional neural network (CNN) baselines optimized for the downstream task.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigated the adaptation and performance of Masked Autoencoders (MAEs) with Vision Transformer (ViT) architectures for self-supervised representation learning on one-dimensional (1D) ultrasound signals. Although MAEs have demonstrated significant success in computer vision and other domains, their use for 1D signal analysis, especially for raw ultrasound data, remains largely unexplored. Ultrasound signals are vital in industrial applications such as non-destructive testing (NDT) and structural health monitoring (SHM), where labeled data are often scarce and signal processing is highly task-specific. We propose an approach that leverages MAE to pre-train on unlabeled synthetic ultrasound signals, enabling the model to learn robust representations that enhance performance in downstream tasks, such as time-of-flight (ToF) classification. This study systematically investigated the impact of model size, patch size, and masking ratio on pre-training efficiency and downstream accuracy. Our results show that pre-trained models significantly outperform models trained from scratch and strong convolutional neural network (CNN) baselines optimized for the downstream task. Additionally, pre-training on synthetic data demonstrates superior transferability to real-world measured signals compared with training solely on limited real datasets. This study underscores the potential of MAEs for advancing ultrasound signal analysis through scalable, self-supervised learning.
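
The abstract names patch size and masking ratio as the main pre-training knobs. The following is a minimal sketch of how a 1D signal might be split into patches and randomly masked before being passed to an MAE encoder. It is not the authors' implementation: the helper names (`patchify_1d`, `random_mask`), the toy echo signal, the patch size of 16, and the masking ratio of 0.75 are all illustrative assumptions.

```python
import numpy as np


def patchify_1d(signal: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a 1D signal into non-overlapping patches (trailing samples are dropped)."""
    n_patches = signal.shape[0] // patch_size
    return signal[: n_patches * patch_size].reshape(n_patches, patch_size)


def random_mask(n_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Return sorted index arrays of the visible and the masked patches."""
    n_masked = int(round(n_patches * mask_ratio))
    perm = rng.permutation(n_patches)
    masked = np.sort(perm[:n_masked])
    visible = np.sort(perm[n_masked:])
    return visible, masked


# Toy stand-in for an ultrasound echo: a Gaussian-windowed sine burst (assumption).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
signal = np.sin(2 * np.pi * 40 * t) * np.exp(-((t - 0.5) ** 2) / 0.002)

patches = patchify_1d(signal, patch_size=16)  # 64 patches of 16 samples
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)

encoder_input = patches[visible_idx]  # only visible patches are encoded
target = patches[masked_idx]          # the decoder is trained to reconstruct these

print(encoder_input.shape, target.shape)  # (16, 16) (48, 16)
```

In this sketch the encoder would see only the 16 visible patches, and the reconstruction loss would be computed on the 48 masked ones. Changing `patch_size` or `mask_ratio` alters how much context the encoder receives.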

Related papers

A Foundation Model for DAS Signal Recognition and Visual Prompt Tuning of the Pre-trained Model for Downstream Tasks [6.14430079610632]
This study proposes a foundational model for DAS signal recognition based on a Masked Autoencoder, named MAEPD. The model is pretrained on a dataset of 635860 samples, encompassing DAS gait signals, 2temporal GASF images for perimeter security, 2D time-frequency images for pipeline leakage, and open-dataset signals including whale vocalizations and seismic activities. The VPT-Deep approach achieves a classification accuracy of 96.94% with just 0.322% of parameters fine-tuned, surpassing the traditional Full Fine Tuning (FFT) method by 0.61% and reducing training time by
arXiv Detail & Related papers (2025-08-06T11:02:25Z)
The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound [60.80780313225093]
This study systematically investigated the impact of data augmentation and preprocessing strategies in self-supervised learning for lung ultrasound. Three data augmentation pipelines were assessed: a baseline pipeline commonly used across imaging domains, a novel semantic-preserving pipeline designed for ultrasound, and a distilled set of the most effective transformations from both pipelines.
arXiv Detail & Related papers (2025-04-10T16:26:47Z)
CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning [0.36832029288386137]
Transfer learning for bio-signals has recently become an important technique to improve prediction performance on downstream tasks with small bio-signal datasets. We propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning. Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks.
arXiv Detail & Related papers (2024-12-16T12:15:16Z)
An LSTM Feature Imitation Network for Hand Movement Recognition from sEMG Signals [2.632402517354116]
We propose utilizing a feature-imitating network (FIN) for closed-form temporal feature learning over a 300ms signal window on Ninapro DB2. We observed that the LSTM-FIN network can achieve up to 99% R2 accuracy in feature reconstruction and 80% accuracy in hand movement recognition.
arXiv Detail & Related papers (2024-05-23T21:45:15Z)
Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction [14.53268880380804]
We propose a novel workflow where naive SMLD samples serve as additional priors to guide model-driven network training. First, we adopted a pretrained score network to generate samples as preliminary guidance images (PGI). Second, we designed a denoising module (DM) to coarsely eliminate artifacts and noise from PGIs. Third, we designed a model-driven network guided by denoised PGIs to further recover fine details.
arXiv Detail & Related papers (2024-05-05T14:56:34Z)
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism. We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data like images. This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z)
Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN). CMMN consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
A Novel Approach For Analysis of Distributed Acoustic Sensing System Based on Deep Transfer Learning [0.0]
Convolutional neural networks are highly capable tools for extracting spatial information. Long short-term memory (LSTM) is an effective instrument for processing sequential data. The VGG-16 architecture in our framework manages to obtain 100% classification accuracy in 50 trainings.
arXiv Detail & Related papers (2022-06-24T19:56:01Z)
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.