Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
- URL: http://arxiv.org/abs/2407.04242v1
- Date: Fri, 5 Jul 2024 04:09:30 GMT
- Title: Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
- Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni,
- Abstract summary: We propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM).
Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM.
Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information.
- Score: 8.558852563471525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to coarse-grained spatial features and separated temporal dependency learning, and struggled with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging due to the difficulty of learning long-range dependencies. In this context, we propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM) to address the above challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information to enhance spatio-temporal perception. Last, we design an online alignment strategy that encodes the temporal information as pseudo labels for multi-modal alignment to further improve reconstruction performance. Extensive experimental validation on two large-scale datasets shows remarkable improvement from our method over competitors.
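The first contribution, scanning a frame sequence with an SSM in multiple directions to capture long-range context, can be illustrated with a minimal sketch. This is not the authors' ReMamba implementation; the diagonal recurrence, its coefficients, and all function names are illustrative assumptions, showing only the idea of fusing forward and backward scans.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Toy diagonal state-space recurrence h[t] = a*h[t-1] + b*x[t].

    x: (T, D) array of per-frame features. Returns the (T, D) hidden states,
    each of which aggregates information from all earlier frames.
    """
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        prev = a * prev + b * x[t]
        h[t] = prev
    return h

def multi_directional_scan(frames):
    """Fuse a forward and a backward scan so every frame's feature carries
    context from both past and future frames, loosely mirroring the idea of
    a multi-directional SSM."""
    fwd = ssm_scan(frames)
    bwd = ssm_scan(frames[::-1])[::-1]
    return 0.5 * (fwd + bwd)

# Toy usage: 8 ultrasound frames, each reduced to a 4-dim feature vector.
frames = np.random.rand(8, 4)
context = multi_directional_scan(frames)
print(context.shape)  # (8, 4)
```

In a real model the scalar coefficients would be learned, input-dependent matrices, and the scan would also run over spatial axes of each frame, but the fusion of directional recurrences is the core mechanism.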
Related papers
- DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series [22.29889788385778]
Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in large-scale industrial control systems.
Existing approaches excel at recognizing distinct patterns in low-dimensional representations, but fail to robustly capture long-range dependencies when learning from high-dimensional time series.
arXiv Detail & Related papers (2025-12-14T07:40:23Z) - STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction [12.810918443757382]
Long-term spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle to learn complex long-term spatio-temporal dependencies efficiently.
In this paper, we propose an efficient Spatio-Temporal Multiscale Mamba (STM2) that includes a multiscale Mamba architecture and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency.
arXiv Detail & Related papers (2025-08-17T05:29:58Z) - Occupancy Learning with Spatiotemporal Memory [39.41175479685905]
We propose a scene-level occupancy representation learning framework that effectively learns 3D occupancy features with temporal consistency.
Our method significantly enhances the spatiotemporal representation learned for 3D occupancy prediction tasks by exploiting the temporal dependency between multi-frame inputs.
arXiv Detail & Related papers (2025-08-06T17:59:52Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs.
We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.
With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting [7.637123047745445]
Self-supervised methods are increasingly adapted to learn spatial-temporal representations.
Current value reconstruction and future value prediction are integrated into the pre-training framework.
Multi-time scale analysis is incorporated into the self-supervised loss to enhance predictive capability.
arXiv Detail & Related papers (2024-12-19T05:33:55Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose estimation stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on Human3.6M, MPI-INF-3DHP and Human-Eva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework [4.773547922851949]
Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex semantic correlations.
This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVST) for traffic prediction.
arXiv Detail & Related papers (2024-06-17T07:36:57Z) - Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning [11.19088022423885]
We propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL.
Results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-06T08:24:06Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through tightly coupled spatiotemporal representation.
We propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.