Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
- URL: http://arxiv.org/abs/2407.04242v1
- Date: Fri, 5 Jul 2024 04:09:30 GMT
- Title: Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
- Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni,
- Abstract summary: We propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM).
Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM.
Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information.
- Score: 8.558852563471525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to coarse-grained spatial features and separated temporal dependency learning, and struggled with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging due to the difficulty of learning long-range dependencies. In this context, we propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM) to address the above challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information to enhance spatio-temporal perception. Last, we design an online alignment strategy that encodes the temporal information as pseudo labels for multi-modal alignment to further improve reconstruction performance. Extensive experimental validation on two large-scale datasets shows remarkable improvement from our method over competitors.
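The first contribution, scanning a frame sequence with an SSM in multiple directions to capture long-range context, can be illustrated with a minimal sketch. This is not the authors' ReMamba implementation; the diagonal recurrence, its coefficients, and all function names are illustrative assumptions, showing only the idea of fusing forward and backward scans.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Toy diagonal state-space recurrence h[t] = a*h[t-1] + b*x[t].

    x: (T, D) array of per-frame features. Returns the (T, D) hidden states,
    each of which aggregates information from all earlier frames.
    """
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        prev = a * prev + b * x[t]
        h[t] = prev
    return h

def multi_directional_scan(frames):
    """Fuse a forward and a backward scan so every frame's feature carries
    context from both past and future frames, loosely mirroring the idea of
    a multi-directional SSM."""
    fwd = ssm_scan(frames)
    bwd = ssm_scan(frames[::-1])[::-1]
    return 0.5 * (fwd + bwd)

# Toy usage: 8 ultrasound frames, each reduced to a 4-dim feature vector.
frames = np.random.rand(8, 4)
context = multi_directional_scan(frames)
print(context.shape)  # (8, 4)
```

In a real model the scalar coefficients would be learned, input-dependent matrices, and the scan would also run over spatial axes of each frame, but the fusion of directional recurrences is the core mechanism.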
Related papers
- DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series [22.29889788385778]
Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in large-scale industrial control systems.
Existing approaches excel at recognizing distinct patterns in low-dimensional representations, but fail to robustly capture long-range dependencies when learning from high-dimensional time series.
arXiv Detail & Related papers (2025-12-14T07:40:23Z) - STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction [12.810918443757382]
Long-term spatio-temporal time-series prediction has developed rapidly, yet existing deep learning methods struggle to learn complex long-term spatio-temporal dependencies efficiently.
In this paper, we propose an efficient Spatio-Temporal Multiscale Mamba (STM2) that includes a multiscale Mamba architecture and an adaptive graph causal convolution network to learn the complex multiscale spatio-temporal dependency.
arXiv Detail & Related papers (2025-08-17T05:29:58Z) - Occupancy Learning with Spatiotemporal Memory [39.41175479685905]
We propose a scene-level occupancy representation learning framework that effectively learns 3D occupancy features with temporal consistency.
Our method significantly enhances the spatiotemporal representation learned for 3D occupancy prediction tasks by exploiting the temporal dependency between multi-frame inputs.
arXiv Detail & Related papers (2025-08-06T17:59:52Z) - SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive LiDAR-camera pairs.
We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.
With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z) - ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting [7.637123047745445]
Self-supervised methods are increasingly adapted to learn spatial-temporal representations.
Current value reconstruction and future value prediction are integrated into the pre-training framework.
Multi-time scale analysis is incorporated into the self-supervised loss to enhance predictive capability.
arXiv Detail & Related papers (2024-12-19T05:33:55Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose estimation stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on Human3.6M, MPI-INF-3DHP and Human-Eva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework [4.773547922851949]
Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex semantic correlations.
This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVST) for traffic prediction.
arXiv Detail & Related papers (2024-06-17T07:36:57Z) - Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning [11.19088022423885]
We propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL.
Results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-06T08:24:06Z) - Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through tightly coupled spatiotemporal representation.
We propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.