TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots
- URL: http://arxiv.org/abs/2504.20362v1
- Date: Tue, 29 Apr 2025 02:00:08 GMT
- Title: TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots
- Authors: Qinhua Xie, Hao Tang
- Abstract summary: TTTFusion is a Test-Time Training-based image fusion strategy. It dynamically adjusts model parameters during inference to efficiently fuse multimodal medical images. It significantly enhances the fusion quality, particularly in fine-grained feature extraction and edge preservation.
- Score: 7.250878248686215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing use of surgical robots in clinical practice, enhancing their ability to process multimodal medical images has become a key research challenge. Although traditional medical image fusion methods have made progress in improving fusion accuracy, they still face significant challenges in real-time performance, fine-grained feature extraction, and edge preservation. In this paper, we introduce TTTFusion, a Test-Time Training (TTT)-based image fusion strategy that dynamically adjusts model parameters during inference to efficiently fuse multimodal medical images. By adapting the model during the test phase, our method optimizes the parameters based on the input image data, leading to improved accuracy and better detail preservation in the fusion results. Experimental results demonstrate that TTTFusion significantly enhances the fusion quality of multimodal images compared to traditional fusion methods, particularly in fine-grained feature extraction and edge preservation. This approach not only improves image fusion accuracy but also offers a novel technical solution for real-time image processing in surgical robots.
Related papers
- Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion [13.029564509505676]
Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation.
While deep learning methods have significantly advanced fusion performance, some of the existing CNN-based methods fall short in capturing fine-grained multiscale and edge features.
We propose a novel CNN-based architecture that addresses these limitations by introducing a Dilated Residual Attention Network Module for effective multiscale feature extraction.
arXiv Detail & Related papers (2024-11-18T18:11:53Z)
- Random Token Fusion for Multi-View Medical Diagnosis [2.3458652461211935]
In multi-view medical datasets, deep learning models often fuse information from different imaging perspectives to improve diagnosis performance.
Existing approaches are prone to overfitting and rely heavily on view-specific features, which can lead to trivial solutions.
In this work, we introduce a novel technique designed to enhance image analysis using multi-view medical transformers.
arXiv Detail & Related papers (2024-10-21T10:19:45Z)
- Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model [2.507050016527729]
Tri-modal medical image fusion can provide a more comprehensive view of the disease's shape, location, and biological activity.
Due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited.
There is an urgent need for a technology that can both enhance image resolution and integrate multi-modal information.
arXiv Detail & Related papers (2024-04-26T12:13:41Z)
- A New Multimodal Medical Image Fusion based on Laplacian Autoencoder with Channel Attention [3.1531360678320897]
Deep learning models have achieved end-to-end image fusion with highly robust and accurate performance.
Most DL-based fusion models perform down-sampling on the input images to minimize the number of learnable parameters and computations.
We propose a new multimodal medical image fusion model based on integrated Laplacian-Gaussian concatenation with attention pooling.
arXiv Detail & Related papers (2023-10-18T11:29:53Z)
- On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts because they lead to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
- A Task-guided, Implicitly-searched and Meta-initialized Deep Model for Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z)
- Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z)
- Bridging Synthetic and Real Images: a Transferable and Multiple Consistency aided Fundus Image Enhancement Framework [61.74188977009786]
We propose an end-to-end optimized teacher-student framework to simultaneously conduct image enhancement and domain adaptation.
We also propose a novel multi-stage multi-attention guided enhancement network (MAGE-Net) as the backbones of our teacher and student network.
arXiv Detail & Related papers (2023-02-23T06:16:15Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models [87.48867245544106]
Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
Existing solutions based on machine learning typically train a model to directly map measurements to medical images.
We propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models.
arXiv Detail & Related papers (2021-11-15T05:41:12Z)
- Coupled Feature Learning for Multimodal Medical Image Fusion [42.23662451234756]
Multimodal image fusion aims to combine relevant information from images acquired with different sensors.
In this paper, we propose a novel multimodal image fusion method based on coupled dictionary learning.
arXiv Detail & Related papers (2021-02-17T09:13:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.