MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
- URL: http://arxiv.org/abs/2407.07078v1
- Date: Tue, 9 Jul 2024 17:50:54 GMT
- Title: MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
- Authors: Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang
- Abstract summary: We propose MoSt-DSA, the first work that uses deep learning for Digital Subtraction Angiography (DSA) frame interpolation.
Unlike natural scene Video Frame Interpolation (VFI) methods that extract unclear or coarse-grained features, we devise a general module that models motion and structural context interactions between frames in an efficient full convolution manner.
MoSt-DSA demonstrates robust results across 470 DSA image sequences, with average SSIM over 0.93 and average PSNR over 38 (standard deviations of less than 0.030 and 3.6, respectively), comprehensively achieving state-of-the-art performance in accuracy, speed, visual effect, and memory usage.
- Score: 31.357770667947907
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial intelligence has become a crucial tool for medical image analysis. As an advanced cerebral angiography technique, Digital Subtraction Angiography (DSA) poses a challenge where the radiation dose to humans is proportional to the image count. By reducing images and using AI interpolation instead, the radiation can be cut significantly. However, DSA images present more complex motion and structural features than natural scenes, making interpolation more challenging. We propose MoSt-DSA, the first work that uses deep learning for DSA frame interpolation. Unlike natural scene Video Frame Interpolation (VFI) methods that extract unclear or coarse-grained features, we devise a general module that models motion and structural context interactions between frames in an efficient full convolution manner by adjusting optimal context range and transforming contexts into linear functions. Benefiting from this, MoSt-DSA is also the first method that directly achieves any number of interpolations at any time steps with just one forward pass during both training and testing. We conduct extensive comparisons with 7 representative VFI models for interpolating 1 to 3 frames; MoSt-DSA demonstrates robust results across 470 DSA image sequences (each typically 152 images), with average SSIM over 0.93, average PSNR over 38 (standard deviations of less than 0.030 and 3.6, respectively), comprehensively achieving state-of-the-art performance in accuracy, speed, visual effect, and memory usage. Our code is available at https://github.com/ZyoungXu/MoSt-DSA.
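As a rough illustration of the one-pass, any-number-of-frames interface described above (this is not the released MoSt-DSA code; `TinyMultiFrameInterp` and its layer sizes are invented for this sketch), a minimal PyTorch example:

```python
import torch
import torch.nn as nn

class TinyMultiFrameInterp(nn.Module):
    """Toy fully-convolutional interpolator: two input frames plus a set of
    requested time steps produce all intermediate frames in one forward pass.
    The architecture is purely illustrative, not the MoSt-DSA design."""

    def __init__(self, channels=1, hidden=32, max_steps=3):
        super().__init__()
        self.max_steps = max_steps
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels + max_steps, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, max_steps * channels, 3, padding=1),
        )

    def forward(self, frame0, frame1, t_steps):
        # frame0, frame1: (B, C, H, W); t_steps: (N,) with values in (0, 1), N <= max_steps
        b, _, h, w = frame0.shape
        t_map = torch.zeros(b, self.max_steps, h, w, device=frame0.device)
        t_map[:, : t_steps.numel()] = t_steps.view(1, -1, 1, 1)
        out = self.net(torch.cat([frame0, frame1, t_map], dim=1))
        return out[:, : t_steps.numel()]  # only the requested frames

model = TinyMultiFrameInterp()
f0, f1 = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
mid = model(f0, f1, torch.tensor([0.25, 0.50, 0.75]))  # 3 frames, one pass
print(mid.shape)  # torch.Size([1, 3, 128, 128])
```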
Related papers
- GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images [35.42266460525047]
Digital Subtraction Angiography (DSA) images contain complex vascular structures and various motions.
Applying natural scene Video Frame Interpolation (VFI) methods results in motion artifacts, structural dissipation, and blurriness.
MoSt-DSA has specifically addressed these issues for the first time and achieved SOTA results.
We propose GaraMoSt to address these issues within the same computational time scale.
arXiv Detail & Related papers (2024-12-18T18:04:12Z)
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel bidirectional sampling strategy that addresses the resulting off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
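A rough, self-contained sketch of the bidirectional idea stated above, not the actual ViBiDSampler algorithm; `bidirectional_sample`, the placeholder `denoise(x, t, cond)` callable, and the simple averaging fusion are all assumptions of this illustration:

```python
import torch

def bidirectional_sample(denoise, start, end, n_frames=16, steps=50):
    """Conceptual only: each denoising step refines the latent clip once in
    forward time order conditioned on the start frame and once in reversed
    order conditioned on the end frame, then fuses the two estimates so both
    key frames constrain the intermediate frames."""
    x = torch.randn(n_frames, *start.shape)
    for t in reversed(range(steps)):
        fwd = denoise(x, t, cond=start)                # forward path
        bwd = denoise(x.flip(0), t, cond=end).flip(0)  # backward path
        x = 0.5 * (fwd + bwd)                          # keep both ends consistent
    return x

# dummy denoiser just to show the calling convention
dummy = lambda x, t, cond: 0.99 * x + 0.01 * cond
clip = bidirectional_sample(dummy, torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```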
arXiv Detail & Related papers (2024-10-08T03:01:54Z)
- MLMT-CNN for Object Detection and Segmentation in Multi-layer and Multi-spectral Images [4.2623421577291225]
We present a multi-task deep learning framework that exploits the dependencies between image bands to produce 3D AR localisation.
Our framework achieves an average of 0.72 IoU (segmentation) and 0.90 F1 score (detection) across all modalities.
arXiv Detail & Related papers (2024-07-19T17:21:53Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Plug-and-Play Regularization on Magnitude with Deep Priors for 3D Near-Field MIMO Imaging [0.0]
Near-field radar imaging systems are used in a wide range of applications such as concealed weapon detection and medical diagnosis.
We consider the problem of reconstructing the three-dimensional (3D) complex-valued reflectivity by enforcing regularization on its magnitude.
arXiv Detail & Related papers (2023-12-26T12:25:09Z)
- Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement [83.60486465697318]
We propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.
For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames.
We extend M2M into an M2M++ framework by introducing a flexible Spatial Selective Refinement component, which allows trading computational efficiency for quality and vice versa.
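A nearest-neighbour toy of forward splatting (not the M2M/M2M++ implementation; `splat_nearest` and the collision-averaging rule are assumptions of this sketch), showing why an intermediate frame at any time t only needs the same flow rescaled:

```python
import torch

def splat_nearest(frame, flow):
    """Toy forward ("splatting") warp: every source pixel is pushed to the
    nearest target location given by its flow vector; collisions are averaged.
    frame: (B, C, H, W); flow: (B, 2, H, W) in pixels as (dx, dy)."""
    b, c, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    target = torch.stack([xs, ys]).float().unsqueeze(0) + flow        # (B, 2, H, W)
    tx = target[:, 0].round().long().clamp(0, w - 1)
    ty = target[:, 1].round().long().clamp(0, h - 1)
    idx = (ty * w + tx).view(b, 1, -1).expand(b, c, h * w)            # flat target indices
    out = torch.zeros(b, c, h * w).scatter_add_(2, idx, frame.view(b, c, -1))
    cnt = torch.zeros(b, 1, h * w).scatter_add_(2, idx[:, :1], torch.ones(b, 1, h * w))
    return (out / cnt.clamp(min=1)).view(b, c, h, w)

# an intermediate frame at time t just rescales the flow before splatting
frame0, flow01 = torch.rand(1, 3, 64, 64), torch.randn(1, 2, 64, 64)
frame_at_t = splat_nearest(frame0, 0.5 * flow01)
```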
arXiv Detail & Related papers (2023-10-29T09:09:32Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Decoupled Diffusion Models: Simultaneous Image to Zero and Zero to Noise [53.04220377034574]
We propose decoupled diffusion models (DDMs) for high-quality (un)conditioned image generation in less than 10 function evaluations.
We mathematically derive 1) the training objectives and 2) the reverse-time sampling formula, based on an analytic transition probability that models the image-to-zero transition.
We experimentally achieve very competitive performance compared with the state of the art in 1) unconditioned image generation, e.g., CIFAR-10 and CelebA-HQ-256, and 2) image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
- Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging [7.601695814245209]
We consider the problem of video snapshot compressive imaging (SCI), where sequential high-speed frames are modulated by different masks and captured by a single measurement.
By combining optimization algorithms and neural networks, deep unfolding networks (DUNs) score tremendous achievements in solving inverse problems.
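The snapshot compressive imaging measurement process the summary refers to can be written in a few lines; the sketch below is a generic illustration of that forward model, not the paper's unfolding network:

```python
import torch

def sci_forward(frames, masks):
    """Generic video SCI measurement model: each high-speed frame is modulated
    by its own binary mask and all modulated frames sum into one 2-D snapshot.
    frames, masks: (T, H, W)."""
    return (frames * masks).sum(dim=0)

frames = torch.rand(8, 256, 256)                 # 8 high-speed frames
masks = (torch.rand(8, 256, 256) > 0.5).float()  # one binary mask per frame
snapshot = sci_forward(frames, masks)            # single (256, 256) measurement
```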
arXiv Detail & Related papers (2023-06-20T06:25:48Z)
- Multi-View Object Pose Refinement With Differentiable Renderer [22.040014384283378]
This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data.
It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame.
We report excellent performance in comparison to state-of-the-art methods trained on synthetic and real data.
arXiv Detail & Related papers (2022-07-06T17:02:22Z)
- Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling [105.69197687940505]
We propose to explore the role of explicit temporal difference modeling in both LR and HR space.
To further enhance the super-resolution result, not only are spatial residual features extracted, but the difference between consecutive frames in the high-frequency domain is also computed.
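A minimal sketch of explicit temporal-difference features, assuming a simple depthwise box blur as the low-pass filter (the paper's actual decomposition may differ; `temporal_difference` is an invented name):

```python
import torch
import torch.nn.functional as F

def temporal_difference(prev, curr, ksize=5):
    """Toy temporal-difference features: the raw frame difference is split into
    a low-frequency part (depthwise box blur) and a high-frequency residual
    that highlights fine motion detail."""
    diff = curr - prev                                   # (B, C, H, W)
    c = diff.shape[1]
    kernel = torch.ones(c, 1, ksize, ksize) / (ksize * ksize)
    low = F.conv2d(diff, kernel, padding=ksize // 2, groups=c)
    high = diff - low
    return low, high

low, high = temporal_difference(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```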
arXiv Detail & Related papers (2022-04-14T17:07:33Z)
- Deformable Image Registration using Neural ODEs [15.245085400790002]
We present a generic, fast, and accurate diffeomorphic image registration framework that leverages neural ordinary differential equations (NODEs).
Compared with traditional optimization-based methods, our framework reduces the running time from tens of minutes to tens of seconds.
Our experiments show that the registration results of our method outperform the state of the art under various metrics.
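A forward-Euler stand-in for the ODE integration behind NODE-based registration (not the authors' framework; `integrate_velocity` and the stationary-velocity assumption are illustrative only):

```python
import torch
import torch.nn.functional as F

def integrate_velocity(velocity, steps=8):
    """Integrate a stationary velocity field (B, 2, H, W), given in pixels,
    into a sampling grid phi via forward Euler; phi then warps the moving
    image with grid_sample. A real NODE framework replaces this loop with a
    learned dynamics function and an ODE solver."""
    b, _, h, w = velocity.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    phi = torch.stack([xs, ys], dim=-1).unsqueeze(0).repeat(b, 1, 1, 1)  # (B, H, W, 2)
    # pixel displacements -> normalised [-1, 1] coordinates
    v = torch.stack(
        [velocity[:, 0] * 2 / (w - 1), velocity[:, 1] * 2 / (h - 1)], dim=-1
    )
    for _ in range(steps):  # phi <- phi + v(phi) / steps
        v_at_phi = F.grid_sample(
            v.permute(0, 3, 1, 2), phi, align_corners=True
        ).permute(0, 2, 3, 1)
        phi = phi + v_at_phi / steps
    return phi

moving = torch.rand(1, 1, 96, 96)
phi = integrate_velocity(torch.randn(1, 2, 96, 96))
warped = F.grid_sample(moving, phi, align_corners=True)
```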
arXiv Detail & Related papers (2021-08-07T12:54:17Z)
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose- and scale-invariant thanks to the use of a Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners.
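A minimal Spatial Transformer front-end in PyTorch, illustrating the size-and-pose normalisation idea; `TinySTN` and its localisation head are invented for this sketch and are not the paper's three-module architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySTN(nn.Module):
    """Minimal Spatial Transformer: a small localisation head regresses a 2x3
    affine matrix and resamples the input, so downstream layers see a roughly
    normalised size and pose."""

    def __init__(self, channels=1):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(channels, 8, 7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # initialise to the identity transform
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1.0, 0, 0, 0, 1, 0]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

stn = TinySTN()
aligned = stn(torch.rand(2, 1, 128, 128))  # feed `aligned` to the segmentation CNN
```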
arXiv Detail & Related papers (2021-07-06T14:50:03Z)