Speech Audio Generation from dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder
- URL: http://arxiv.org/abs/2503.06588v1
- Date: Sun, 09 Mar 2025 12:40:16 GMT
- Title: Speech Audio Generation from dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder
- Authors: Yaxuan Li, Han Jiang, Yifei Ma, Shihua Qin, Fangxu Xing,
- Abstract summary: We propose a novel two-step "knowledge enhancement + variational inference" framework for generating speech audio signals from cine dynamic MRI sequences. To the best of our knowledge, this is one of the first attempts at synthesizing speech audio directly from dynamic MRI video sequences.
- Score: 6.103954504752016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic Magnetic Resonance Imaging (MRI) of the vocal tract has become an increasingly adopted imaging modality for speech motor studies. Beyond image signals, systematic data loss, noise pollution, and audio file corruption can occur due to the unpredictability of the MRI acquisition environment. In such cases, generating audio from images is critical for data recovery in both clinical and research applications. However, this remains challenging due to hardware constraints, acoustic interference, and data corruption. Existing solutions, such as denoising and multi-stage synthesis methods, face limitations in audio fidelity and generalizability. To address these challenges, we propose a Knowledge Enhanced Conditional Variational Autoencoder (KE-CVAE), a novel two-step "knowledge enhancement + variational inference" framework for generating speech audio signals from cine dynamic MRI sequences. This approach introduces two key innovations: (1) integration of unlabeled MRI data for knowledge enhancement, and (2) a variational inference architecture to improve generative modeling capacity. To the best of our knowledge, this is one of the first attempts at synthesizing speech audio directly from dynamic MRI video sequences. The proposed method was trained and evaluated on an open-source dynamic vocal tract MRI dataset recorded during speech. Experimental results demonstrate its effectiveness in generating natural speech waveforms while addressing MRI-specific acoustic challenges, outperforming conventional deep learning-based synthesis approaches.
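The abstract names a conditional variational autoencoder but does not give its loss or architecture. As a rough orientation only, the sketch below shows the generic conditional VAE objective (reconstruction error plus a KL term between a posterior q(z | audio, MRI) and a prior p(z | MRI)), assuming diagonal Gaussian distributions. It is not the KE-CVAE implementation; all function names, the squared-error reconstruction term, and the `beta` weight are hypothetical choices made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
    return 0.5 * total


def cvae_loss(target, reconstruction, mu_q, log_var_q, mu_p, log_var_p, beta=1.0):
    """Negative ELBO sketch: squared reconstruction error + beta * KL.

    Assumption: mu_p / log_var_p come from a prior network that sees only
    the condition (here, MRI features); mu_q / log_var_q come from a
    posterior network that also sees the target audio features.
    """
    recon = sum((t - r) ** 2 for t, r in zip(target, reconstruction))
    return recon + beta * kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)
```

In a setup like this, the posterior is used only during training; at inference time the latent is drawn from the condition-only prior, which is what allows audio to be generated when only the MRI frames are available.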
Related papers
- Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images [4.1738581761446145]
We introduce a novel approach leveraging a sparse mixture-of-experts framework for MRI image denoising. Each expert is a specialized denoising convolutional neural network fine-tuned to target specific noise characteristics associated with different image regions. Our method demonstrates superior performance over state-of-the-art denoising techniques on both synthetic and real-world brain MRI datasets.
arXiv Detail & Related papers (2025-01-24T03:04:44Z)
- ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process. We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
arXiv Detail & Related papers (2025-01-08T05:15:43Z)
- Domain-Agnostic Stroke Lesion Segmentation Using Physics-Constrained Synthetic Data [0.15749416770494706]
We propose two novel approaches using synthetic quantitative MRI (qMRI) images to enhance the robustness and generalisability of segmentation models. We trained a qMRI estimation model to predict qMRI maps from MPRAGE images, which were used to simulate diverse MRI sequences for segmentation training. A second approach built upon prior work in synthetic data for stroke lesion segmentation, generating qMRI maps from a dataset of tissue labels.
arXiv Detail & Related papers (2024-12-04T13:52:05Z)
- Ethics of Generating Synthetic MRI Vocal Tract Views from the Face [0.3755082744150184]
This paper explores the ethical implications of external-to-internal correlation modeling (E2ICM).
E2ICM uses facial movements to infer internal configurations and provides a cost-effective supporting technology for MRI.
We employ Pix2PixGAN to generate pseudo-MRI views from external articulatory data, demonstrating the feasibility of this approach.
arXiv Detail & Related papers (2024-07-11T11:12:48Z)
- Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI [20.432212333539628]
We introduce a novel coarse-to-fine audio reconstruction method based on functional Magnetic Resonance Imaging (fMRI) data.
We validate our method on three public fMRI datasets: Brain2Sound, Brain2Music, and Brain2Speech.
By employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal.
arXiv Detail & Related papers (2024-05-29T03:16:14Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI [76.60362295758596]
Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information.
We propose to resolve these artifacts by lifting the 2D MRI reconstruction problem to 3D, introducing an additional "spectral" dimension to model this off-resonance.
arXiv Detail & Related papers (2023-11-22T05:44:51Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- Multi-Coil MRI Reconstruction Challenge -- Assessing Brain MRI Reconstruction Models and their Generalizability to Varying Coil Configurations [40.263770807921524]
Deep-learning-based brain magnetic resonance imaging (MRI) reconstruction methods have the potential to accelerate the MRI acquisition process.
The Multi-Coil Magnetic Resonance Image (MC-MRI) Reconstruction Challenge provides a benchmark that aims at addressing these issues.
We describe the experimental design of the challenge and summarize the results of a set of baseline and state-of-the-art brain MRI reconstruction models.
arXiv Detail & Related papers (2020-11-10T04:11:48Z)
- Diffusion-Weighted Magnetic Resonance Brain Images Generation with Generative Adversarial Networks and Variational Autoencoders: A Comparison Study [55.78588835407174]
We show that high quality, diverse and realistic-looking diffusion-weighted magnetic resonance images can be synthesized using deep generative models.
We present two networks, the Introspective Variational Autoencoder and the Style-Based GAN, that qualify for data augmentation in the medical field.
arXiv Detail & Related papers (2020-06-24T18:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.