Large Generative Model-assisted Talking-face Semantic Communication System
- URL: http://arxiv.org/abs/2411.03876v1
- Date: Wed, 06 Nov 2024 12:45:46 GMT
- Title: Large Generative Model-assisted Talking-face Semantic Communication System
- Authors: Feibo Jiang, Siwei Tu, Li Dong, Cunhua Pan, Jiangzhou Wang, Xiaohu You
- Abstract summary: This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system.
A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density.
A private Knowledge Base (KB) based on a Large Language Model (LLM) provides semantic disambiguation and correction.
A Generative Semantic Reconstructor (GSR) uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video.
- Score: 55.42631520122753
- License:
- Abstract: The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom). However, current talking-face SemCom systems still encounter challenges such as low bandwidth utilization, semantic ambiguity, and diminished Quality of Experience (QoE). This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system tailored for talking-face video communication. Firstly, we introduce a Generative Semantic Extractor (GSE) at the transmitter based on the FunASR model to convert semantically sparse talking-face videos into texts with high information density. Secondly, we establish a private Knowledge Base (KB) based on the Large Language Model (LLM) for semantic disambiguation and correction, complemented by a joint knowledge base-semantic-channel coding scheme. Finally, at the receiver, we propose a Generative Semantic Reconstructor (GSR) that utilizes BERT-VITS2 and SadTalker models to transform text back into a high-QoE talking-face video matching the user's timbre. Simulation results demonstrate the feasibility and effectiveness of the proposed LGM-TSC system.
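The data flow described in the abstract (extract text at the transmitter, correct it with a knowledge base, send only the text, rebuild the video at the receiver) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical placeholder rather than the authors' implementation: the function names, the TalkingFaceVideo type, the dictionary standing in for the LLM-based KB, the noiseless channel, and the choice to apply the KB step before transmission are all assumptions made for illustration. No FunASR, BERT-VITS2, or SadTalker call is made.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TalkingFaceVideo:
    """Toy stand-in for a talking-face clip (hypothetical type)."""
    transcript: str   # what the speaker says; an ASR model would recover this
    size_bytes: int   # size of the raw video payload
    speaker_id: str   # whose timbre and face the receiver should reproduce


def extract_semantics(video: TalkingFaceVideo) -> str:
    """GSE placeholder: a real system would run speech recognition here."""
    return video.transcript


def correct_with_kb(text: str, kb: Dict[str, str]) -> str:
    """KB placeholder: a word lookup table standing in for LLM-based correction."""
    return " ".join(kb.get(word, word) for word in text.split())


def transmit(text: str) -> bytes:
    """Channel placeholder (assumed noiseless): only UTF-8 text crosses the link."""
    return text.encode("utf-8")


def reconstruct(payload: bytes, speaker_id: str) -> Dict[str, str]:
    """GSR placeholder: a real system would synthesize speech in the speaker's
    timbre and animate a face from it; here we only return the decoded text."""
    return {"speaker_id": speaker_id, "speech_text": payload.decode("utf-8")}


def run_pipeline(video: TalkingFaceVideo, kb: Dict[str, str]) -> Tuple[Dict[str, str], float]:
    """Run all placeholder stages and report raw-video bytes per transmitted byte."""
    text = correct_with_kb(extract_semantics(video), kb)
    payload = transmit(text)
    output = reconstruct(payload, video.speaker_id)
    return output, video.size_bytes / len(payload)


if __name__ == "__main__":
    clip = TalkingFaceVideo("semantic comunication works", 2_000_000, "alice")
    result, ratio = run_pipeline(clip, {"comunication": "communication"})
    print(result["speech_text"], f"(size ratio about {ratio:.0f}x)")
```

The size ratio printed here only reflects the made-up clip size in the example; it says nothing about the bandwidth savings reported in the paper.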
Related papers
- Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed.
It decomposes the images into their natural language description (text), texture, and color semantic features at the transmitter.
It can achieve extremely compressed, highly noise-resistant, and visually similar image semantic communication, while ensuring the interpretability and editability of the transmission process.
arXiv Detail & Related papers (2024-10-26T08:53:05Z)
- Generative Semantic Communication for Text-to-Speech Synthesis [39.8799066368712]
This paper develops a novel generative semantic communication framework for text-to-speech synthesis.
We employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead.
arXiv Detail & Related papers (2024-10-04T14:18:31Z)
- On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition [31.58289343561422]
We compare five different TTS decoder architectures in the scope of synthetic data generation to show the impact on CTC-based speech recognition training.
We show that, for data generation, auto-regressive decoding performs better than non-autoregressive decoding, and we propose an approach to quantify TTS generalization capabilities.
arXiv Detail & Related papers (2024-07-31T09:37:27Z)
- Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication framework based on reinforcement learning.
In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track semantic changes and channel conditions to perform adaptive semantic extraction and sampling.
The effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
- Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation [53.97155730116369]
We put forward a novel framework of language-oriented semantic communication (LSC).
In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency.
We introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with their lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning the listener's
arXiv Detail & Related papers (2023-09-20T08:19:05Z)
- Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding.
In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
- Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach [74.25870052841226]
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing, and artificial intelligence (AI) technologies to enable many connected intelligence services.
Wireless systems can exploit the paradigm of semantic communication (SC) for facilitating informed decision-making under strict communication constraints.
A novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems.
arXiv Detail & Related papers (2023-04-25T00:15:00Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The proposed Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that this architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- Seq2Seq-SC: End-to-End Semantic Communication Systems with Pre-trained Language Model [20.925910474226885]
We propose a realistic semantic network called seq2seq-SC, designed to be compatible with 5G NR.
We employ a performance metric called semantic similarity, measured by BLEU for lexical similarity and SBERT for semantic similarity.
arXiv Detail & Related papers (2022-10-27T07:48:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.