Related papers: Large Generative Model-assisted Talking-face Semantic Communication System

URL: http://arxiv.org/abs/2411.03876v1
Date: Wed, 06 Nov 2024 12:45:46 GMT
Title: Large Generative Model-assisted Talking-face Semantic Communication System
Authors: Feibo Jiang, Siwei Tu, Li Dong, Cunhua Pan, Jiangzhou Wang, Xiaohu You
Abstract summary: This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system. A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into text with high information density. A private Knowledge Base (KB) built on a Large Language Model (LLM) performs semantic disambiguation and correction. At the receiver, a Generative Semantic Reconstructor (GSR) uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video.
Score: 55.42631520122753
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid development of generative Artificial Intelligence (AI) continually unveils the potential of Semantic Communication (SemCom). However, current talking-face SemCom systems still face challenges such as low bandwidth utilization, semantic ambiguity, and diminished Quality of Experience (QoE). This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system tailored for talking-face video communication. First, we introduce a Generative Semantic Extractor (GSE) at the transmitter, based on the FunASR model, to convert semantically sparse talking-face videos into text with high information density. Second, we establish a private Knowledge Base (KB) based on a Large Language Model (LLM) for semantic disambiguation and correction, complemented by a joint knowledge base-semantic-channel coding scheme. Finally, at the receiver, we propose a Generative Semantic Reconstructor (GSR) that uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video matching the user's timbre. Simulation results demonstrate the feasibility and effectiveness of the proposed LGM-TSC system.

Related papers

Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model [55.71885688565501]
We propose a scalable generative video semantic communication framework that extracts and transmits semantic information to achieve high-quality video reconstruction. Specifically, at the transmitter, a description and other condition signals are extracted from the source video, serving as text and structural semantics, respectively. At the receiver, diffusion-based large GenAI models fuse the semantics of the multiple modalities to reconstruct the video.
arXiv Detail & Related papers (2025-02-19T15:59:07Z)
Generative Semantic Communication: Architectures, Technologies, and Applications [36.67865904029129]
This paper examines the applications of generative artificial intelligence (GAI) in semantic communication (SemCom). Three popular generative SemCom systems are first introduced, built on variational autoencoders, generative adversarial networks, and diffusion models, respectively. A novel generative SemCom system is then proposed that incorporates a cutting-edge GAI technology: large language models (LLMs).
arXiv Detail & Related papers (2024-12-11T18:59:50Z)
Semantic Feature Decomposition based Semantic Communication System of Images with Large-scale Visual Generation Models [5.867765921443141]
A Texture-Color based Semantic Communication system of Images (TCSCI) is proposed. At the transmitter, it decomposes images into their natural-language description (text), texture semantic features, and color semantic features. It achieves extremely compressed, highly noise-resistant, and visually similar image semantic communication while keeping the transmission process interpretable and editable.
arXiv Detail & Related papers (2024-10-26T08:53:05Z)
Generative Semantic Communication for Text-to-Speech Synthesis [39.8799066368712]
This paper develops a novel generative semantic communication framework for text-to-speech synthesis. We employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead.
arXiv Detail & Related papers (2024-10-04T14:18:31Z)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition [31.58289343561422]
We compare five different TTS decoder architectures for synthetic data generation to show their impact on CTC-based speech recognition training. For data generation, autoregressive decoding performs better than non-autoregressive decoding. We also propose an approach to quantify the generalization capabilities of TTS models.
arXiv Detail & Related papers (2024-07-31T09:37:27Z)
Agent-driven Generative Semantic Communication with Cross-Modality and Prediction [57.335922373309074]
We propose a novel agent-driven generative semantic communication (A-GSC) framework based on reinforcement learning. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which tracks semantic changes and channel conditions to perform adaptive semantic extraction and sampling. The effectiveness of the designed models is verified on the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework.
arXiv Detail & Related papers (2024-04-10T13:24:27Z)
Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation [53.97155730116369]
We put forward a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human-language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. We introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with their lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning the listener's
arXiv Detail & Related papers (2023-09-20T08:19:05Z)
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding. To address security concerns, we introduce covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
Causal Semantic Communication for Digital Twins: A Generalizable Imitation Learning Approach [74.25870052841226]
A digital twin (DT) leverages a virtual representation of the physical world, along with communication (e.g., 6G), computing, and artificial intelligence (AI) technologies, to enable many connected intelligence services. Wireless systems can exploit the paradigm of semantic communication (SC) to facilitate informed decision-making under strict communication constraints. A novel framework called causal semantic communication (CSC) is proposed for DT-based wireless systems.
arXiv Detail & Related papers (2023-04-25T00:15:00Z)
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts. The proposed Text-to-Speech architecture is designed for multiple-code generation and monotonic alignment. We show that this architecture outperforms existing TTS systems on several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
Seq2Seq-SC: End-to-End Semantic Communication Systems with Pre-trained Language Model [20.925910474226885]
We propose a realistic semantic network called seq2seq-SC, designed to be compatible with 5G NR. Performance is evaluated with BLEU for lexical similarity and with an SBERT-based metric for semantic similarity.
arXiv Detail & Related papers (2022-10-27T07:48:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.