Language-Oriented Communication with Semantic Coding and Knowledge
Distillation for Text-to-Image Generation
- URL: http://arxiv.org/abs/2309.11127v1
- Date: Wed, 20 Sep 2023 08:19:05 GMT
- Title: Language-Oriented Communication with Semantic Coding and Knowledge
Distillation for Text-to-Image Generation
- Authors: Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, and Seong-Lyun Kim
- Abstract summary: We put forward a novel framework of language-oriented semantic communication (LSC).
In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency.
We introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with their lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning the listener's
- Score: 53.97155730116369
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: By integrating recent advances in large language models (LLMs) and generative
models into the emerging semantic communication (SC) paradigm, in this article
we put forward a novel framework of language-oriented semantic communication
(LSC). In LSC, machines communicate using human language messages that can be
interpreted and manipulated via natural language processing (NLP) techniques
for SC efficiency. To demonstrate LSC's potential, we introduce three
innovative algorithms: 1) semantic source coding (SSC) which compresses a text
prompt into its key head words capturing the prompt's syntactic essence while
maintaining their appearance order to keep the prompt's context; 2) semantic
channel coding (SCC) that improves robustness against errors by substituting
head words with their lengthier synonyms; and 3) semantic knowledge
distillation (SKD) that produces listener-customized prompts via in-context
learning the listener's language style. In a communication task for progressive
text-to-image generation, the proposed methods achieve higher perceptual
similarities with fewer transmissions while enhancing robustness in noisy
communication channels.
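To make the first two algorithms concrete, the following is a minimal Python sketch of the ideas behind SSC and SCC as described in the abstract. It is not the authors' implementation: the function-word filter is only a crude stand-in for the syntactic analysis that identifies head words, and the names `FUNCTION_WORDS`, `LONGER_SYNONYMS`, `semantic_source_coding`, and `semantic_channel_coding` are hypothetical, introduced here for illustration.

```python
# Toy sketch of SSC and SCC; NOT the authors' implementation.

# Assumption: a small function-word list stands in for a real syntactic parser.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "at", "with", "and", "is", "are"}

# Assumption: a hand-written table mapping a head word to a longer synonym.
LONGER_SYNONYMS = {"dog": "canine", "big": "enormous", "car": "automobile"}


def semantic_source_coding(prompt: str) -> list[str]:
    """Keep only (approximate) head words, preserving their order of appearance."""
    words = [w.strip(".,!?").lower() for w in prompt.split()]
    return [w for w in words if w and w not in FUNCTION_WORDS]


def semantic_channel_coding(head_words: list[str]) -> list[str]:
    """Replace each head word with a longer synonym when the table lists one."""
    coded = []
    for word in head_words:
        synonym = LONGER_SYNONYMS.get(word, word)
        # Substitute only when the synonym is actually longer than the word.
        coded.append(synonym if len(synonym) > len(word) else word)
    return coded


prompt = "A big dog is sitting in the car."
heads = semantic_source_coding(prompt)
coded = semantic_channel_coding(heads)
print(heads)  # ['big', 'dog', 'sitting', 'car']
print(coded)  # ['enormous', 'canine', 'sitting', 'automobile']
```

The sketch keeps the head words in their original order, as the abstract requires for SSC, and leaves a word unchanged when no longer synonym is available.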
Related papers
- Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system.
A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density.
A private Knowledge Base (KB) based on a Large Language Model (LLM) is used for semantic disambiguation and correction.
A Generative Semantic Reconstructor (GSR) utilizes BERT-VITS2 and SadTalker models to transform text back into a high-QoE talking-face video.
arXiv Detail & Related papers (2024-11-06T12:45:46Z)
- Generative Semantic Communication for Text-to-Speech Synthesis [39.8799066368712]
This paper develops a novel generative semantic communication framework for text-to-speech synthesis.
We employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead.
arXiv Detail & Related papers (2024-10-04T14:18:31Z)
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission.
Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility.
We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
- Visual Language Model based Cross-modal Semantic Communication Systems [42.321208020228894]
We propose a novel Vision-Language Model-based Cross-modal Semantic Communication system.
The VLM-CSC comprises three novel components.
The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.
arXiv Detail & Related papers (2024-05-06T08:59:16Z)
- Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model [11.160802635050866]
Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings.
Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models.
We propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff).
arXiv Detail & Related papers (2024-04-30T05:54:40Z)
- Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder.
We aim to generate acoustic features that imply language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching.
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
- Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding.
In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
Recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that the recent Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.