Related papers: Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation

Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation

URL: http://arxiv.org/abs/2309.11127v1
Date: Wed, 20 Sep 2023 08:19:05 GMT
Title: Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation
Authors: Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, and Seong-Lyun Kim
Abstract summary: We put forward a novel framework of language-oriented semantic communication (LSC) In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. We introduce three innovative algorithms: 1) semantic source coding (SSC), which compresses a text prompt into its key head words capturing the prompt's syntactic essence; 2) semantic channel coding ( SCC), that improves robustness against errors by substituting head words with their lenghthier synonyms; and 3) semantic knowledge distillation (SKD), that produces listener-customized prompts via in-context learning the listener's
Score: 53.97155730116369
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward to a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. To demonstrate LSC's potential, we introduce three innovative algorithms: 1) semantic source coding (SSC) which compresses a text prompt into its key head words capturing the prompt's syntactic essence while maintaining their appearance order to keep the prompt's context; 2) semantic channel coding (SCC) that improves robustness against errors by substituting head words with their lenghthier synonyms; and 3) semantic knowledge distillation (SKD) that produces listener-customized prompts via in-context learning the listener's language style. In a communication task for progressive text-to-image generation, the proposed methods achieve higher perceptual similarities with fewer transmissions while enhancing robustness in noisy communication channels.

Related papers

Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio [52.859261069569165]
We propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation.<n>We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or better than state-of-the-art models specialized for individual tasks.
arXiv Detail & Related papers (2025-08-28T06:51:42Z)
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models [70.56468982313834]
We propose ProsodyLM, which introduces a simple tokenization scheme amenable to learning prosody.<n>We find that ProsodyLM can learn surprisingly diverse emerging prosody processing capabilities through pre-training alone.
arXiv Detail & Related papers (2025-07-27T00:59:01Z)
Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system. Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density. Private Knowledge Base (KB) based on the Large Language Model (LLM) for semantic disambiguation and correction. Generative Semantic Reconstructor (GSR) that utilizes BERT-VITS2 and SadTalker models to transform text back into a high-QoE talking-face video.
arXiv Detail & Related papers (2024-11-06T12:45:46Z)
Generative Semantic Communication for Text-to-Speech Synthesis [39.8799066368712]
This paper develops a novel generative semantic communication framework for text-to-speech synthesis. We employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead.
arXiv Detail & Related papers (2024-10-04T14:18:31Z)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z)
Visual Language Model based Cross-modal Semantic Communication Systems [42.321208020228894]
We propose a novel Vision-Language Model-based Cross-modal Semantic Communication system. The VLM-CSC comprises three novel components. The experimental simulations validate the effectiveness, adaptability, and robustness of the CSC system.
arXiv Detail & Related papers (2024-05-06T08:59:16Z)
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model [11.160802635050866]
Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings. Existing CS generation methods are fragile and prone to poor performance due to template-based statistical models. We propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff)
arXiv Detail & Related papers (2024-04-30T05:54:40Z)
Leveraging Language ID to Calculate Intermediate CTC Loss for Enhanced Code-Switching Speech Recognition [5.3545957730615905]
We introduce language identification information into the middle layer of the ASR model's encoder. We aim to generate acoustic features that imply language distinctions in a more implicit way, reducing the model's confusion when dealing with language switching.
arXiv Detail & Related papers (2023-12-15T07:46:35Z)
Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding. In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z)
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts. Recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment. We show thatRecent Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.