Wireless Deep Video Semantic Transmission
- URL: http://arxiv.org/abs/2205.13129v1
- Date: Thu, 26 May 2022 03:26:43 GMT
- Title: Wireless Deep Video Semantic Transmission
- Authors: Sixian Wang, Jincheng Dai, Zijian Liang, Kai Niu, Zhongwei Si, Chao
Dong, Xiaoqi Qin, Ping Zhang
- Abstract summary: We propose a new class of high-efficiency deep joint source-channel coding methods to achieve end-to-end video transmission over wireless channels.
Our framework is collected under the name deep video semantic transmission (DVST).
- Score: 14.071114007641313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we design a new class of high-efficiency deep joint
source-channel coding methods to achieve end-to-end video transmission over
wireless channels. The proposed methods exploit nonlinear transform and
conditional coding architecture to adaptively extract semantic features across
video frames, and transmit semantic feature domain representations over
wireless channels via deep joint source-channel coding. Our framework is
collected under the name deep video semantic transmission (DVST). In
particular, benefiting from the strong temporal prior provided by the feature
domain context, the learned nonlinear transform function becomes temporally
adaptive, resulting in a richer and more accurate entropy model guiding the
transmission of the current frame. Accordingly, a novel rate-adaptive transmission
mechanism is developed to customize deep joint source-channel coding for video
sources. It learns to allocate the limited channel bandwidth within and among
video frames to maximize the overall transmission performance. The whole DVST
design is formulated as an optimization problem whose goal is to minimize the
end-to-end transmission rate-distortion cost under perceptual quality
metrics or machine vision task performance metrics. Across standard video
source test sequences and various communication scenarios, experiments show
that our DVST can generally surpass traditional wireless video coded
transmission schemes. The proposed DVST framework is well suited to future
semantic communications owing to its video content awareness and its ability
to integrate machine vision tasks.
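The deep JSCC idea at the heart of the abstract — mapping semantic features directly to power-constrained analog channel symbols and recovering them at the receiver — can be illustrated with a minimal sketch. This toy example uses plain NumPy with linear stand-ins for the learned encoder and decoder; the dimensions, SNR values, and the shared normalization gain are illustrative assumptions, not details of the authors' DVST model.

```python
import numpy as np

def awgn(symbols, snr_db, rng):
    # Additive white Gaussian noise for unit-average-power symbols.
    sigma = np.sqrt(10.0 ** (-snr_db / 10.0))
    return symbols + rng.normal(scale=sigma, size=symbols.shape)

def transmit(x, W_enc, W_dec, snr_db, rng):
    # Encode, normalize to unit average power, pass through AWGN, decode.
    z = W_enc @ x
    g = np.sqrt(z.size / np.sum(z ** 2))        # power-normalization gain
    z_hat = awgn(g * z, snr_db, rng)
    return W_dec @ (z_hat / g)                  # a trained decoder would absorb g

rng = np.random.default_rng(0)
k, n = 64, 32                                   # feature dim -> channel uses
W_enc = rng.normal(size=(n, k)) / np.sqrt(k)    # stand-in for the learned encoder DNN
W_dec = np.linalg.pinv(W_enc)                   # stand-in for the learned decoder DNN

x = rng.normal(size=k)                          # "semantic features" of one frame
for snr in (0.0, 10.0, 20.0):
    x_hat = transmit(x, W_enc, W_dec, snr, np.random.default_rng(1))
    print(f"SNR {snr:4.1f} dB  MSE {np.mean((x - x_hat) ** 2):.4f}")
```

Unlike separation-based digital schemes, the reconstruction error of such an analog mapping degrades gracefully as the SNR drops instead of collapsing at a cliff; in DVST the linear maps above are replaced by the learned nonlinear transform, and the bandwidth n is allocated adaptively within and among frames.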
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Object-Attribute-Relation Representation based Video Semantic Communication [35.87160453583808]
We introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding.
We utilize OAR sequences for both low bit-rate representation and generative video reconstruction.
Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance.
arXiv Detail & Related papers (2024-06-15T02:19:31Z)
- Cross-layer scheme for low latency multiple description video streaming over Vehicular Ad-hoc NETworks (VANETs) [2.2124180701409233]
The state-of-the-art High Efficiency Video Coding (HEVC) standard is very promising for real-time video streaming.
We propose an original cross-layer system in order to enhance received video quality in vehicular communications.
arXiv Detail & Related papers (2023-11-05T14:34:58Z)
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
We propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information-based multiple-input multiple-output transmission module is designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Nonlinear Transform Source-Channel Coding for Semantic Communications [7.81628437543759]
We propose a new class of high-efficiency deep joint source-channel coding methods that can closely adapt to the source distribution under the nonlinear transform.
Our model incorporates the nonlinear transform as a strong prior to effectively extract the source semantic features.
Notably, the proposed NTSCC method can potentially support future semantic communications due to its vigorous content-aware ability.
arXiv Detail & Related papers (2021-12-21T03:30:46Z)
- DeepWiVe: Deep-Learning-Aided Wireless Video Transmission [0.0]
We present DeepWiVe, the first-ever end-to-end joint source-channel coding (JSCC) video transmission scheme.
We use deep neural networks (DNNs) to map video signals to channel symbols, combining video compression, channel coding, and modulation steps into a single neural transform.
Our results show that DeepWiVe can overcome the cliff-effect, which is prevalent in conventional separation-based digital communication schemes.
arXiv Detail & Related papers (2021-11-25T11:34:24Z)
- Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach [3.983055670167878]
A low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing.
It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth.
We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding.
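The information bottleneck principle named in this entry is commonly written as the following trade-off (standard form, not quoted from the paper): the encoding p(z|x) keeps the transmitted feature Z informative about the inference target Y while compressing away the rest of the input X,

```latex
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(Z;X)
```

where the multiplier beta controls how strongly rate is traded against task relevance.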
arXiv Detail & Related papers (2021-02-08T12:53:32Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.