Wireless Deep Video Semantic Transmission
- URL: http://arxiv.org/abs/2205.13129v1
- Date: Thu, 26 May 2022 03:26:43 GMT
- Title: Wireless Deep Video Semantic Transmission
- Authors: Sixian Wang, Jincheng Dai, Zijian Liang, Kai Niu, Zhongwei Si, Chao
Dong, Xiaoqi Qin, Ping Zhang
- Abstract summary: We propose a new class of high-efficiency deep joint source-channel coding methods to achieve end-to-end video transmission over wireless channels.
Our framework is collected under the name deep video semantic transmission (DVST).
- Score: 14.071114007641313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we design a new class of high-efficiency deep joint
source-channel coding methods to achieve end-to-end video transmission over
wireless channels. The proposed methods exploit nonlinear transform and
conditional coding architecture to adaptively extract semantic features across
video frames, and transmit semantic feature domain representations over
wireless channels via deep joint source-channel coding. Our framework is
collected under the name deep video semantic transmission (DVST). In
particular, benefiting from the strong temporal prior provided by the feature
domain context, the learned nonlinear transform function becomes temporally
adaptive, resulting in a richer and more accurate entropy model guiding the
transmission of the current frame. Accordingly, a novel rate-adaptive transmission
mechanism is developed to customize deep joint source-channel coding for video
sources. It learns to allocate the limited channel bandwidth within and among
video frames to maximize the overall transmission performance. The whole DVST
design is formulated as an optimization problem whose goal is to minimize the
end-to-end transmission rate-distortion cost under perceptual quality
metrics or machine vision task performance metrics. Across standard video
source test sequences and various communication scenarios, experiments show
that our DVST can generally surpass traditional wireless video coded
transmission schemes. The proposed DVST framework is well suited to future
semantic communications owing to its video content awareness and its ability
to integrate machine vision tasks.
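The deep JSCC idea at the heart of the abstract — mapping semantic features directly to power-constrained analog channel symbols and recovering them at the receiver — can be illustrated with a minimal sketch. This toy example uses plain NumPy with linear stand-ins for the learned encoder and decoder; the dimensions, SNR values, and the shared normalization gain are illustrative assumptions, not details of the authors' DVST model.

```python
import numpy as np

def awgn(symbols, snr_db, rng):
    # Additive white Gaussian noise for unit-average-power symbols.
    sigma = np.sqrt(10.0 ** (-snr_db / 10.0))
    return symbols + rng.normal(scale=sigma, size=symbols.shape)

def transmit(x, W_enc, W_dec, snr_db, rng):
    # Encode, normalize to unit average power, pass through AWGN, decode.
    z = W_enc @ x
    g = np.sqrt(z.size / np.sum(z ** 2))        # power-normalization gain
    z_hat = awgn(g * z, snr_db, rng)
    return W_dec @ (z_hat / g)                  # a trained decoder would absorb g

rng = np.random.default_rng(0)
k, n = 64, 32                                   # feature dim -> channel uses
W_enc = rng.normal(size=(n, k)) / np.sqrt(k)    # stand-in for the learned encoder DNN
W_dec = np.linalg.pinv(W_enc)                   # stand-in for the learned decoder DNN

x = rng.normal(size=k)                          # "semantic features" of one frame
for snr in (0.0, 10.0, 20.0):
    x_hat = transmit(x, W_enc, W_dec, snr, np.random.default_rng(1))
    print(f"SNR {snr:4.1f} dB  MSE {np.mean((x - x_hat) ** 2):.4f}")
```

Unlike separation-based digital schemes, the reconstruction error of such an analog mapping degrades gracefully as the SNR drops instead of collapsing at a cliff; in DVST the linear maps above are replaced by the learned nonlinear transform, and the bandwidth n is allocated adaptively within and among frames.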
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Object-Attribute-Relation Representation based Video Semantic Communication [35.87160453583808]
We introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding.
We utilize OAR sequences for both low bit-rate representation and generative video reconstruction.
Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance.
arXiv Detail & Related papers (2024-06-15T02:19:31Z)
- Cross-layer scheme for low latency multiple description video streaming over Vehicular Ad-hoc NETworks (VANETs) [2.2124180701409233]
The state-of-the-art High Efficiency Video Coding (HEVC) standard is very promising for real-time video streaming.
We propose an original cross-layer system in order to enhance received video quality in vehicular communications.
arXiv Detail & Related papers (2023-11-05T14:34:58Z)
- Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
We propose a federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
A channel state information-based multiple-input multiple-output transmission module is designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Nonlinear Transform Source-Channel Coding for Semantic Communications [7.81628437543759]
We propose a new class of high-efficiency deep joint source-channel coding methods that can closely adapt to the source distribution under the nonlinear transform.
Our model incorporates the nonlinear transform as a strong prior to effectively extract the source semantic features.
Notably, the proposed NTSCC method can potentially support future semantic communications due to its vigorous content-aware ability.
arXiv Detail & Related papers (2021-12-21T03:30:46Z)
- DeepWiVe: Deep-Learning-Aided Wireless Video Transmission [0.0]
We present DeepWiVe, the first-ever end-to-end joint source-channel coding (JSCC) video transmission scheme.
We use deep neural networks (DNNs) to map video signals to channel symbols, combining video compression, channel coding, and modulation steps into a single neural transform.
Our results show that DeepWiVe can overcome the cliff-effect, which is prevalent in conventional separation-based digital communication schemes.
arXiv Detail & Related papers (2021-11-25T11:34:24Z)
- Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach [3.983055670167878]
A low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing.
It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth.
We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding.
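The information bottleneck principle named in this entry is commonly written as the following trade-off (standard form, not quoted from the paper): the encoding p(z|x) keeps the transmitted feature Z informative about the inference target Y while compressing away the rest of the input X,

```latex
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(Z;X)
```

where the multiplier beta controls how strongly rate is traded against task relevance.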
arXiv Detail & Related papers (2021-02-08T12:53:32Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.