VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
- URL: http://arxiv.org/abs/2406.18538v1
- Date: Fri, 17 May 2024 06:11:10 GMT
- Title: VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
- Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai,
- Abstract summary: We propose an end-to-end SC system for video question answering tasks called VideoQA-SC.
Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels.
Our results show the great potential of task-oriented SC system design for video applications.
- Score: 21.0279034601774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth efficiency and real-time performance of various intelligent tasks. The difficulty in such system design lies in the extraction of task-related compact semantic representations and their accurate delivery over noisy channels. In this paper, we propose an end-to-end SC system for video question answering (VideoQA) tasks called VideoQA-SC. Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels, bypassing the need for video reconstruction at the receiver. To this end, we develop a spatiotemporal semantic encoder for effective video semantic extraction, and a learning-based bandwidth-adaptive deep joint source-channel coding (DJSCC) scheme for efficient and robust video semantic transmission. Experiments demonstrate that VideoQA-SC outperforms traditional and advanced DJSCC-based SC systems that rely on video reconstruction at the receiver under a wide range of channel conditions and bandwidth constraints. In particular, when the signal-to-noise ratio is low, VideoQA-SC can improve the answer accuracy by 5.17% while saving almost 99.5% of the bandwidth at the same time, compared with the advanced DJSCC-based SC system. Our results show the great potential of task-oriented SC system design for video applications.
Related papers
- Object-Attribute-Relation Representation based Video Semantic Communication [35.87160453583808]
We introduce the use of object-attribute-relation (OAR) as a semantic framework for videos to facilitate low bit-rate coding.
We utilize OAR sequences for both low bit-rate representation and generative video reconstruction.
Our experiments on traffic surveillance video datasets assess the effectiveness of our approach in terms of video transmission performance.
arXiv Detail & Related papers (2024-06-15T02:19:31Z) - Attention-based UNet enabled Lightweight Image Semantic Communication
System over Internet of Things [4.62215026195301]
We study the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices.
We propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size.
arXiv Detail & Related papers (2024-01-14T16:46:50Z) - Cross-layer scheme for low latency multiple description video streaming
over Vehicular Ad-hoc NETworks (VANETs) [2.2124180701409233]
HEVC standard is very promising for real-time video streaming.
New state-of-the-art video coding (HEVC) standard is very promising for real-time video streaming.
We propose an original cross-layer system in order to enhance received video quality in vehicular communications.
arXiv Detail & Related papers (2023-11-05T14:34:58Z) - Streaming Video Model [90.24390609039335]
We propose to unify video understanding tasks into one streaming video architecture, referred to as Streaming Vision Transformer (S-ViT)
S-ViT first produces frame-level features with a memory-enabled temporally-aware spatial encoder to serve frame-based video tasks.
The efficiency and efficacy of S-ViT is demonstrated by the state-of-the-art accuracy in the sequence-based action recognition.
arXiv Detail & Related papers (2023-03-30T08:51:49Z) - Wireless Deep Video Semantic Transmission [14.071114007641313]
We propose a new class of high-efficiency deep joint source-channel coding methods to achieve end-to-end video transmission over wireless channels.
Our framework is collected under the name deep video semantic transmission (DVST)
arXiv Detail & Related papers (2022-05-26T03:26:43Z) - Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust and low complexity system for Acoustic Scene Classification (ASC)
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve the performance but still satisfy the low complexity model, we apply two techniques: ensemble of multiple spectrograms and channel reduction.
arXiv Detail & Related papers (2022-03-23T10:27:41Z) - VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge
transfer from voice conversion [77.50171525265056]
This paper proposes a novel multi-speaker Video-to-Speech (VTS) system based on cross-modal knowledge transfer from voice conversion (VC)
The Lip2Ind network can substitute the content encoder of VC to form a multi-speaker VTS system to convert silent video to acoustic units for reconstructing accurate spoken content.
arXiv Detail & Related papers (2022-02-18T08:58:45Z) - A Study of Designing Compact Audio-Visual Wake Word Spotting System
Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate on designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF)
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs)
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - DeepWiVe: Deep-Learning-Aided Wireless Video Transmission [0.0]
We present DeepWiVe, the first-ever end-to-end joint source-channel coding (JSCC) video transmission scheme.
We use deep neural networks (DNNs) to map video signals to channel symbols, combining video compression, channel coding, and modulation steps into a single neural transform.
Our results show that DeepWiVe can overcome the cliff-effect, which is prevalent in conventional separation-based digital communication schemes.
arXiv Detail & Related papers (2021-11-25T11:34:24Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.