Goal-Oriented Semantic Communication for Wireless Visual Question Answering
- URL: http://arxiv.org/abs/2411.02452v2
- Date: Wed, 27 Nov 2024 11:52:08 GMT
- Title: Goal-Oriented Semantic Communication for Wireless Visual Question Answering
- Authors: Sige Liu, Nan Li, Yansha Deng, Tony Q. S. Quek
- Abstract summary: We propose a goal-oriented semantic communication (GSC) framework to improve Visual Question Answering (VQA) performance.
We propose a bounding box (BBox)-based image semantic extraction and ranking approach to prioritize the semantic information based on the goal of questions.
Experimental results demonstrate that our GSC framework improves answering accuracy by up to 49% under AWGN channels and 59% under Rayleigh channels.
- Score: 68.75814200517854
- Abstract: The rapid progress of artificial intelligence (AI) and computer vision (CV) has facilitated the development of computation-intensive applications like Visual Question Answering (VQA), which integrates visual perception and natural language processing to generate answers. To overcome the limitations of traditional VQA constrained by local computation resources, edge computing has been incorporated to provide extra computation capability at the edge side. However, this brings new communication challenges between the local device and the edge, including limited bandwidth, channel noise, and multipath effects, which degrade VQA performance and user quality of experience (QoE), particularly during the transmission of large high-resolution images. To overcome these bottlenecks, we propose a goal-oriented semantic communication (GSC) framework that focuses on effectively extracting and transmitting the semantic information most relevant to the VQA goals, improving answering accuracy and enhancing effectiveness and efficiency. The objective is to maximize the answering accuracy, and we propose a bounding box (BBox)-based image semantic extraction and ranking approach to prioritize the semantic information based on the goal of questions. We then extend it by incorporating a scene graph (SG)-based approach to handle questions with complex relationships. Experimental results demonstrate that our GSC framework improves answering accuracy by up to 49% under AWGN channels and 59% under Rayleigh channels while reducing total latency by up to 65% compared to traditional bit-oriented transmission.
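The BBox-based extraction and ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the `BBox` type, the keyword-match scoring rule, the per-box bit cost, and the greedy budget selection are all hypothetical choices made only to show what "prioritizing semantic information based on the goal of the question" might look like in code.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    label: str         # object class from a (hypothetical) detector
    confidence: float  # detector confidence in [0, 1]
    size_bits: int     # assumed cost of transmitting this cropped region


def rank_bboxes(question: str, boxes: list[BBox]) -> list[BBox]:
    """Put boxes whose label is mentioned in the question first,
    breaking ties by detector confidence (an assumed scoring rule)."""
    words = set(question.lower().replace("?", " ").split())

    def score(box: BBox) -> tuple[int, float]:
        mentioned = 1 if box.label.lower() in words else 0
        return (mentioned, box.confidence)

    return sorted(boxes, key=score, reverse=True)


def select_for_budget(ranked: list[BBox], budget_bits: int) -> list[BBox]:
    """Greedily keep boxes in ranked order while they fit the bit budget."""
    chosen, used = [], 0
    for box in ranked:
        if used + box.size_bits <= budget_bits:
            chosen.append(box)
            used += box.size_bits
    return chosen


# Toy example with made-up detections and sizes.
boxes = [
    BBox("tree", 0.95, 4000),
    BBox("dog", 0.80, 3000),
    BBox("ball", 0.60, 1000),
]
ranked = rank_bboxes("What color is the dog?", boxes)
sent = select_for_budget(ranked, budget_bits=5000)
```

In this toy run the box labeled "dog" is ranked first because the question mentions it, and the budget then decides how many lower-ranked regions are sent alongside it.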
Related papers
- AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence [65.29835430845893]
We propose a framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication.
A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters.
We show that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of samples by up to 52 percent.
arXiv Detail & Related papers (2025-02-14T14:56:58Z)
- Communication Efficient Cooperative Edge AI via Event-Triggered Computation Offloading [34.18100643343979]
We propose a channel-adaptive, event-triggered edge-inference framework that prioritizes efficient rare-event processing.
The proposed framework achieves superior rare-event classification accuracy, and also effectively reduces communication overhead, as opposed to existing edge-inference approaches.
arXiv Detail & Related papers (2025-01-01T15:55:59Z)
- Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can sufficiently exploit the transmitting and computing resources to promote the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z)
- AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers.
This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z)
- Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck [14.719648367178259]
We deal with the problem of visual question answering (VQA) in remote sensing.
While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy.
We propose a cross-attention-based approach combined with information maximization. The CNN-LSTM based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information maximization learns a low-dimensional bottleneck layer that has all the relevant information required to carry out the VQA task.
arXiv Detail & Related papers (2023-06-25T15:09:21Z)
- Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI [46.61358701676358]
Edge artificial intelligence (AI) has been proposed to bring the high-performance computation of a conventional cloud down to the network edge.
Recently, the convergence of wireless sensing, computation and communication (SC$^2$) for specific edge AI tasks has brought about a paradigm shift.
It is of paramount importance to advance fully integrated sensing, computation and communication (ISCC) to achieve ultra-reliable and low-latency edge intelligence acquisition.
arXiv Detail & Related papers (2023-06-11T06:40:51Z)
- Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications [87.05763097471487]
This paper aims to design robust Edge Intelligence using semantic communication for time-critical IoT applications.
We analyze the effect of image DCT coefficients on inference accuracy and propose the channel-agnostic effectiveness encoding for offloading.
arXiv Detail & Related papers (2022-11-24T20:13:17Z)
- Enabling AI Quality Control via Feature Hierarchical Edge Inference [6.490724361345847]
This work proposes a feature hierarchical EI (FHEI) comprising a feature network and an inference network deployed at an edge server and the corresponding mobile device, respectively.
A higher-scale feature requires more computation and communication load but provides better AI quality.
It is verified by extensive simulations that the proposed joint communication-and-computation control on FHEI architecture always outperforms several benchmarks.
arXiv Detail & Related papers (2022-11-15T02:54:23Z)
- VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations.
Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context.
On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z)
- Coarse-to-Fine Reasoning for Visual Question Answering [18.535633096397397]
We present a new reasoning framework to fill the gap between visual features and semantic clues in the Visual Question Answering (VQA) task.
Our method first extracts the features and predicates from the image and question.
We then propose a new reasoning framework to effectively jointly learn these features and predicates in a coarse-to-fine manner.
arXiv Detail & Related papers (2021-10-06T06:29:52Z)