Goal-Oriented Semantic Communication for Wireless Visual Question Answering
- URL: http://arxiv.org/abs/2411.02452v2
- Date: Wed, 27 Nov 2024 11:52:08 GMT
- Title: Goal-Oriented Semantic Communication for Wireless Visual Question Answering
- Authors: Sige Liu, Nan Li, Yansha Deng, Tony Q. S. Quek
- Abstract summary: We propose a goal-oriented semantic communication (GSC) framework to improve Visual Question Answering (VQA) performance. We propose a bounding box (BBox)-based image semantic extraction and ranking approach to prioritize the semantic information based on the goal of questions. Experimental results demonstrate that our GSC framework improves answering accuracy by up to 49% under AWGN channels and 59% under Rayleigh channels.
- Score: 68.75814200517854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of artificial intelligence (AI) and computer vision (CV) has facilitated the development of computation-intensive applications like Visual Question Answering (VQA), which integrates visual perception and natural language processing to generate answers. To overcome the limitations of traditional VQA constrained by local computation resources, edge computing has been incorporated to provide extra computation capability at the edge side. Meanwhile, this brings new communication challenges between the local and edge, including limited bandwidth, channel noise, and multipath effects, which degrade VQA performance and user quality of experience (QoE), particularly during the transmission of large high-resolution images. To overcome these bottlenecks, we propose a goal-oriented semantic communication (GSC) framework that focuses on effectively extracting and transmitting semantic information most relevant to the VQA goals, improving the answering accuracy and enhancing the effectiveness and efficiency. The objective is to maximize the answering accuracy, and we propose a bounding box (BBox)-based image semantic extraction and ranking approach to prioritize the semantic information based on the goal of questions. We then extend it by incorporating a scene graph (SG)-based approach to handle questions with complex relationships. Experimental results demonstrate that our GSC framework improves answering accuracy by up to 49% under AWGN channels and 59% under Rayleigh channels while reducing total latency by up to 65% compared to traditional bit-oriented transmission.
Related papers
- Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks [4.880664732766839]
Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge.
Running DFL on top of edge networks, however, faces severe performance challenges due to the extensive parameter exchanges between agents.
We jointly design the communication scheme for the overlay network formed by the agents and the mixing matrix that controls the communication demands between the agents.
Our evaluations show that the proposed algorithm can reduce the total training time by over 80% compared to the baseline.
arXiv Detail & Related papers (2025-04-16T15:56:57Z) - Fine-Grained Retrieval-Augmented Generation for Visual Question Answering [12.622529359686016]
Visual Question Answering (VQA) focuses on providing answers to natural language questions by utilizing information from images.
Retrieval-augmented generation (RAG) leveraging external knowledge bases (KBs) emerges as a promising approach.
This study presents fine-grained knowledge units, which merge textual snippets with entity images stored in vector databases.
arXiv Detail & Related papers (2025-02-28T11:25:38Z) - AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence [65.29835430845893]
We propose a framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication.
A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters.
We show that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of samples by up to 52 percent.
arXiv Detail & Related papers (2025-02-14T14:56:58Z) - Communication Efficient Cooperative Edge AI via Event-Triggered Computation Offloading [34.18100643343979]
We propose a channel-triggered, event-triggered edge-inference framework that prioritizes efficient rare-event processing.
The proposed framework achieves superior rare-event classification accuracy, and also effectively reduces communication overhead, as opposed to existing edge-inference approaches.
arXiv Detail & Related papers (2025-01-01T15:55:59Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can sufficiently exploit the transmitting and computing resources to promote the data rate.
Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers.
This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z) - Semantic Communication based on Large Language Model for Underwater Image Transmission [36.56805696235768]
Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise.
We propose a novel Semantic Communication framework based on Large Language Models (LLMs).
Our framework reduces the overall data size to 0.8% of the original.
arXiv Detail & Related papers (2024-08-08T16:46:14Z) - Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency [59.15544887307901]
Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission.
Existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility.
We propose a novel trustworthy ISC framework that employs Generative Artificial Intelligence (GenAI) for multiple downstream inference tasks.
arXiv Detail & Related papers (2024-08-07T14:32:36Z) - Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck [28.661084093544684]
We propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework.
The proposed method aims to extract compact and informative features that possess high capability for effective domain-shift generalization.
We show that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.
arXiv Detail & Related papers (2024-05-15T17:07:55Z) - RIS-Based On-the-Air Semantic Communications -- a Diffractional Deep Neural Network Approach [10.626169088908867]
Current AI-based semantic communication methods require digital hardware for implementation.
RIS-based semantic communications offer appealing features, such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.
arXiv Detail & Related papers (2023-12-01T12:15:49Z) - Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts [89.04751776308656]
This paper proposes a GAI-aided SemCom system with multi-modal prompts for accurate content decoding.
In response to security concerns, we introduce the application of covert communications aided by a friendly jammer.
arXiv Detail & Related papers (2023-09-05T23:24:56Z) - Communication-Efficient Framework for Distributed Image Semantic Wireless Transmission [68.69108124451263]
Federated learning-based semantic communication (FLSC) framework for multi-task distributed image transmission with IoT devices.
Each link is composed of a hierarchical vision transformer (HVT)-based extractor and a task-adaptive translator.
Channel state information-based multiple-input multiple-output transmission module designed to combat channel fading and noise.
arXiv Detail & Related papers (2023-08-07T16:32:14Z) - Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck [14.719648367178259]
We deal with the problem of visual question answering (VQA) in remote sensing.
While remotely sensed images contain information significant for the task of identification and object detection, they pose a great challenge in their processing because of high dimensionality, volume and redundancy.
We propose a cross-attention-based approach combined with information. The CNN-LSTM-based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information learns a low-dimensional layer that has all the relevant information required to carry out the VQA task.
arXiv Detail & Related papers (2023-06-25T15:09:21Z) - Task-Oriented Integrated Sensing, Computation and Communication for Wireless Edge AI [46.61358701676358]
Edge artificial intelligence (AI) has been proposed to provide high-performance computation of a conventional cloud down to the network edge.
Recently, the convergence of wireless sensing, computation and communication (SC$^2$) for specific edge AI tasks has brought about a paradigm shift.
It is of paramount importance to advance fully integrated sensing, computation and communication (ISCC) to achieve ultra-reliable and low-latency edge intelligence acquisition.
arXiv Detail & Related papers (2023-06-11T06:40:51Z) - Semantic Communication Enabling Robust Edge Intelligence for Time-Critical IoT Applications [87.05763097471487]
This paper aims to design robust Edge Intelligence using semantic communication for time-critical IoT applications.
We analyze the effect of image DCT coefficients on inference accuracy and propose the channel-agnostic effectiveness encoding for offloading.
arXiv Detail & Related papers (2022-11-24T20:13:17Z) - VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering [79.22069768972207]
We propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations.
Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context.
On two challenging VQA tasks, our method outperforms strong baseline VQA methods by 3.2% on VCR and 4.6% on GQA, suggesting its strength in performing concept-level reasoning.
arXiv Detail & Related papers (2022-05-23T17:55:34Z) - Common Language for Goal-Oriented Semantic Communications: A Curriculum Learning Framework [66.81698651016444]
A comprehensive semantic communications framework is proposed for enabling goal-oriented task execution.
A novel top-down framework that combines curriculum learning (CL) and reinforcement learning (RL) is proposed to solve this problem.
Simulation results show that the proposed CL method outperforms traditional RL in terms of convergence time, task execution time, and transmission cost during training.
arXiv Detail & Related papers (2021-11-15T19:13:55Z) - Learning Task-Oriented Communication for Edge Inference: An Information Bottleneck Approach [3.983055670167878]
A low-end edge device transmits the extracted feature vector of a local data sample to a powerful edge server for processing.
It is critical to encode the data into an informative and compact representation for low-latency inference given the limited bandwidth.
We propose a learning-based communication scheme that jointly optimizes feature extraction, source coding, and channel coding.
arXiv Detail & Related papers (2021-02-08T12:53:32Z)