Integration of Communication and Computational Imaging
- URL: http://arxiv.org/abs/2410.19415v2
- Date: Tue, 29 Oct 2024 12:51:54 GMT
- Title: Integration of Communication and Computational Imaging
- Authors: Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu
- Abstract summary: We propose a novel framework that integrates communication and computational imaging (ICCI) for remote perception.
The ICCI framework performs full-link information transfer optimization, aiming to minimize information loss from the generation of the information source to the execution of the final vision tasks.
An 80 km, 27-band hyperspectral video perception at 30 fps is experimentally achieved.
- Score: 49.2442836992307
- Abstract: Communication enables the expansion of human visual perception beyond the limitations of time and distance, while computational imaging overcomes the constraints of depth and breadth. Although impressive achievements have been witnessed with the two types of technologies, the isolated information flow between the two domains is a bottleneck hindering their further progress. Herein, we propose a novel framework that integrates communication and computational imaging (ICCI) to break through the inherent isolation between communication and computational imaging for remote perception. By jointly considering the sensing and transmitting of remote visual information, the ICCI framework performs a full-link information transfer optimization, aiming to minimize information loss from the generation of the information source to the execution of the final vision tasks. We conduct numerical analysis and experiments to demonstrate the ICCI framework by integrating communication systems and snapshot compressive imaging systems. Compared with straightforward combination schemes, which sequentially execute sensing and transmitting, the ICCI scheme shows greater robustness against channel noise and impairments while achieving higher data compression. Moreover, an 80 km, 27-band hyperspectral video perception at 30 fps is experimentally achieved. This new ICCI remote perception paradigm offers a high-efficiency solution for various real-time computer vision tasks.
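To make the full-link idea concrete, the minimal PyTorch sketch below jointly optimizes a learnable coded snapshot measurement, a mapping to channel symbols over a toy AWGN link, and a task decoder with a single end-to-end loss. The layer sizes, the AWGN channel model, and the MSE reconstruction objective are illustrative assumptions and do not reproduce the authors' ICCI implementation.

```python
# Minimal sketch of full-link (sense -> transmit -> task) joint optimization.
# Shapes, the toy AWGN channel, and the MSE task loss are illustrative assumptions,
# not the paper's actual ICCI system.
import torch
import torch.nn as nn

class ICCIPipeline(nn.Module):
    def __init__(self, bands=27, height=32, width=32, latent=256):
        super().__init__()
        # Learnable coded aperture: one snapshot measurement from `bands` spectral frames.
        self.mask = nn.Parameter(torch.rand(bands, height, width))
        # Joint source-channel encoder maps the 2D measurement to channel symbols.
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(height * width, latent),
                                     nn.ReLU(), nn.Linear(latent, latent))
        # Task decoder reconstructs the hyperspectral cube (stand-in for the vision task).
        self.decoder = nn.Sequential(nn.Linear(latent, latent), nn.ReLU(),
                                     nn.Linear(latent, bands * height * width))

    def forward(self, cube, snr_db=10.0):
        # Snapshot compressive sensing: modulate each band and sum to a single frame.
        measurement = (cube * self.mask).sum(dim=1)                   # (B, H, W)
        symbols = self.encoder(measurement)
        # Normalize to unit average power, then add AWGN at the given SNR.
        symbols = symbols * symbols.shape[-1] ** 0.5 / symbols.norm(dim=-1, keepdim=True)
        noise_std = 10 ** (-snr_db / 20)
        received = symbols + noise_std * torch.randn_like(symbols)
        return self.decoder(received).view_as(cube)

model = ICCIPipeline()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cube = torch.rand(4, 27, 32, 32)                                      # toy hyperspectral batch
loss = nn.functional.mse_loss(model(cube), cube)                      # end-to-end task loss
loss.backward()
opt.step()                                                            # one joint update step
```

Because the sensing mask, channel mapping, and decoder share one loss, gradients from the final task shape the acquisition itself, which is the contrast with a sequential sense-then-transmit pipeline.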
Related papers
- Deep Reinforcement Learning-Based User Scheduling for Collaborative Perception [24.300126250046894]
Collaborative perception is envisioned to improve perceptual accuracy by using vehicle-to-everything (V2X) communication.
Due to limited communication resources, it is impractical for all units to transmit sensing data such as point clouds or high-definition video.
We propose a deep reinforcement learning-based V2X user scheduling algorithm for collaborative perception; an illustrative scheduling sketch follows this entry.
arXiv Detail & Related papers (2025-02-12T04:45:00Z)
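As a rough illustration of learned user scheduling, the toy sketch below trains a tabular epsilon-greedy Q-learning agent to pick which vehicle transmits each slot. The state space, reward proxy, and three-vehicle setup are invented for illustration and are far simpler than the paper's actual DRL formulation.

```python
# Toy epsilon-greedy Q-learning scheduler: choose which vehicle may transmit its
# sensing data in each slot under a shared-channel constraint.
import numpy as np

rng = np.random.default_rng(0)
n_vehicles, n_states = 3, 4            # action = vehicle index, state = coarse channel quality
Q = np.zeros((n_states, n_vehicles))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    # Reward proxy: perception gain of the scheduled vehicle, degraded by poor channels.
    gain = rng.uniform(0.0, 1.0) * (state + 1) / n_states
    next_state = rng.integers(n_states)
    return gain, next_state

state = rng.integers(n_states)
for _ in range(5000):
    action = rng.integers(n_vehicles) if rng.random() < eps else int(Q[state].argmax())
    reward, next_state = step(state, action)
    # Standard Q-learning update toward the bootstrapped target.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("Learned scheduling preferences per channel state:\n", Q.round(2))
```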
- CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals.
Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality.
The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
- Goal-Oriented Semantic Communication for Wireless Visual Question Answering [68.75814200517854]
We propose a goal-oriented semantic communication (GSC) framework to improve Visual Question Answering (VQA) performance.
We propose a bounding box (BBox)-based image semantic extraction and ranking approach to prioritize the semantic information based on the goal of the questions; a toy ranking sketch follows this entry.
Experimental results demonstrate that our GSC framework improves answering accuracy by up to 49% under AWGN channels and 59% under Rayleigh channels.
arXiv Detail & Related papers (2024-11-03T12:01:18Z)
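The goal-oriented extraction in the GSC entry above can be pictured as ranking detected boxes by their relevance to the question and transmitting only what fits a bandwidth budget. The sketch below is a toy version under that assumption; the scoring rule, detection format, and 600-byte budget are not from the paper.

```python
# Toy goal-oriented ranking: score detected boxes by overlap between their labels
# and the question, then transmit only what fits a byte budget.
def rank_boxes(question, detections, budget_bytes=600):
    q_tokens = set(question.lower().split())
    def score(det):
        label_tokens = set(det["label"].lower().split())
        return det["confidence"] * (1.0 + len(label_tokens & q_tokens))
    ranked = sorted(detections, key=score, reverse=True)
    selected, used = [], 0
    for det in ranked:                       # greedy fill of the transmission budget
        if used + det["bytes"] <= budget_bytes:
            selected.append(det)
            used += det["bytes"]
    return selected

detections = [
    {"label": "red car", "confidence": 0.9, "bytes": 400},
    {"label": "traffic light", "confidence": 0.8, "bytes": 300},
    {"label": "tree", "confidence": 0.95, "bytes": 350},
]
print(rank_boxes("what color is the car", detections))
```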
- ECAFormer: Low-light Image Enhancement using Cross Attention [11.554554006307836]
Low-light image enhancement (LLIE) is critical in computer vision.
We design a hierarchical mutual enhancement network built on a cross-attention transformer (ECAFormer).
We show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the second-best method.
arXiv Detail & Related papers (2024-06-19T07:21:31Z)
- Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels [11.108614988357008]
We propose an interference-robust semantic communication (IRSC) scheme for general multiple-input multiple-output (MIMO) interference channels.
The scheme develops neural network (NN)-based transceivers that integrate channel state information (CSI) either solely at the receiver or at both the transmitter and receiver ends; a minimal transceiver sketch follows this entry.
Experimental results demonstrate that the proposed IRSC scheme effectively learns to mitigate interference and outperforms baseline approaches.
arXiv Detail & Related papers (2024-04-10T11:40:22Z)
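Below is a minimal sketch of an NN-based transceiver conditioned on CSI in the spirit of the IRSC entry above: an encoder maps features to transmit symbols, a random block-fading channel is applied, and the decoder consumes the received symbols concatenated with receiver-side CSI. The 2x2 real-valued channel, layer sizes, and MSE objective are toy assumptions, not the paper's architecture.

```python
# Toy NN transceiver with receiver-side CSI over a random 2x2 block-fading channel.
import torch
import torch.nn as nn

TX_ANT, RX_ANT, FEAT, SYMS = 2, 2, 64, 32

encoder = nn.Sequential(nn.Linear(FEAT, 128), nn.ReLU(), nn.Linear(128, TX_ANT * SYMS))
# Receiver sees the received symbols plus flattened CSI (H has RX_ANT * TX_ANT entries).
decoder = nn.Sequential(nn.Linear(RX_ANT * SYMS + RX_ANT * TX_ANT, 128), nn.ReLU(),
                        nn.Linear(128, FEAT))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def transmit(features, snr_db=10.0):
    batch = features.shape[0]
    x = encoder(features).view(batch, TX_ANT, SYMS)
    H = torch.randn(batch, RX_ANT, TX_ANT)              # block-fading channel realisation
    noise_std = 10 ** (-snr_db / 20)
    y = H @ x + noise_std * torch.randn(batch, RX_ANT, SYMS)
    csi = H.reshape(batch, -1)                           # receiver-side CSI
    return decoder(torch.cat([y.reshape(batch, -1), csi], dim=-1))

features = torch.randn(8, FEAT)                          # stand-in semantic features
loss = nn.functional.mse_loss(transmit(features), features)
loss.backward()
opt.step()                                               # one end-to-end training step
```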
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of the pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information; a rough architectural sketch follows this entry.
Experimental results show that ViT_STAE can compress the video dataset HMDB51 by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z)
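The sketch below illustrates the idea from the STAE entry above: score frame importance, drop low-scoring frames, and reconstruct with a combined 3D-2D convolutional decoder. The keep-ratio, masking scheme, and layer sizes are assumptions for illustration and do not reproduce the ViT_STAE architecture.

```python
# Toy frame-importance scoring with a combined 3D-2D convolutional decoder.
import torch
import torch.nn as nn

class ToySTAE(nn.Module):
    def __init__(self, frames=8, keep=4):
        super().__init__()
        self.keep = keep
        # Scores each frame from globally pooled RGB statistics.
        self.frame_scorer = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1))
        self.decoder3d = nn.Conv3d(3, 8, kernel_size=3, padding=1)     # temporal context
        self.decoder2d = nn.Conv2d(8, 3, kernel_size=3, padding=1)     # per-frame refinement

    def forward(self, video):                            # video: (B, T, 3, H, W)
        b, t, c, h, w = video.shape
        scores = self.frame_scorer(video.reshape(b * t, c, h, w)).view(b, t)
        keep_idx = scores.topk(self.keep, dim=1).indices                # most informative frames
        mask = torch.zeros(b, t, 1, 1, 1, device=video.device)
        mask.scatter_(1, keep_idx.view(b, self.keep, 1, 1, 1), 1.0)
        compressed = video * mask                                       # dropped frames become zeros
        feat = torch.relu(self.decoder3d(compressed.transpose(1, 2)))   # (B, 8, T, H, W)
        recon = self.decoder2d(feat.transpose(1, 2).reshape(b * t, 8, h, w))
        return recon.view(b, t, c, h, w)

video = torch.rand(2, 8, 3, 32, 32)
print(ToySTAE()(video).shape)                            # torch.Size([2, 8, 3, 32, 32])
```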
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance [8.360870648463653]
Real-time, intelligent video surveillance via camera networks involves computation-intensive vision detection tasks on massive video data.
Multiple video streams compete for limited communication resources on the link between edge devices and camera networks.
An adaptive camera network self-configuration method (CANS) for video surveillance is proposed to cope with multiple video streams with heterogeneous quality-of-service requirements.
arXiv Detail & Related papers (2021-09-13T01:54:33Z)
- Multi-image Super Resolution of Remotely Sensed Images using Residual Feature Attention Deep Neural Networks [1.3764085113103222]
The presented research proposes a novel residual attention model (RAMS) that efficiently tackles the multi-image super-resolution task.
We introduce a visual feature attention mechanism with 3D convolutions to obtain attention-aware data fusion and information extraction; a toy attention block follows this entry.
Our representation learning network makes extensive use of nested residual connections to let redundant low-frequency signals flow through.
arXiv Detail & Related papers (2020-07-06T22:54:02Z)
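As a toy counterpart to the RAMS entry above, the block below uses a 3D convolution to produce attention weights over a stack of low-resolution acquisitions while a residual connection lets low-frequency content pass through unchanged. Channel counts and kernel sizes are illustrative assumptions, not the published RAMS configuration.

```python
# Toy residual feature-attention block over a stack of low-resolution acquisitions.
import torch
import torch.nn as nn

class ResidualFeatureAttention3D(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.feat = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.Sequential(nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                                  nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, n_images, H, W)
        features = torch.relu(self.feat(x))
        weights = self.attn(features)            # attention over images and pixels jointly
        return x + features * weights            # residual path carries low-frequency signal

stack = torch.rand(1, 16, 9, 32, 32)             # e.g. 9 registered low-res acquisitions
print(ResidualFeatureAttention3D()(stack).shape) # torch.Size([1, 16, 9, 32, 32])
```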
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.