Applying Deep-Learning-Based Computer Vision to Wireless Communications:
Methodologies, Opportunities, and Challenges
- URL: http://arxiv.org/abs/2006.05782v4
- Date: Wed, 2 Dec 2020 12:25:26 GMT
- Authors: Yu Tian and Gaofeng Pan and Mohamed-Slim Alouini
- Abstract summary: Deep learning (DL) has seen great success in the computer vision (CV) field.
This article introduces ideas about applying DL-based CV in wireless communications.
- Score: 100.45137961106069
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep learning (DL) has seen great success in the computer vision (CV) field,
and related techniques have been used in security, healthcare, remote sensing,
and many other fields. As a parallel development, visual data has become
universal in daily life, easily generated by ubiquitous low-cost cameras.
Therefore, exploring DL-based CV may yield useful information about objects,
such as their number, locations, distribution, motion, etc. Intuitively,
DL-based CV can also facilitate and improve the designs of wireless
communications, especially in dynamic network scenarios. However, so far, such
work is rare in the literature. The primary purpose of this article, then, is
to introduce ideas about applying DL-based CV in wireless communications to
bring some novel degrees of freedom to both theoretical research and
engineering applications. To illustrate how DL-based CV can be applied in
wireless communications, we give an example that uses DL-based CV in a
millimeter-wave (mmWave) system to realize optimal mmWave
multiple-input multiple-output (MIMO) beamforming in mobile scenarios. In
this example, we propose a framework to predict future beam indices from
previously observed beam indices and images of street views using ResNet,
3-dimensional ResNeXt, and a long short-term memory network. The experimental
results show that our frameworks achieve much higher accuracy than the baseline
method, and that visual data can significantly improve the performance of the
MIMO beamforming system. Finally, we discuss the opportunities and challenges
of applying DL-based CV in wireless communications.
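To make the term "beam index" in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a half-wavelength uniform linear array, a DFT-style codebook, and a single-path line-of-sight channel, none of which are stated in the abstract. All function names are hypothetical, and the "repeat the last index" predictor is only an assumed stand-in for a simple baseline; the learned framework described above (ResNet, 3-dimensional ResNeXt, and an LSTM) would take its place.

```python
import cmath
import math


def dft_codebook(num_antennas, num_beams):
    """Build num_beams steering vectors for a half-wavelength uniform linear array."""
    codebook = []
    for b in range(num_beams):
        # Spatial frequency of beam b, spread uniformly over [-1, 1).
        u = -1.0 + 2.0 * b / num_beams
        codebook.append([
            cmath.exp(1j * math.pi * n * u) / math.sqrt(num_antennas)
            for n in range(num_antennas)
        ])
    return codebook


def los_channel(num_antennas, angle_rad):
    """Single-path line-of-sight channel for a user at angle_rad (assumed model)."""
    u = math.sin(angle_rad)
    return [cmath.exp(1j * math.pi * n * u) for n in range(num_antennas)]


def best_beam_index(codebook, h):
    """Return the index of the codeword with the largest gain |f^H h|^2."""
    gains = [
        abs(sum(f_n.conjugate() * h_n for f_n, h_n in zip(f, h))) ** 2
        for f in codebook
    ]
    return max(range(len(codebook)), key=gains.__getitem__)


def predict_next_beams(observed, horizon):
    """Naive predictor (assumed baseline): repeat the last observed beam index."""
    return [observed[-1]] * horizon


def top1_accuracy(predicted, actual):
    """Fraction of time steps where the predicted index equals the true one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


if __name__ == "__main__":
    codebook = dft_codebook(num_antennas=8, num_beams=8)
    # A user sweeping from -30 to +30 degrees in 13 steps.
    angles = [math.radians(-30 + 5 * t) for t in range(13)]
    track = [best_beam_index(codebook, los_channel(8, a)) for a in angles]
    observed, future = track[:8], track[8:]
    predicted = predict_next_beams(observed, len(future))
    print("true future beams:", future)
    print("predicted beams:  ", predicted)
    print("top-1 accuracy:   ", top1_accuracy(predicted, future))
```

A predictor that only repeats the last index cannot follow a moving user, which is the gap that sequence models and visual context are meant to close in the setting the abstract describes.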
Related papers
- LaVin-DiT: Large Vision Diffusion Transformer [99.98106406059333]
LaVin-DiT is a scalable and unified foundation model designed to tackle over 20 computer vision tasks in a generative framework.
We introduce key innovations to optimize generative performance for vision tasks.
The model is scaled from 0.1B to 3.4B parameters, demonstrating substantial scalability and state-of-the-art performance across diverse vision tasks.
(2024-11-18T12:05:27Z)
- VOMTC: Vision Objects for Millimeter and Terahertz Communications [29.670122146586614]
We propose a large-scale vision dataset referred to as Vision Objects for Millimeter and Terahertz Communications (VOMTC).
The VOMTC dataset consists of 20,232 pairs of RGB and depth images obtained from a camera attached to the base station (BS).
We show that the beamforming technique exploiting the VOMTC-trained object detector outperforms conventional beamforming techniques.
(2024-09-14T06:18:51Z)
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want [58.091825321168514]
We introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.
Specifically, we propose a new end-to-end trained Multimodal Large Language Model (MLLM) that connects a vision encoder, a visual prompt encoder and an LLM.
To advance visual prompting research for MLLMs, we introduce MDVP-Data and MDVP-Bench.
(2024-03-29T16:26:20Z)
- OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning [29.798780069556074]
Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices.
We propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources.
(2024-01-22T02:17:36Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
(2023-12-18T13:51:56Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
(2022-03-11T01:51:54Z)
- Multi-task Learning Approach for Modulation and Wireless Signal Classification for 5G and Beyond: Edge Deployment via Model Compression [1.218340575383456]
Future communication networks must address the scarce spectrum to accommodate growth of heterogeneous wireless devices.
We exploit the potential of deep neural networks based multi-task learning framework to simultaneously learn modulation and signal classification tasks.
We provide a comprehensive heterogeneous wireless signals dataset for public use.
(2022-02-26T14:51:02Z)
- Wireless for Machine Learning [91.13476340719087]
We give an exhaustive review of the state-of-the-art wireless methods that are specifically designed to support machine learning services over distributed datasets.
There are two clear themes within the literature, analog over-the-air computation and digital radio resource management optimized for ML.
This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.
(2020-08-31T11:09:49Z)