A comprehensive review of datasets and deep learning techniques for vision in Unmanned Surface Vehicles
- URL: http://arxiv.org/abs/2412.01461v1
- Date: Mon, 02 Dec 2024 12:54:18 GMT
- Title: A comprehensive review of datasets and deep learning techniques for vision in Unmanned Surface Vehicles
- Authors: Linh Trinh, Siegfried Mercelis, Ali Anwar
- Abstract summary: Unmanned Surface Vehicles (USVs) have emerged as a major platform in maritime operations.
USVs can help reduce labor costs, increase safety, save energy, and allow for difficult unmanned tasks in harsh maritime environments.
With the rapid development of USVs, vision tasks such as detection and segmentation have become increasingly important.
- Abstract: Unmanned Surface Vehicles (USVs) have emerged as a major platform in maritime operations, capable of supporting a wide range of applications. USVs can help reduce labor costs, increase safety, save energy, and perform difficult unmanned tasks in harsh maritime environments. With the rapid development of USVs, vision tasks such as detection and segmentation have become increasingly important. Datasets play an important role in encouraging and improving the research and development of reliable vision algorithms for USVs, and a large number of recent studies have focused on releasing vision datasets for USVs. Alongside these datasets, a variety of deep learning techniques have also been studied with a focus on USVs. However, a systematic review covering both datasets and vision techniques is still lacking, so there is no comprehensive picture of the current development of vision on USVs, including its limitations and trends. In this study, we provide a comprehensive review of both USV datasets and deep learning techniques for vision tasks. Our review draws on a large number of vision datasets from USVs. Based on a thorough analysis of current datasets and deep learning techniques, we elaborate several challenges and potential opportunities for research and development in USV vision.
Related papers
- Vision-Language Models for Edge Networks: A Comprehensive Survey [32.05172973290599]
Vision Large Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis.
VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare.
Their deployment on resource-constrained edge devices remains challenging due to processing power, memory, and energy limitations.
arXiv Detail & Related papers (2025-02-11T14:04:43Z)
- UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking [0.0]
Unmanned Aerial Vehicles (UAVs) have revolutionized the process of gathering and analyzing data in diverse research domains.
UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos.
These datasets play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking.
arXiv Detail & Related papers (2024-09-05T04:47:36Z)
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [61.143381152739046]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations.
We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z)
- A Comprehensive Survey on Underwater Image Enhancement Based on Deep Learning [51.7818820745221]
Underwater image enhancement (UIE) presents a significant challenge within computer vision research.
Despite the development of numerous UIE algorithms, a thorough and systematic review is still absent.
arXiv Detail & Related papers (2024-05-30T04:46:40Z)
- Collaborative Perception Datasets in Autonomous Driving: A Survey [0.0]
The paper systematically analyzes a variety of datasets, comparing them based on aspects such as diversity, sensor setup, quality, public availability, and their applicability to downstream tasks.
The importance of addressing privacy and security concerns in the development of datasets is emphasized, regarding data sharing and dataset creation.
arXiv Detail & Related papers (2024-04-22T09:36:17Z)
- SynDrone -- Multi-modal UAV Dataset for Urban Scenarios [11.338399194998933]
The scarcity of large-scale real datasets with pixel-level annotations poses a significant challenge to researchers.
We propose a multimodal synthetic dataset containing both images and 3D data taken at multiple flying heights.
The dataset will be made publicly available to support the development of novel computer vision methods targeting UAV applications.
arXiv Detail & Related papers (2023-08-21T06:22:10Z)
- Vision-Language Models for Vision Tasks: A Survey [62.543250338410836]
Vision-Language Models (VLMs) learn rich vision-language correlation from web-scale image-text pairs.
This paper provides a systematic review of visual language models for various visual recognition tasks.
arXiv Detail & Related papers (2023-04-03T02:17:05Z)
- Vision-Centric BEV Perception: A Survey [92.98068828762833]
Vision-centric Bird's Eye View (BEV) perception has garnered significant interest from both industry and academia.
The rapid advancements in deep learning have led to the proposal of numerous methods for addressing vision-centric BEV perception challenges.
This paper compiles and organizes up-to-date knowledge, offering a systematic review and summary of prevalent algorithms.
arXiv Detail & Related papers (2022-08-04T17:53:17Z)
- A Survey on RGB-D Datasets [69.73803123972297]
This paper reviews and categorizes image datasets that include depth information.
We gathered 203 datasets that contain accessible data and grouped them into three categories: scene/objects, body, and medical.
arXiv Detail & Related papers (2022-01-15T05:35:19Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observations.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)