Vision-based Learning for Drones: A Survey
- URL: http://arxiv.org/abs/2312.05019v2
- Date: Tue, 2 Jan 2024 06:13:16 GMT
- Title: Vision-based Learning for Drones: A Survey
- Authors: Jiaping Xiao, Rangya Zhang, Yuhang Zhang, and Mir Feroskhan
- Abstract summary: Drones as advanced cyber-physical systems are undergoing a transformative shift with the advent of vision-based learning.
This review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities.
We explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios.
- Score: 1.280979348722635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drones as advanced cyber-physical systems are undergoing a transformative
shift with the advent of vision-based learning, a field that is rapidly gaining
prominence due to its profound impact on drone autonomy and functionality.
Different from existing task-specific surveys, this review offers a
comprehensive overview of vision-based learning in drones, emphasizing its
pivotal role in enhancing their operational capabilities under various
scenarios. We start by elucidating the fundamental principles of vision-based
learning, highlighting how it significantly improves drones' visual perception
and decision-making processes. We then categorize vision-based control methods
into indirect, semi-direct, and end-to-end approaches from the
perception-control perspective. We further explore various applications of
vision-based drones with learning capabilities, ranging from single-agent
systems to more complex multi-agent and heterogeneous system scenarios, and
underscore the challenges and innovations characterizing each area. Finally, we
explore open questions and potential solutions, paving the way for ongoing
research and development in this dynamic and rapidly evolving field. With
growing large language models (LLMs) and embodied intelligence, vision-based
learning for drones provides a promising but challenging road towards
artificial general intelligence (AGI) in the 3D physical world.
Related papers
- A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions [11.071271817366739]
3D object perception has become a crucial component in the development of autonomous driving systems.
This review extensively summarizes traditional 3D object detection methods, focusing on camera-based, LiDAR-based, and fusion detection techniques.
We discuss future directions, including methods to improve accuracy such as temporal perception, occupancy grids, and end-to-end learning frameworks.
arXiv Detail & Related papers (2024-08-28T01:08:33Z)
- A Survey of Embodied Learning for Object-Centric Robotic Manipulation [27.569063968870868]
Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in AI.
Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment.
arXiv Detail & Related papers (2024-08-21T11:32:09Z)
- A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z)
- Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing [52.50284630866713]
Existing systems often require hand-engineered components for state estimation, planning, and control.
This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies.
arXiv Detail & Related papers (2022-10-26T19:03:17Z)
- Deep Learning for Omnidirectional Vision: A Survey and New Perspectives [7.068031114801553]
This paper presents a systematic and comprehensive review and analysis of the recent progress in deep learning methods for omnidirectional vision.
Our work covers four main contents, including: (i) an introduction to the principle of omnidirectional imaging, the convolution methods on omnidirectional images (ODIs), and datasets to highlight the differences and difficulties compared with 2D planar image data; (ii) a structural and hierarchical taxonomy of the DL methods for omnidirectional vision; and (iii) a summary of the latest novel learning strategies and applications.
arXiv Detail & Related papers (2022-05-21T00:19:56Z)
- The State of Aerial Surveillance: A Survey [62.198765910573556]
This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective.
The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed.
arXiv Detail & Related papers (2022-01-09T20:13:27Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
The "embodied visual navigation" problem requires an agent to navigate a 3D environment relying mainly on its first-person observations.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.