Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance
- URL: http://arxiv.org/abs/2405.10391v1
- Date: Thu, 16 May 2024 18:36:43 GMT
- Title: Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance
- Authors: Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Nikolai Matni, Vijay Kumar
- Abstract summary: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance.
We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control.
To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
- Score: 8.886178310295243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have been shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
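To make the depth-to-command mapping concrete, below is a minimal PyTorch sketch of what an end-to-end ViT policy of this kind might look like: a single-channel depth image is patchified, passed through a small transformer encoder, and pooled into a velocity command. The patch size, model width and depth, and the 3-D command output are illustrative assumptions, not the authors' reported architecture.

```python
# Illustrative sketch only -- NOT the paper's architecture. It shows the general
# shape of a depth-based, end-to-end ViT policy: patchify a depth image, encode
# with a small transformer, and regress a body-frame velocity command.
import torch
import torch.nn as nn


class DepthViTPolicy(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=128, depth=4, heads=4, cmd_dim=3):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding for a single-channel depth image.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Regression head: pooled tokens -> velocity command (assumed 3-D).
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, cmd_dim))

    def forward(self, depth):                         # depth: (B, 1, H, W)
        tokens = self.patch_embed(depth)              # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)    # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens.mean(dim=1))          # (B, cmd_dim)


if __name__ == "__main__":
    policy = DepthViTPolicy()
    cmd = policy(torch.rand(2, 1, 224, 224))          # fake depth frames in [0, 1]
    print(cmd.shape)                                   # torch.Size([2, 3])
```

In practice such a policy would be trained by imitation or reinforcement learning against the simulator described in the abstract; the sketch only fixes the input/output shapes.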
Related papers
- Vision-based control for landing an aerial vehicle on a marine vessel [0.0]
This work addresses the landing problem of an aerial vehicle, exemplified by a simple quadrotor, on a moving platform using image-based visual servo control.
The image features on the textured target plane are exploited to derive a vision-based control law.
The proposed control law guarantees convergence without estimating the unknown distance between the target and the moving platform.
arXiv Detail & Related papers (2024-04-17T12:53:57Z)
- FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers [61.48709409150777]
The Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA-AT framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z)
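As a side note on the FullLoRA-AT entry above, the following is a small, assumption-heavy sketch of the general LNLoRA idea it describes: a frozen pretrained linear layer augmented with a low-rank (LoRA-style) update whose input first passes through its own learnable LayerNorm. The rank, scaling, and the choice to normalize only the low-rank branch are illustrative guesses, not the paper's implementation.

```python
# Minimal sketch of the idea described above (not the paper's code): a frozen
# linear layer plus a learnable-LayerNorm + low-rank update branch.
import torch
import torch.nn as nn


class LNLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad = False
        self.norm = nn.LayerNorm(base.in_features)          # learnable LN
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(self.norm(x)))


if __name__ == "__main__":
    layer = LNLoRALinear(nn.Linear(128, 128))
    print(layer(torch.randn(2, 16, 128)).shape)   # torch.Size([2, 16, 128])
```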
- Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators [18.279073092727025]
We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control.
The kinematic decoupling allows for fast end-effector motions and recovery that leads to robust visual servoing.
We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm.
arXiv Detail & Related papers (2023-07-10T21:51:06Z)
- DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation [79.18450119567315]
Adverse weather conditions pose severe challenges for video-based transportation surveillance.
We propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement.
arXiv Detail & Related papers (2023-04-19T11:55:30Z)
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
- Unified Control Framework for Real-Time Interception and Obstacle Avoidance of Fast-Moving Objects with Diffusion Variational Autoencoder [2.5642257132861923]
Real-time interception of fast-moving objects by robotic arms in dynamic environments poses a formidable challenge.
This paper introduces a unified control framework to address the challenge by simultaneously intercepting dynamic objects and avoiding moving obstacles.
arXiv Detail & Related papers (2022-09-27T18:46:52Z)
- Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z)
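The dynamic token sparsification idea above can be illustrated with a rough sketch (again an assumption-laden illustration, not the paper's code): a lightweight scorer rates each token and only the top fraction is kept for the remaining transformer blocks. The keep ratio and the plain top-k selection are illustrative choices.

```python
# Rough sketch of dynamic token pruning in general: score tokens, keep the
# most important ones, and drop the rest before later transformer blocks.
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.scorer = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        self.keep_ratio = keep_ratio

    def forward(self, tokens):                        # tokens: (B, N, dim)
        scores = self.scorer(tokens).squeeze(-1)      # (B, N) importance scores
        keep = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(keep, dim=1).indices        # indices of retained tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                  # (B, keep, dim)


if __name__ == "__main__":
    pruner = TokenPruner(dim=128)
    print(pruner(torch.randn(2, 196, 128)).shape)     # torch.Size([2, 137, 128])
```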
- Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than 2x improvement on efficiency compared to state-of-the-art vision transformers with only 0.8% drop of accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Blending Anti-Aliasing into Vision Transformer [57.88274087198552]
The discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps.
The aliasing effect occurs when discrete patterns are used to produce high-frequency or continuous information, resulting in indistinguishable distortions.
We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue.
arXiv Detail & Related papers (2021-10-28T14:30:02Z)
- Evolved Neuromorphic Control for High Speed Divergence-based Landings of MAVs [0.0]
We develop spiking neural networks for controlling landings of micro air vehicles.
We demonstrate that the resulting neuromorphic controllers transfer robustly from a simulation to the real world.
To the best of our knowledge, this work is the first to integrate spiking neural networks in the control loop of a real-world flying robot.
arXiv Detail & Related papers (2020-03-06T10:19:02Z)
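For readers unfamiliar with spiking controllers like the one in the last entry, here is a toy leaky integrate-and-fire (LIF) layer in PyTorch. The time constant, threshold, and the rate-coded divergence-to-thrust mapping in the usage loop are illustrative assumptions, not the evolved controller from that paper.

```python
# Toy sketch of a leaky integrate-and-fire (LIF) layer of the kind used in
# spiking control networks (purely illustrative).
import torch
import torch.nn as nn


class LIFLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, decay: float = 0.9, threshold: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.decay, self.threshold = decay, threshold

    def forward(self, x, v):
        # Leak, integrate the weighted input, spike where the membrane
        # potential crosses threshold, then reset those neurons.
        v = self.decay * v + self.fc(x)
        spikes = (v >= self.threshold).float()
        v = v * (1.0 - spikes)
        return spikes, v


if __name__ == "__main__":
    lif = LIFLayer(in_dim=1, out_dim=8)
    v = torch.zeros(1, 8)
    for t in range(5):                        # feed a constant optic-flow divergence
        spikes, v = lif(torch.tensor([[0.5]]), v)
        thrust = spikes.mean().item()         # crude rate-coded thrust command
        print(t, thrust)
```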