Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance
- URL: http://arxiv.org/abs/2405.10391v1
- Date: Thu, 16 May 2024 18:36:43 GMT
- Title: Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance
- Authors: Anish Bhattacharya, Nishanth Rao, Dhruv Parikh, Pratik Kunapuli, Nikolai Matni, Vijay Kumar
- Abstract summary: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance.
We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control.
To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
- Score: 8.886178310295243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have been shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
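To make the depth-to-command mapping concrete, below is a minimal PyTorch sketch of what an end-to-end ViT policy of this kind might look like: a single-channel depth image is patchified, passed through a small transformer encoder, and pooled into a velocity command. The patch size, model width and depth, and the 3-D command output are illustrative assumptions, not the authors' reported architecture.

```python
# Illustrative sketch only -- NOT the paper's architecture. It shows the general
# shape of a depth-based, end-to-end ViT policy: patchify a depth image, encode
# with a small transformer, and regress a body-frame velocity command.
import torch
import torch.nn as nn


class DepthViTPolicy(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=128, depth=4, heads=4, cmd_dim=3):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding for a single-channel depth image.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Regression head: pooled tokens -> velocity command (assumed 3-D).
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, cmd_dim))

    def forward(self, depth):                         # depth: (B, 1, H, W)
        tokens = self.patch_embed(depth)              # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)    # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(tokens.mean(dim=1))          # (B, cmd_dim)


if __name__ == "__main__":
    policy = DepthViTPolicy()
    cmd = policy(torch.rand(2, 1, 224, 224))          # fake depth frames in [0, 1]
    print(cmd.shape)                                   # torch.Size([2, 3])
```

In practice such a policy would be trained by imitation or reinforcement learning against the simulator described in the abstract; the sketch only fixes the input/output shapes.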
Related papers
- Vision-based control for landing an aerial vehicle on a marine vessel [0.0]
This work addresses the landing problem of an aerial vehicle, exemplified by a simple quadrotor, on a moving platform using image-based visual servo control.
The image features on the textured target plane are exploited to derive a vision-based control law.
The proposed control law guarantees convergence without estimating the unknown distance between the target and the moving platform.
arXiv Detail & Related papers (2024-04-17T12:53:57Z)
- FullLoRA-AT: Efficiently Boosting the Robustness of Pretrained Vision Transformers [61.48709409150777]
The Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks.
Existing large models tend to prioritize performance during training, potentially neglecting robustness.
We develop a novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module.
We propose the FullLoRA-AT framework by integrating the learnable LNLoRA modules into all key components of ViT-based models.
arXiv Detail & Related papers (2024-01-03T14:08:39Z)
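As a side note on the FullLoRA-AT entry above, the following is a small, assumption-heavy sketch of the general LNLoRA idea it describes: a frozen pretrained linear layer augmented with a low-rank (LoRA-style) update whose input first passes through its own learnable LayerNorm. The rank, scaling, and the choice to normalize only the low-rank branch are illustrative guesses, not the paper's implementation.

```python
# Minimal sketch of the idea described above (not the paper's code): a frozen
# linear layer plus a learnable-LayerNorm + low-rank update branch.
import torch
import torch.nn as nn


class LNLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad = False
        self.norm = nn.LayerNorm(base.in_features)          # learnable LN
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(self.norm(x)))


if __name__ == "__main__":
    layer = LNLoRALinear(nn.Linear(128, 128))
    print(layer(torch.randn(2, 16, 128)).shape)   # torch.Size([2, 16, 128])
```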
- Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators [18.279073092727025]
We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control.
The kinematic decoupling allows for fast end-effector motions and recovery that leads to robust visual servoing.
We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm.
arXiv Detail & Related papers (2023-07-10T21:51:06Z)
- DADFNet: Dual Attention and Dual Frequency-Guided Dehazing Network for Video-Empowered Intelligent Transportation [79.18450119567315]
Adverse weather conditions pose severe challenges for video-based transportation surveillance.
We propose a dual attention and dual frequency-guided dehazing network (termed DADFNet) for real-time visibility enhancement.
arXiv Detail & Related papers (2023-04-19T11:55:30Z)
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
- Unified Control Framework for Real-Time Interception and Obstacle Avoidance of Fast-Moving Objects with Diffusion Variational Autoencoder [2.5642257132861923]
Real-time interception of fast-moving objects by robotic arms in dynamic environments poses a formidable challenge.
This paper introduces a unified control framework to address the challenge by simultaneously intercepting dynamic objects and avoiding moving obstacles.
arXiv Detail & Related papers (2022-09-27T18:46:52Z)
- Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z)
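The dynamic token sparsification idea above can be illustrated with a rough sketch (again an assumption-laden illustration, not the paper's code): a lightweight scorer rates each token and only the top fraction is kept for the remaining transformer blocks. The keep ratio and the plain top-k selection are illustrative choices.

```python
# Rough sketch of dynamic token pruning in general: score tokens, keep the
# most important ones, and drop the rest before later transformer blocks.
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.7):
        super().__init__()
        self.scorer = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))
        self.keep_ratio = keep_ratio

    def forward(self, tokens):                        # tokens: (B, N, dim)
        scores = self.scorer(tokens).squeeze(-1)      # (B, N) importance scores
        keep = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(keep, dim=1).indices        # indices of retained tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                  # (B, keep, dim)


if __name__ == "__main__":
    pruner = TokenPruner(dim=128)
    print(pruner(torch.randn(2, 196, 128)).shape)     # torch.Size([2, 137, 128])
```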
- Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation.
We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience.
Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than 2x improvement on efficiency compared to state-of-the-art vision transformers with only 0.8% drop of accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Blending Anti-Aliasing into Vision Transformer [57.88274087198552]
The discontinuous patch-wise tokenization process implicitly introduces jagged artifacts into attention maps.
The aliasing effect occurs when discrete patterns are used to produce high-frequency or continuous information, resulting in indistinguishable distortions.
We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue.
arXiv Detail & Related papers (2021-10-28T14:30:02Z)
- Evolved Neuromorphic Control for High Speed Divergence-based Landings of MAVs [0.0]
We develop spiking neural networks for controlling landings of micro air vehicles.
We demonstrate that the resulting neuromorphic controllers transfer robustly from a simulation to the real world.
To the best of our knowledge, this work is the first to integrate spiking neural networks in the control loop of a real-world flying robot.
arXiv Detail & Related papers (2020-03-06T10:19:02Z)
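For readers unfamiliar with spiking controllers like the one in the last entry, here is a toy leaky integrate-and-fire (LIF) layer in PyTorch. The time constant, threshold, and the rate-coded divergence-to-thrust mapping in the usage loop are illustrative assumptions, not the evolved controller from that paper.

```python
# Toy sketch of a leaky integrate-and-fire (LIF) layer of the kind used in
# spiking control networks (purely illustrative).
import torch
import torch.nn as nn


class LIFLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, decay: float = 0.9, threshold: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.decay, self.threshold = decay, threshold

    def forward(self, x, v):
        # Leak, integrate the weighted input, spike where the membrane
        # potential crosses threshold, then reset those neurons.
        v = self.decay * v + self.fc(x)
        spikes = (v >= self.threshold).float()
        v = v * (1.0 - spikes)
        return spikes, v


if __name__ == "__main__":
    lif = LIFLayer(in_dim=1, out_dim=8)
    v = torch.zeros(1, 8)
    for t in range(5):                        # feed a constant optic-flow divergence
        spikes, v = lif(torch.tensor([[0.5]]), v)
        thrust = spikes.mean().item()         # crude rate-coded thrust command
        print(t, thrust)
```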