Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV
- URL: http://arxiv.org/abs/2406.09260v1
- Date: Thu, 13 Jun 2024 16:01:22 GMT
- Title: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV
- Authors: Maneesha Wickramasuriya, Taeyoung Lee, Murray Snyder
- Abstract summary: A Transformer Neural Network model is trained to detect 2D keypoints and estimate the 6D pose of each part.
The method has potential applications for ship-based autonomous UAV landing and navigation.
- Score: 0.23408308015481663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a deep transformer network for estimating the relative 6D pose of an Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are integrated using Bayesian fusion. The model is tested on synthetic data and in-situ flight experiments, demonstrating robustness and accuracy in various lighting conditions. The position estimation error is approximately 0.8% and 1.0% of the distance to the ship for the synthetic data and the flight experiments, respectively. The method has potential applications for ship-based autonomous UAV landing and navigation.
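The abstract does not spell out the per-part pose and fusion steps, so here is a minimal sketch of the pipeline it describes, assuming known 3D keypoint locations for each ship part, OpenCV's PnP solver for the per-part pose, and Gaussian inverse-variance fusion of the position estimates. All of these specifics are illustrative assumptions, not the paper's exact method.
```python
import numpy as np
import cv2

def part_pose(obj_pts, img_pts, K):
    """Estimate one part's 6D pose via PnP from its detected 2D keypoints.

    obj_pts: (N, 3) keypoint coordinates in the ship-part frame (assumed known)
    img_pts: (N, 2) keypoints detected by the network
    K:       (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None

def fuse_positions(estimates):
    """Inverse-variance (Gaussian Bayesian) fusion of per-part positions.

    estimates: list of (tvec (3,), cov (3, 3)) pairs, one per detected part.
    """
    info = np.zeros((3, 3))   # accumulated information (inverse covariance)
    vec = np.zeros(3)
    for t, cov in estimates:
        w = np.linalg.inv(cov)
        info += w
        vec += w @ np.ravel(t)
    fused_cov = np.linalg.inv(info)
    return fused_cov @ vec, fused_cov
```
Fusing in information (inverse-covariance) form keeps the combination of any number of part estimates a simple sum.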
Related papers
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
The dataset is unique as it includes testing images captured by deviating from the training trajectory by 1-4 meters.
We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z)
- Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware [1.4330085996657045]
On-board data processing can identify ships and reduce the amount of data sent to the ground.
Most images captured on board contain only bodies of water or land, with the Airbus Ship Detection dataset showing only 22.1% of images containing ships.
We designed a low-power, two-stage system to optimize performance instead of relying on a single complex model (a sketch of such a cascade follows below).
arXiv Detail & Related papers (2024-06-17T08:36:12Z)
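As a rough illustration of the two-stage idea above, a cascade can gate the expensive detector behind a cheap per-frame classifier; the stage models here are hypothetical stand-ins, not the paper's neuromorphic networks.
```python
def detect_ships(image, gate, detector, threshold=0.5):
    """Two-stage cascade: a cheap binary 'ship present?' gate runs on every
    frame; the expensive detector runs only on frames that pass the gate."""
    p_ship = gate(image)       # lightweight classifier on every frame
    if p_ship < threshold:
        return []              # most frames contain no ship, so exit early
    return detector(image)     # heavy detector invoked only when needed
</antml>```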
- Image and AIS Data Fusion Technique for Maritime Computer Vision Applications [1.482087972733629]
We develop a technique that fuses Automatic Identification System (AIS) data with vessels detected in images to create datasets.
Our approach associates detected ships with their corresponding AIS messages by estimating distance and azimuth (sketched below).
This technique is useful for creating datasets for waterway traffic management, encounter detection, and surveillance.
arXiv Detail & Related papers (2023-12-07T20:54:49Z)
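A minimal sketch of the distance/azimuth association described above, assuming a geo-referenced camera and per-detection range and bearing estimates; the cost weighting and message format are illustrative, not taken from the paper.
```python
import math

def range_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (deg) between two fixes."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_d = (math.sin(p1) * math.sin(p2) +
             math.cos(p1) * math.cos(p2) * math.cos(dlon))
    d = R * math.acos(max(-1.0, min(1.0, cos_d)))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return d, math.degrees(math.atan2(y, x)) % 360.0

def associate(detection, ais_msgs, cam_lat, cam_lon, deg_weight=50.0):
    """Match one detection {'est_range', 'est_azimuth'} to the AIS message
    whose predicted range/azimuth from the camera position is closest."""
    def cost(msg):
        d, brg = range_bearing(cam_lat, cam_lon, msg["lat"], msg["lon"])
        dbrg = abs(brg - detection["est_azimuth"]) % 360.0
        dbrg = min(dbrg, 360.0 - dbrg)               # wrap angular difference
        return abs(d - detection["est_range"]) + deg_weight * dbrg
    return min(ais_msgs, key=cost)
```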
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion [6.491470878214977]
This paper benchmarks various transformer-based models for the depth estimation task on an indoor NYUV2 dataset and an outdoor KITTI dataset.
We propose a novel attention-based architecture, Depthformer for monocular depth estimation.
Our proposed method improves the state of the art by 3.3% and 3.3% on the two benchmarks, respectively, in terms of Root Mean Squared Error (RMSE); a minimal RMSE snippet follows below.
arXiv Detail & Related papers (2022-07-10T20:49:11Z)
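For reference, the RMSE metric quoted above is straightforward to compute; this snippet assumes KITTI-style sparse ground truth where invalid pixels are marked as zero.
```python
import numpy as np

def depth_rmse(pred, gt):
    """Root Mean Squared Error over valid ground-truth depth pixels."""
    mask = gt > 0                       # zero marks missing ground truth
    err = pred[mask] - gt[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```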
- Augmented Imagefication: A Data-driven Fault Detection Method for Aircraft Air Data Sensors [12.317152569123541]
A novel data-driven approach named Augmented Imagefication for Fault Detection (FD) of aircraft air data sensors (ADS) is proposed.
An online FD scheme for edge devices based on a deep neural network (DNN) is developed, achieving real-time monitoring of aircraft.
arXiv Detail & Related papers (2022-06-18T00:06:53Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context (a rough sketch follows below).
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
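A loose PyTorch sketch of a spatial-pyramid reduction module in the spirit of the ViTAE description above: parallel dilated convolutions gather multi-scale context while downsampling the image into tokens. This is an illustration, not the official ViTAE implementation.
```python
import torch
import torch.nn as nn

class PyramidReduction(nn.Module):
    """Downsample an image into tokens with multi-scale context."""
    def __init__(self, in_ch=3, embed_dim=64, dilations=(1, 2, 4), stride=4):
        super().__init__()
        # one branch per dilation rate; all produce the same spatial size
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, embed_dim, kernel_size=3, stride=stride,
                      padding=d, dilation=d)
            for d in dilations
        ])
        self.proj = nn.Conv2d(embed_dim * len(dilations), embed_dim, 1)

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        feat = self.proj(multi_scale)                 # B, C, H/s, W/s
        return feat.flatten(2).transpose(1, 2)        # B, N, C tokens
```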
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss, and a class loss (the matching step is sketched below).
arXiv Detail & Related papers (2021-03-22T18:19:22Z)
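The bipartite matching behind a set-based loss like POET's can be computed with the Hungarian algorithm; the sketch below uses simplified cost terms (L1 keypoint distance minus class confidence), not POET's exact formulation.
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_kpts, gt_kpts, pred_cls_prob):
    """pred_kpts: (P, K, 2), gt_kpts: (G, K, 2), pred_cls_prob: (P,)."""
    # L1 keypoint distance between every prediction/ground-truth pair
    cost_kpt = np.abs(pred_kpts[:, None] - gt_kpts[None]).sum(axis=(2, 3))
    cost = cost_kpt - pred_cls_prob[:, None]   # favor confident predictions
    rows, cols = linear_sum_assignment(cost)   # minimal-cost assignment
    return list(zip(rows, cols))               # (prediction, ground-truth)
```
The loss terms are then computed only on the matched pairs, so the model is trained without any hand-crafted assignment of people to output slots.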
- Deep Learning based Multi-Modal Sensing for Tracking and State Extraction of Small Quadcopters [3.019035926889528]
This paper proposes a multi-sensor approach to detect, track, and localize a quadcopter unmanned aerial vehicle (UAV).
Specifically, a pipeline is developed to process monocular RGB and thermal video (captured from a fixed platform) to detect and track the UAV in our FoV.
A 2D planar lidar is used to convert pixel data to actual distance measurements, thereby enabling localization of the UAV in global coordinates (see the sketch below).
arXiv Detail & Related papers (2020-12-08T23:59:48Z)
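A minimal illustration of the pixel-to-metric conversion described above: the camera column fixes a bearing, the planar lidar supplies range along it, and the platform pose maps the pair into global coordinates. Frames and names here are assumptions, not the paper's implementation.
```python
import math

def pixel_to_global(u, fx, cx, lidar_range, platform_x, platform_y, platform_yaw):
    """Localize a target seen at image column u using a co-aligned 2D lidar.

    u, cx, fx: pixel column, principal point, and focal length (pixels)
    lidar_range: range (m) returned by the planar lidar along that bearing
    platform_*: fixed platform pose in the global frame (m, m, rad)
    """
    bearing = math.atan2(u - cx, fx)      # angle off the optical axis
    theta = platform_yaw + bearing        # bearing in the world frame
    gx = platform_x + lidar_range * math.cos(theta)
    gy = platform_y + lidar_range * math.sin(theta)
    return gx, gy
```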
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Real-Time target detection in maritime scenarios based on YOLOv3 model [65.35132992156942]
A novel ship dataset is proposed, consisting of more than 56k images of marine vessels collected via web scraping.
A YOLOv3 single-stage detector based on the Keras API is trained on this dataset.
arXiv Detail & Related papers (2020-02-10T15:25:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.