Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV
- URL: http://arxiv.org/abs/2406.09260v1
- Date: Thu, 13 Jun 2024 16:01:22 GMT
- Title: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV
- Authors: Maneesha Wickramasuriya, Taeyoung Lee, Murray Snyder
- Abstract summary: A Transformer Neural Network model is trained to detect 2D keypoints and estimate the 6D pose of each part.
The method has potential applications for ship-based autonomous UAV landing and navigation.
- Score: 0.23408308015481663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a deep transformer network for estimating the relative 6D pose of an Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are integrated using Bayesian fusion. The model is tested on synthetic data and in-situ flight experiments, demonstrating robustness and accuracy in various lighting conditions. The position estimation error is approximately 0.8% and 1.0% of the distance to the ship for the synthetic data and the flight experiments, respectively. The method has potential applications for ship-based autonomous UAV landing and navigation.
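The abstract does not spell out the per-part pose and fusion steps, so here is a minimal sketch of the pipeline it describes, assuming known 3D keypoint locations for each ship part, OpenCV's PnP solver for the per-part pose, and Gaussian inverse-variance fusion of the position estimates. All of these specifics are illustrative assumptions, not the paper's exact method.
```python
import numpy as np
import cv2

def part_pose(obj_pts, img_pts, K):
    """Estimate one part's 6D pose via PnP from its detected 2D keypoints.

    obj_pts: (N, 3) keypoint coordinates in the ship-part frame (assumed known)
    img_pts: (N, 2) keypoints detected by the network
    K:       (3, 3) camera intrinsic matrix
    """
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None

def fuse_positions(estimates):
    """Inverse-variance (Gaussian Bayesian) fusion of per-part positions.

    estimates: list of (tvec (3,), cov (3, 3)) pairs, one per detected part.
    """
    info = np.zeros((3, 3))   # accumulated information (inverse covariance)
    vec = np.zeros(3)
    for t, cov in estimates:
        w = np.linalg.inv(cov)
        info += w
        vec += w @ np.ravel(t)
    fused_cov = np.linalg.inv(info)
    return fused_cov @ vec, fused_cov
```
Fusing in information (inverse-covariance) form keeps the combination of any number of part estimates a simple sum.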
Related papers
- XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis [84.23233209017192]
This paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
The dataset is unique as it includes testing images captured by deviating from the training trajectory by 1-4 meters.
We establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings.
arXiv Detail & Related papers (2024-06-26T14:00:21Z)
- Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware [1.4330085996657045]
On-board data processing can identify ships and reduce the amount of data sent to the ground.
Most images captured on board contain only bodies of water or land, with the Airbus Ship Detection dataset showing only 22.1% of images containing ships.
We designed a low-power, two-stage system to optimize performance instead of relying on a single complex model (a sketch of such a cascade follows below).
arXiv Detail & Related papers (2024-06-17T08:36:12Z)
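As a rough illustration of the two-stage idea above, a cascade can gate the expensive detector behind a cheap per-frame classifier; the stage models here are hypothetical stand-ins, not the paper's neuromorphic networks.
```python
def detect_ships(image, gate, detector, threshold=0.5):
    """Two-stage cascade: a cheap binary 'ship present?' gate runs on every
    frame; the expensive detector runs only on frames that pass the gate."""
    p_ship = gate(image)       # lightweight classifier on every frame
    if p_ship < threshold:
        return []              # most frames contain no ship, so exit early
    return detector(image)     # heavy detector invoked only when needed
</antml>```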
- Image and AIS Data Fusion Technique for Maritime Computer Vision Applications [1.482087972733629]
We develop a technique that fuses Automatic Identification System (AIS) data with vessels detected in images to create datasets.
Our approach associates detected ships with their corresponding AIS messages by estimating distance and azimuth (sketched below).
This technique is useful for creating datasets for waterway traffic management, encounter detection, and surveillance.
arXiv Detail & Related papers (2023-12-07T20:54:49Z)
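A minimal sketch of the distance/azimuth association described above, assuming a geo-referenced camera and per-detection range and bearing estimates; the cost weighting and message format are illustrative, not taken from the paper.
```python
import math

def range_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (m) and initial bearing (deg) between two fixes."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_d = (math.sin(p1) * math.sin(p2) +
             math.cos(p1) * math.cos(p2) * math.cos(dlon))
    d = R * math.acos(max(-1.0, min(1.0, cos_d)))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return d, math.degrees(math.atan2(y, x)) % 360.0

def associate(detection, ais_msgs, cam_lat, cam_lon, deg_weight=50.0):
    """Match one detection {'est_range', 'est_azimuth'} to the AIS message
    whose predicted range/azimuth from the camera position is closest."""
    def cost(msg):
        d, brg = range_bearing(cam_lat, cam_lon, msg["lat"], msg["lon"])
        dbrg = abs(brg - detection["est_azimuth"]) % 360.0
        dbrg = min(dbrg, 360.0 - dbrg)               # wrap angular difference
        return abs(d - detection["est_range"]) + deg_weight * dbrg
    return min(ais_msgs, key=cost)
```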
- Towards Viewpoint Robustness in Bird's Eye View Segmentation [85.99907496019972]
We study how AV perception models are affected by changes in camera viewpoint.
Small changes to pitch, yaw, depth, or height of the camera at inference time lead to large drops in performance.
We introduce a technique for novel view synthesis and use it to transform collected data to the viewpoint of target rigs.
arXiv Detail & Related papers (2023-09-11T02:10:07Z)
- Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion [6.491470878214977]
This paper benchmarks various transformer-based models for the depth estimation task on an indoor NYUV2 dataset and an outdoor KITTI dataset.
We propose a novel attention-based architecture, Depthformer for monocular depth estimation.
Our proposed method improves the state of the art by 3.3% and 3.3% on the two benchmarks, respectively, in terms of Root Mean Squared Error (RMSE); a minimal RMSE snippet follows below.
arXiv Detail & Related papers (2022-07-10T20:49:11Z)
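For reference, the RMSE metric quoted above is straightforward to compute; this snippet assumes KITTI-style sparse ground truth where invalid pixels are marked as zero.
```python
import numpy as np

def depth_rmse(pred, gt):
    """Root Mean Squared Error over valid ground-truth depth pixels."""
    mask = gt > 0                       # zero marks missing ground truth
    err = pred[mask] - gt[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```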
- Augmented Imagefication: A Data-driven Fault Detection Method for Aircraft Air Data Sensors [12.317152569123541]
A novel data-driven approach named Augmented Imagefication for Fault Detection (FD) of aircraft air data sensors (ADS) is proposed.
An online FD scheme for edge devices based on a deep neural network (DNN) is developed, achieving real-time monitoring of aircraft.
arXiv Detail & Related papers (2022-06-18T00:06:53Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context (a rough sketch follows below).
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
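A loose PyTorch sketch of a spatial-pyramid reduction module in the spirit of the ViTAE description above: parallel dilated convolutions gather multi-scale context while downsampling the image into tokens. This is an illustration, not the official ViTAE implementation.
```python
import torch
import torch.nn as nn

class PyramidReduction(nn.Module):
    """Downsample an image into tokens with multi-scale context."""
    def __init__(self, in_ch=3, embed_dim=64, dilations=(1, 2, 4), stride=4):
        super().__init__()
        # one branch per dilation rate; all produce the same spatial size
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, embed_dim, kernel_size=3, stride=stride,
                      padding=d, dilation=d)
            for d in dilations
        ])
        self.proj = nn.Conv2d(embed_dim * len(dilations), embed_dim, 1)

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        feat = self.proj(multi_scale)                 # B, C, H/s, W/s
        return feat.flatten(2).transpose(1, 2)        # B, N, C tokens
```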
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss, and a class loss (the matching step is sketched below).
arXiv Detail & Related papers (2021-03-22T18:19:22Z)
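The bipartite matching behind a set-based loss like POET's can be computed with the Hungarian algorithm; the sketch below uses simplified cost terms (L1 keypoint distance minus class confidence), not POET's exact formulation.
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_kpts, gt_kpts, pred_cls_prob):
    """pred_kpts: (P, K, 2), gt_kpts: (G, K, 2), pred_cls_prob: (P,)."""
    # L1 keypoint distance between every prediction/ground-truth pair
    cost_kpt = np.abs(pred_kpts[:, None] - gt_kpts[None]).sum(axis=(2, 3))
    cost = cost_kpt - pred_cls_prob[:, None]   # favor confident predictions
    rows, cols = linear_sum_assignment(cost)   # minimal-cost assignment
    return list(zip(rows, cols))               # (prediction, ground-truth)
```
The loss terms are then computed only on the matched pairs, so the model is trained without any hand-crafted assignment of people to output slots.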
- Deep Learning based Multi-Modal Sensing for Tracking and State Extraction of Small Quadcopters [3.019035926889528]
This paper proposes a multi-sensor approach to detect, track, and localize a quadcopter unmanned aerial vehicle (UAV).
Specifically, a pipeline is developed to process monocular RGB and thermal video (captured from a fixed platform) to detect and track the UAV in our FoV.
A 2D planar lidar is used to convert pixel data to actual distance measurements, thereby enabling localization of the UAV in global coordinates (see the sketch below).
arXiv Detail & Related papers (2020-12-08T23:59:48Z)
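A minimal illustration of the pixel-to-metric conversion described above: the camera column fixes a bearing, the planar lidar supplies range along it, and the platform pose maps the pair into global coordinates. Frames and names here are assumptions, not the paper's implementation.
```python
import math

def pixel_to_global(u, fx, cx, lidar_range, platform_x, platform_y, platform_yaw):
    """Localize a target seen at image column u using a co-aligned 2D lidar.

    u, cx, fx: pixel column, principal point, and focal length (pixels)
    lidar_range: range (m) returned by the planar lidar along that bearing
    platform_*: fixed platform pose in the global frame (m, m, rad)
    """
    bearing = math.atan2(u - cx, fx)      # angle off the optical axis
    theta = platform_yaw + bearing        # bearing in the world frame
    gx = platform_x + lidar_range * math.cos(theta)
    gy = platform_y + lidar_range * math.sin(theta)
    return gx, gy
```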
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Real-Time target detection in maritime scenarios based on YOLOv3 model [65.35132992156942]
A novel ship dataset is proposed, consisting of more than 56k images of marine vessels collected via web scraping.
A YOLOv3 single-stage detector based on the Keras API is trained on this dataset.
arXiv Detail & Related papers (2020-02-10T15:25:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.