Fusing Monocular RGB Images with AIS Data to Create a 6D Pose Estimation Dataset for Marine Vessels
- URL: http://arxiv.org/abs/2508.14767v1
- Date: Wed, 20 Aug 2025 15:16:33 GMT
- Title: Fusing Monocular RGB Images with AIS Data to Create a 6D Pose Estimation Dataset for Marine Vessels
- Authors: Fabian Holst, Emre Gülsoylu, Simone Frintrop
- Abstract summary: The paper presents a novel technique for creating a 6D pose estimation dataset for marine vessels by fusing monocular RGB images with AIS data. We show that our approach allows the creation of a 6D pose estimation dataset without needing manual annotation. We introduce the Boats on Nordelbe Kehrwieder (BONK-pose), a publicly available dataset comprising 3753 images with 3D bounding box annotations for pose estimation.
- Score: 2.6654260060295134
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The paper presents a novel technique for creating a 6D pose estimation dataset for marine vessels by fusing monocular RGB images with Automatic Identification System (AIS) data. The proposed technique addresses the limitations of relying purely on AIS for location information, caused by issues such as equipment reliability, data manipulation, and transmission delays. By combining vessel detections from monocular RGB images, obtained using an object detection network (YOLOX-X), with AIS messages, the technique generates 3D bounding boxes that represent the vessels' 6D poses, i.e. their position and orientation in space. The paper evaluates different object detection models for locating vessels in image space. We also compare two transformation methods (homography and Perspective-n-Point) for aligning AIS data with image coordinates. Our results demonstrate that the Perspective-n-Point (PnP) method achieves a significantly lower projection error than the previously used homography-based approaches, and that the YOLOX-X model achieves a mean Average Precision (mAP) of 0.80 at an Intersection over Union (IoU) threshold of 0.5 for the relevant vessel classes. Our results indicate that our approach allows the creation of a 6D pose estimation dataset without manual annotation. Additionally, we introduce Boats on Nordelbe Kehrwieder (BONK-pose), a publicly available dataset comprising 3753 images with 3D bounding box annotations for pose estimation, created by our data fusion approach. This dataset can be used for training and evaluating 6D pose estimation networks. In addition, we introduce a set of 1000 images with 2D bounding box annotations for ship detection from the same scene.
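The central alignment step compares a planar homography against a full Perspective-n-Point solution for projecting AIS-derived world coordinates into the image. The snippet below is only a minimal sketch of that comparison using OpenCV, not the authors' implementation: the 2D-3D correspondences, camera intrinsics, and zero-distortion assumption are placeholders.

```python
import numpy as np
import cv2

# Hypothetical 2D-3D correspondences: image pixels <-> world points (metres),
# e.g. AIS positions converted to a local metric frame on the water plane (z = 0).
img_pts = np.array([[512, 340], [870, 355], [300, 410], [640, 500], [980, 470], [150, 520]], dtype=np.float64)
world_pts = np.array([[120.0,  45.0, 0.0],
                      [310.0,  60.0, 0.0],
                      [ 80.0, 150.0, 0.0],
                      [200.0, 220.0, 0.0],
                      [350.0, 200.0, 0.0],
                      [ 40.0, 260.0, 0.0]], dtype=np.float64)

# Assumed pinhole intrinsics and no lens distortion (placeholder values).
K = np.array([[1400.0,    0.0, 960.0],
              [   0.0, 1400.0, 540.0],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(5)

# Option 1: homography, valid only because all world points lie on one plane.
H, _ = cv2.findHomography(world_pts[:, :2], img_pts)
proj_h = cv2.perspectiveTransform(world_pts[:, :2].reshape(-1, 1, 2), H).reshape(-1, 2)

# Option 2: Perspective-n-Point, recovering the full camera rotation and translation.
ok, rvec, tvec = cv2.solvePnP(world_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
proj_p, _ = cv2.projectPoints(world_pts, rvec, tvec, K, dist)
proj_p = proj_p.reshape(-1, 2)

# Mean reprojection error is the kind of metric used to compare the two methods.
err_h = np.linalg.norm(proj_h - img_pts, axis=1).mean()
err_p = np.linalg.norm(proj_p - img_pts, axis=1).mean()
print(f"mean reprojection error  homography: {err_h:.2f}px  PnP: {err_p:.2f}px")
```

In practice the correspondences would come from many AIS-annotated frames and the errors would be evaluated on held-out vessels, not on the fitting points themselves.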
Related papers
- Any6D: Model-free 6D Pose Estimation of Novel Objects [76.30057578269668]
We introduce Any6D, a model-free framework for 6D object pose estimation. It requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. We evaluate our method on five challenging datasets.
arXiv Detail & Related papers (2025-03-24T13:46:21Z)
- A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation [49.1574468325115]
Estimating 6D object poses from RGB images is challenging because the lack of depth information requires inferring a three-dimensional structure from 2D projections. Traditional methods often rely on deep learning with grid-based data structures but struggle to capture complex dependencies among extracted features. We introduce a graph-based representation derived directly from images, where the temporal features of each pixel serve as nodes, and relationships between them are defined through node connectivity and spatial interactions.
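Purely as an illustration of the summary above (this is not the paper's architecture), a pixel-level graph can be built from a per-pixel feature map with simple 4-neighbourhood spatial connectivity; all names and shapes below are placeholder assumptions.

```python
import numpy as np

def build_pixel_graph(feat_map: np.ndarray):
    """Turn an (H, W, C) per-pixel feature map into graph nodes and edges.

    Nodes are pixel feature vectors; edges connect 4-neighbouring pixels,
    one simple way to encode the 'spatial interactions' mentioned above.
    """
    h, w, c = feat_map.shape
    nodes = feat_map.reshape(h * w, c)       # one node (feature row) per pixel

    edges = []
    for y in range(h):
        for x in range(w):
            idx = y * w + x
            if x + 1 < w:                    # right neighbour
                edges.append((idx, idx + 1))
            if y + 1 < h:                    # bottom neighbour
                edges.append((idx, idx + w))
    return nodes, np.array(edges, dtype=np.int64)

# Example: a dummy 4x4 feature map with 16-dimensional per-pixel features.
nodes, edges = build_pixel_graph(np.random.rand(4, 4, 16))
print(nodes.shape, edges.shape)  # (16, 16) (24, 2)
```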
arXiv Detail & Related papers (2024-12-31T18:47:54Z)
- DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects [59.51874686414509]
Existing approaches typically predict 3D translation utilizing the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. We present a Deep Voxel Matching Network (DVMNet++) that computes the relative object pose in a single pass. Our approach delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
- Image and AIS Data Fusion Technique for Maritime Computer Vision Applications [1.482087972733629]
We develop a technique that fuses Automatic Identification System (AIS) data with vessels detected in images to create datasets.
Our approach associates detected ships to their corresponding AIS messages by estimating distance and azimuth.
This technique is useful for creating datasets for waterway traffic management, encounter detection, and surveillance.
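A rough sketch of such a distance-and-azimuth association is given below. It is not the authors' code: it assumes AIS positions have already been reduced to range and bearing relative to the camera, that each image detection provides a comparable range/bearing estimate, and that the thresholds and weights are placeholder values.

```python
import math

def ais_range_bearing(cam_lat, cam_lon, ship_lat, ship_lon):
    """Approximate range (m) and azimuth (deg) from the camera to an AIS position.

    Uses an equirectangular approximation, adequate for short harbour-scale
    ranges; a full geodesic formula could be substituted.
    """
    r_earth = 6371000.0
    north = math.radians(ship_lat - cam_lat) * r_earth
    east = math.radians(ship_lon - cam_lon) * math.cos(math.radians(cam_lat)) * r_earth
    dist = math.hypot(north, east)
    azimuth = math.degrees(math.atan2(east, north)) % 360.0
    return dist, azimuth

def associate(detections, ais_targets, max_az_diff=5.0, max_rel_range=0.25):
    """Greedily match detections to AIS targets by azimuth and range agreement.

    Both inputs are lists of (range_m, azimuth_deg) tuples; returns a list of
    (detection_index, ais_index) pairs.
    """
    matches, used = [], set()
    for i, (d_rng, d_az) in enumerate(detections):
        best, best_cost = None, float("inf")
        for j, (a_rng, a_az) in enumerate(ais_targets):
            if j in used:
                continue
            az_diff = abs((d_az - a_az + 180.0) % 360.0 - 180.0)  # wrapped angle difference
            rel_rng = abs(d_rng - a_rng) / max(a_rng, 1.0)
            if az_diff <= max_az_diff and rel_rng <= max_rel_range:
                cost = az_diff + rel_rng * 10.0   # arbitrary weighting for the sketch
                if cost < best_cost:
                    best, best_cost = j, cost
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches
```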
arXiv Detail & Related papers (2023-12-07T20:54:49Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z)
- PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation [6.860183454947986]
We present a transformer-based approach that takes an RGB image as input and predicts a 6D pose for each object in the image.
Besides the image, our network does not require any additional information such as depth maps or 3D object models.
We achieve state-of-the-art results for RGB-only approaches on the challenging YCB-V dataset.
arXiv Detail & Related papers (2022-11-25T14:07:14Z)
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method improves on the state of the art by 5% for object detection in ScanNet scenes and by 3.4% on the Open dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
- L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets [0.0]
We propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image.
We adopt a hybrid pipeline in two stages: data-driven and geometric.
Our approach is more robust and accurate than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-03T17:41:29Z)
- Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation [0.7252027234425334]
We propose a method for simultaneous 3D object segmentation and 6-DOF pose estimation in pure 3D point cloud scenes. The key component of our method is a multi-task CNN architecture that simultaneously predicts the 3D object segmentation and the 6-DOF pose from pure 3D point clouds. For experimental evaluation, we generate expanded training data for two state-of-the-art 3D object datasets by using Augmented Reality (AR).
arXiv Detail & Related papers (2019-12-27T13:48:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.