Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences
- URL: http://arxiv.org/abs/2509.20906v1
- Date: Thu, 25 Sep 2025 08:46:37 GMT
- Title: Finding 3D Positions of Distant Objects from Noisy Camera Movement and Semantic Segmentation Sequences
- Authors: Julius Pesonen, Arno Solin, Eija Honkavaara
- Abstract summary: 3D object localisation based on a sequence of camera measurements is essential for safety-critical surveillance tasks, such as drone-based wildfire monitoring. In this paper, we show that the task can be solved using particle filters for both single and multiple target scenarios. The method was studied using a 3D simulation and a drone-based image segmentation sequence with global navigation satellite system (GNSS)-based camera pose estimates.
- Score: 15.431008066373089
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object localisation based on a sequence of camera measurements is essential for safety-critical surveillance tasks, such as drone-based wildfire monitoring. Localisation of objects detected with a camera can typically be solved with dense depth estimation or 3D scene reconstruction. However, in the context of distant objects or tasks limited by the amount of available computational resources, neither solution is feasible. In this paper, we show that the task can be solved using particle filters for both single and multiple target scenarios. The method was studied using a 3D simulation and a drone-based image segmentation sequence with global navigation satellite system (GNSS)-based camera pose estimates. The results showed that a particle filter can be used to solve practical localisation tasks based on camera poses and image segments in these situations where other solutions fail. The particle filter is independent of the detection method, making it flexible for new tasks. The study also demonstrates that drone-based wildfire monitoring can be conducted using the proposed method paired with a pre-existing image segmentation model.
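To make the approach concrete, the following is a minimal sketch of the kind of particle filter the abstract describes: particles are candidate 3D target positions, weighted by how well they reproject onto the detected image segment under each noisy camera pose. All names, noise models, and parameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a bearing-only particle filter for localising a static,
# distant target from noisy camera poses and 2D segment detections.
# Illustrative only: the noise models and parameters are assumptions,
# not the authors' implementation.
import numpy as np

def project(K, R, t, pts):
    """Project Nx3 world points to Nx2 pixels for a camera with
    intrinsics K and world-to-camera pose (R, t)."""
    cam = R @ pts.T + t[:, None]        # world frame -> camera frame
    uv = K @ cam
    return (uv[:2] / uv[2]).T           # perspective divide

def pf_step(particles, weights, K, R, t, seg_centroid_px,
            pixel_sigma=8.0, process_sigma=0.5):
    """One predict / update / resample cycle of a SIR particle filter."""
    # Predict: diffuse particles slightly to absorb camera pose noise
    # (static-target motion model).
    particles = particles + np.random.normal(0.0, process_sigma, particles.shape)

    # Update: Gaussian likelihood on the pixel reprojection error between
    # each particle and the detected segment centroid.
    err = np.linalg.norm(project(K, R, t, particles) - seg_centroid_px, axis=1)
    weights = weights * np.exp(-0.5 * (err / pixel_sigma) ** 2)
    weights = weights / (weights.sum() + 1e-300)

    # Resample when the effective sample size drops below half.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = np.random.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights

# The posterior position estimate is the weighted particle mean:
# estimate = weights @ particles
```

Because the update consumes only a camera pose and a 2D detection, the filter is agnostic to how the segment was produced, which mirrors the abstract's claim that the method is independent of the detection model.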
Related papers
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking [67.34803048690428]
We propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem.
DORT extracts object-wise local volumes for motion estimation, which also alleviates the heavy computational burden.
It is flexible and practical, and can be plugged into most camera-based 3D object detectors.
arXiv Detail & Related papers (2023-03-29T12:33:55Z)
- A Fast Location Algorithm for Very Sparse Point Clouds Based on Object Detection [0.0]
We propose an algorithm that can quickly locate a target object through image object detection when only very sparse feature points are available.
We conduct experiments in a manually designed scene captured with a handheld smartphone, and the results demonstrate the high positioning speed and accuracy of our method.
arXiv Detail & Related papers (2021-10-21T05:17:48Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics.
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed; a pseudo-LiDAR point cloud representation is then computed from the depth estimates; and finally, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
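For context on the two-step baseline this work unifies, the pseudo-LiDAR intermediate representation is just a back-projection of the estimated depth map through the pinhole camera model. The sketch below is a generic illustration (intrinsics fx, fy, cx, cy assumed known), not PLUME's code.

```python
# Illustrative back-projection of a depth map into a "pseudo-LiDAR" point
# cloud, the intermediate representation the two-step pipelines use.
# Generic pinhole-camera sketch; intrinsics are assumed known.
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Lift an HxW depth map (metres) to an Nx3 point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy          # pinhole model: Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid / zero-depth pixels
```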
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We introduce a self-supervised loss function that we use to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks, and it outperforms existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
- Continuous close-range 3D object pose estimation [1.4502611532302039]
Vision-based 3D pose estimation is necessary for accurately handling objects that might not be placed at fixed positions.
In this paper, we present a 3D pose estimation method based on a gradient-ascent particle filter.
This allows the method to be applied online during task execution, saving valuable cycle time.
arXiv Detail & Related papers (2020-10-02T07:48:17Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start from an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
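The schematic loop below illustrates the delayed-reward structure being described: each action adjusts exactly one box parameter, and the return is only observed once the episode ends. The parameter set, step size, and reward here are placeholders, not the paper's actual design.

```python
# Schematic of an iterative, one-parameter-at-a-time 3D box refinement
# loop with a delayed (episode-end) reward, as the abstract describes.
# N_PARAMS, STEP, policy, and score_fn are illustrative placeholders.
import numpy as np

N_PARAMS = 7          # e.g. (x, y, z, w, h, l, yaw) of a 3D box
STEP = 0.1            # fixed adjustment applied per action

def refine_episode(box, policy, score_fn, horizon=10):
    """Run one refinement episode; the reward is the score improvement
    available only after several steps, which motivates RL training."""
    box = np.asarray(box, dtype=float).copy()
    start = score_fn(box)
    for _ in range(horizon):
        a = policy(box)                          # integer action in [0, 2 * N_PARAMS)
        idx, sign = divmod(a, 2)                 # which parameter, which direction
        box[idx] += STEP if sign == 0 else -STEP # change exactly one 3D parameter
    return box, score_fn(box) - start            # delayed reward
```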
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Integration of the 3D Environment for UAV Onboard Visual Object Tracking [7.652259812856325]
Single visual object tracking from an unmanned aerial vehicle poses fundamental challenges.
We introduce a pipeline that combines a model-free visual object tracker, a sparse 3D reconstruction, and a state estimator.
By representing the position of the target in 3D space rather than in image space, we stabilize the tracking during ego-motion.
arXiv Detail & Related papers (2020-08-06T18:37:29Z)
- Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance [36.73303869405764]
Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images.
We present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects.
arXiv Detail & Related papers (2020-07-14T09:47:27Z)
- A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion Handling [2.824395407508717]
The proposed algorithm has linear complexity in the total number of detections across the cameras.
It operates in the 3D world frame, and provides 3D trajectory estimates of the objects.
The proposed algorithm is evaluated on the latest WILDTRACK dataset, and demonstrated to work in very crowded scenes.
arXiv Detail & Related papers (2020-01-13T09:34:07Z)