Active Human Pose Estimation via an Autonomous UAV Agent
- URL: http://arxiv.org/abs/2407.01811v1
- Date: Mon, 1 Jul 2024 21:20:52 GMT
- Title: Active Human Pose Estimation via an Autonomous UAV Agent
- Authors: Jingxi Chen, Botao He, Chahat Deep Singh, Cornelia Fermuller, Yiannis Aloimonos
- Abstract summary: This paper focuses on the task of human pose estimation from videos capturing a person's activity.
Self-occlusions can complicate or even prevent accurate pose estimation; to address this, relocating the camera to a new vantage point is necessary to clarify the view.
Our proposed solution comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner.
- Score: 13.188563931419056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the core activities of an active observer involves moving to secure a "better" view of the scene, where the definition of "better" is task-dependent. This paper focuses on the task of human pose estimation from videos capturing a person's activity. Self-occlusions within the scene can complicate or even prevent accurate human pose estimation. To address this, relocating the camera to a new vantage point is necessary to clarify the view, thereby improving 2D human pose estimation. This paper formalizes the process of achieving an improved viewpoint. Our proposed solution to this challenge comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner for devising a feasible motion plan to reposition the camera based on the predicted errors for camera views. The Data Generation Framework utilizes NeRF-based methods to generate a comprehensive dataset of human poses and activities, enhancing the drone's adaptability in various scenarios. The Camera View Error Estimation Network is designed to evaluate the current human pose and identify the most promising next viewing angles for the drone, ensuring a reliable and precise pose estimation from those angles. Finally, the combined planner incorporates these angles while considering the drone's physical and environmental limitations, employing efficient algorithms to navigate safe and effective flight paths. This system represents a significant advancement in active 2D human pose estimation for an autonomous UAV agent, offering substantial potential for applications in aerial cinematography by improving the performance of autonomous human pose estimation and maintaining the operational safety and efficiency of UAVs.
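The three-component loop described in the abstract (view-error prediction, best-view selection, motion planning) can be sketched at a high level; all names and interfaces below are illustrative assumptions, not the authors' actual API:

```python
def active_pose_step(frame, candidate_views, error_net, plan):
    """One iteration of a hypothetical active pose estimation loop.

    error_net stands in for the on-drone camera-view error estimation
    network; plan stands in for the combined planner that turns a target
    view into a feasible, safe flight path.
    """
    # 1. Predict the expected 2D pose error for each candidate camera view.
    predicted_errors = {v: error_net(frame, v) for v in candidate_views}
    # 2. Pick the most promising view: lowest predicted error.
    best_view = min(predicted_errors, key=predicted_errors.get)
    # 3. Hand the target view to the planner, which must also respect the
    #    drone's physical and environmental constraints.
    return plan(best_view)
```

In the real system the NeRF-based data generation framework supplies the synthetic drone-view training data for `error_net`; it plays no role at inference time and is therefore omitted from the loop above.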
Related papers
- Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling [17.368574409020475]
Ego-motion estimation is vital for drones when flying in GPS-denied environments.
We propose a self-supervised learning scheme to train a neural-network-based drone model using only onboard monocular video and flight controller data.
We demonstrate the value of the neural drone model by integrating it into a traditional filter-based VIO system.
arXiv Detail & Related papers (2025-04-30T14:38:01Z)
- Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation [67.9563319914377]
This paper investigates the usefulness of rear cameras in the head-mounted device (HMD) design for full-body tracking.
We propose a new transformer-based method that refines 2D joint heatmap estimation with multi-view information and heatmap uncertainty.
Our experiments show that the new camera configurations with back views provide superior support for 3D pose tracking.
arXiv Detail & Related papers (2025-03-14T17:59:54Z)
- DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion [57.83515140886807]
We introduce the task of Deficiency-Aware 3D Pose Estimation.
DeProPose is a flexible method that simplifies the network architecture to reduce training complexity.
We have developed a novel 3D human pose estimation dataset.
arXiv Detail & Related papers (2025-02-23T03:22:54Z)
- A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras [17.010390107028275]
We propose a precise pose estimation method for dashcam images, leveraging the inherent camera motion prior.
Our method improves pose estimation by 22% over the baseline in AUC@5°, and it can estimate poses for 19% more images with less reprojection error.
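AUC@5° here presumably denotes the area under the recall-vs-threshold curve for pose error, accumulated up to a 5° threshold, as is common in relative pose evaluation; a minimal reconstruction of that metric (not the paper's code) might look like:

```python
def pose_auc(errors_deg, max_threshold=5.0, num_steps=1000):
    """Approximate normalized area under the pose-recall curve.

    errors_deg: per-sample angular pose errors in degrees.
    Recall(t) is the fraction of errors not exceeding threshold t;
    the mean recall over evenly spaced thresholds in [0, max_threshold]
    approximates the normalized area under that curve, in [0, 1].
    """
    thresholds = [max_threshold * i / (num_steps - 1) for i in range(num_steps)]
    recalls = [sum(e <= t for e in errors_deg) / len(errors_deg)
               for t in thresholds]
    return sum(recalls) / num_steps
```

Under this definition, a method whose errors concentrate near 0° scores close to 1, while errors above the 5° cap contribute nothing.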
arXiv Detail & Related papers (2024-09-27T11:59:00Z)
- Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation [3.442372522693843]
We present a novel approach for robust 3D human pose estimation in the context of human-robot collaboration.
Our approach outperforms state-of-the-art multi-view human pose estimation techniques.
arXiv Detail & Related papers (2024-08-28T14:10:57Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement [65.08165593201437]
We explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion.
This task presents significant challenges due to the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion.
We propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction.
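Pixel-aligned 3D heatmaps of this kind are commonly decoded into joint coordinates with a soft-argmax, i.e. the expected voxel index under the heatmap distribution; a generic sketch of that decoding step, not the paper's implementation:

```python
def soft_argmax_3d(heatmap):
    """Decode a 3D heatmap (nested lists, depth x height x width of
    non-negative scores) into an expected (z, y, x) joint coordinate:
    each voxel index is weighted by its normalized heatmap score."""
    total = sum(v for plane in heatmap for row in plane for v in row)
    z = y = x = 0.0
    for d, plane in enumerate(heatmap):
        for h, row in enumerate(plane):
            for w, v in enumerate(row):
                p = v / total  # normalized probability of this voxel
                z += d * p
                y += h * p
                x += w * p
    return (z, y, x)
```

Unlike a hard argmax, this decoding is differentiable, which is why heatmap-based pose networks typically train end-to-end through it.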
arXiv Detail & Related papers (2023-11-28T07:13:47Z)
- 3D Pose Nowcasting: Forecast the Future to Improve the Present [65.65178700528747]
We propose a novel vision-based system leveraging depth data to accurately establish the 3D locations of skeleton joints.
We introduce the concept of Pose Nowcasting, denoting the capability of the proposed system to enhance its current pose estimation accuracy.
The experimental evaluation is conducted on two different datasets, providing accurate and real-time performance.
arXiv Detail & Related papers (2023-08-24T16:40:47Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
- Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification [38.19907319079833]
We propose a multitask learning approach, which employs a new multiscale architecture without convolution, Pyramid Vision Transformer (PVT) as the backbone for UAV-based object ReID.
By uncertainty modeling of intraclass variations, our proposed model can be jointly optimized using both uncertainty-aware object ID and camera ID information.
arXiv Detail & Related papers (2022-09-19T00:27:07Z)
- A Review on Viewpoints and Path-planning for UAV-based 3D Reconstruction [3.0479044961661708]
3D reconstruction using data captured by UAVs is attracting growing attention in research and industry.
This review paper investigates a wide range of model-free and model-based algorithms for viewpoint and path planning for 3D reconstruction of large-scale objects.
arXiv Detail & Related papers (2022-05-07T20:29:39Z)
- CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.