Temporally-consistent 3D Reconstruction of Birds
- URL: http://arxiv.org/abs/2408.13629v1
- Date: Sat, 24 Aug 2024 17:12:36 GMT
- Title: Temporally-consistent 3D Reconstruction of Birds
- Authors: Johannes Hägerlind, Jonas Hentati-Sundberg, Bastian Wandt
- Abstract summary: We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific species of seabird - the common murre.
We provide a real-world dataset of 10,000 frames of video observations that on average capture nine birds simultaneously.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper deals with the 3D reconstruction of seabirds, which have recently come into the focus of environmental scientists as valuable bio-indicators for environmental change. Such 3D information is beneficial for analyzing a bird's behavior and physiological shape, for example by tracking motion, shape, and appearance changes. From a computer vision perspective, birds are especially challenging due to their rapid and oftentimes non-rigid motion. We propose an approach to reconstruct the 3D pose and shape from monocular videos of a specific species of seabird - the common murre. Our approach comprises a full pipeline of detection, tracking, segmentation, and temporally consistent 3D reconstruction. Additionally, we propose a temporal loss that extends current single-image 3D bird pose estimators to the temporal domain. Moreover, we provide a real-world dataset of 10,000 frames of video observations that on average capture nine birds simultaneously, comprising a large variety of motions and interactions, including a smaller test set with bird-specific keypoint labels. Using our temporal optimization, we achieve state-of-the-art performance for the challenging sequences in our dataset.
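The abstract does not specify the form of the temporal loss. As a minimal sketch under assumptions, a first-order smoothness penalty over per-frame pose parameters is a common way to extend a single-image estimator to video; `temporal_loss` and its `weight` below are illustrative names, not the paper's actual formulation:

```python
import numpy as np

def temporal_loss(poses, weight=1.0):
    """Hypothetical temporal-consistency term: penalizes frame-to-frame
    changes in pose parameters. `poses` is a (T, D) array holding one
    D-dimensional pose vector per video frame."""
    # First-order finite differences between consecutive frames.
    diffs = poses[1:] - poses[:-1]
    # Weighted mean of the squared per-frame change.
    return weight * np.mean(np.sum(diffs ** 2, axis=1))

# A perfectly static sequence incurs zero loss.
static = np.ones((5, 3))
print(temporal_loss(static))  # -> 0.0
```

In practice such a term would be added to the per-frame reprojection losses so that the optimizer trades off single-frame fit against trajectory smoothness.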
Related papers
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- Virtual Pets: Animatable Animal Generation in 3D Scenes
We introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.
We leverage monocular internet videos and extract deformable NeRF representations for the foreground and static NeRF representations for the background.
We develop a reconstruction strategy, encompassing species-level shared template learning and per-video fine-tuning.
arXiv Detail & Related papers (2023-12-21T18:59:30Z)
- TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis
TriHuman is a novel human-tailored, deformable, and efficient tri-plane representation.
We non-rigidly warp global ray samples into our undeformed tri-plane texture space.
We show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes.
arXiv Detail & Related papers (2023-12-08T16:40:38Z)
- 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking
We present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple camera views.
For identity matching, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain IDs across views in subsequent frames.
We show that 3D-MuPPET also works outdoors without additional annotations from natural environments.
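The matching step described above (assign detections to global identities, then maintain IDs with a tracker) can be sketched with a toy greedy nearest-neighbour matcher; the function name, the distance gate `max_dist`, and the greedy strategy are all illustrative assumptions, not 3D-MuPPET's actual matcher:

```python
import numpy as np

def match_identities(prev_positions, detections, max_dist=50.0):
    """Toy greedy ID assignment: each detection takes the nearest still-free
    identity from the previous frame, if it lies within `max_dist` pixels.
    prev_positions: dict id -> (x, y); detections: list of (x, y).
    Returns dict detection_index -> id (or None if unmatched)."""
    assignments = {}
    free_ids = set(prev_positions)
    for i, det in enumerate(detections):
        best_id, best_d = None, max_dist
        for pid in free_ids:
            d = float(np.hypot(det[0] - prev_positions[pid][0],
                               det[1] - prev_positions[pid][1]))
            if d < best_d:
                best_id, best_d = pid, d
        assignments[i] = best_id
        if best_id is not None:
            free_ids.remove(best_id)  # each ID is used at most once
    return assignments

prev = {0: (10.0, 10.0), 1: (100.0, 100.0)}
dets = [(102.0, 98.0), (11.0, 9.0)]
print(match_identities(prev, dets))  # -> {0: 1, 1: 0}
```

A production tracker would instead solve the assignment globally (e.g. Hungarian algorithm) and fuse appearance cues, but the greedy version shows the core idea of carrying identities across frames.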
arXiv Detail & Related papers (2023-08-29T14:02:27Z)
- Multi-view Tracking, Re-ID, and Social Network Analysis of a Flock of Visually Similar Birds in an Outdoor Aviary
We introduce a system for studying the behavioral dynamics of a group of songbirds as they move throughout a 3D aviary.
We study the complexities that arise when tracking a group of closely interacting animals in three dimensions and introduce a novel dataset for evaluating multi-view trackers.
arXiv Detail & Related papers (2022-12-01T04:23:18Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey
In nature, the collective behavior of animals is dominated by the interactions between individuals of the same species.
Recent advances in 3D vision have led to a number of impressive works on 3D shape and pose estimation.
This work is the first attempt to provide an overview of recent advances in 3D bird reconstruction based on monocular vision.
arXiv Detail & Related papers (2022-07-10T18:13:25Z)
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds
We introduce a spatio-temporal representation learning (STRL) framework, capable of learning from unlabeled 3D point clouds in a self-supervised fashion.
Inspired by how infants learn from visual data in the wild, we explore rich cues derived from the 3D data.
STRL takes two temporally-related frames from a 3D point cloud sequence as input, transforms them with spatial data augmentation, and learns an invariant representation self-supervisedly.
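The invariance objective above can be sketched with a toy setup; everything here (the mean/std pooling "encoder", the z-axis rotation augmentation, the cosine objective, and all function names) is an illustrative stand-in under assumptions, not STRL's actual architecture or loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(points, noise=0.01):
    """Spatial augmentation: random rotation about the z-axis plus jitter.
    `points` is an (N, 3) point cloud."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T + rng.normal(scale=noise, size=points.shape)

def encode(points):
    """Stand-in encoder: global mean + std pooling over the point set."""
    return np.concatenate([points.mean(axis=0), points.std(axis=0)])

def invariance_loss(frame_t, frame_t1):
    """Negative cosine similarity between the embeddings of two
    temporally-related, independently augmented frames; minimizing it
    pulls the two representations together."""
    z1 = encode(augment(frame_t))
    z2 = encode(augment(frame_t1))
    cos = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2) + 1e-8)
    return -cos  # cosine similarity lies in [-1, 1], so the loss does too
```

A real encoder would be a learned point cloud network; the sketch only illustrates the two-view, augment-then-align structure of the objective.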
arXiv Detail & Related papers (2021-09-01T04:17:11Z)
- AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z)
- 3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View
We introduce a model and multi-view optimization approach to capture the unique shape and pose space displayed by live birds.
We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views.
We provide extensive multi-view keypoint and mask annotations collected from a group of 15 social birds housed together in an outdoor aviary.
arXiv Detail & Related papers (2020-08-13T23:29:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.