BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos
- URL: http://arxiv.org/abs/2212.07401v3
- Date: Fri, 2 Jun 2023 05:03:24 GMT
- Title: BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos
- Authors: Jennifer J. Sun, Lili Karashchuk, Amil Dravid, Serim Ryou, Sonia
Fereidooni, John Tuthill, Aggelos Katsaggelos, Bingni W. Brunton, Georgia
Gkioxari, Ann Kennedy, Yisong Yue, Pietro Perona
- Abstract summary: We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents.
Our method, BKinD-3D, uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct differences across multiple views.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying motion in 3D is important for studying the behavior of humans and
other animals, but manual pose annotations are expensive and time-consuming to
obtain. Self-supervised keypoint discovery is a promising strategy for
estimating 3D poses without annotations. However, current keypoint discovery
approaches commonly process single 2D views and do not operate in the 3D space.
We propose a new method to perform self-supervised keypoint discovery in 3D
from multi-view videos of behaving agents, without any keypoint or bounding box
supervision in 2D or 3D. Our method, BKinD-3D, uses an encoder-decoder
architecture with a 3D volumetric heatmap, trained to reconstruct
spatiotemporal differences across multiple views, in addition to joint length
constraints on a learned 3D skeleton of the subject. In this way, we discover
keypoints without requiring manual supervision in videos of humans and rats,
demonstrating the potential of 3D keypoint discovery for studying behavior.
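The abstract describes a volumetric 3D heatmap from which keypoints are read out, together with joint-length constraints on a learned skeleton. The sketch below is not the authors' code; it is a minimal illustration (all function names, shapes, and the soft-argmax readout are assumptions) of two components commonly used in such pipelines: a differentiable soft-argmax over a 3D heatmap volume, and a bone-length consistency penalty between frames.

```python
import numpy as np

def soft_argmax_3d(heatmap, grid_min=-1.0, grid_max=1.0):
    """Differentiable keypoint readout from a volumetric heatmap.

    heatmap: (K, D, H, W) array of unnormalized scores, one volume per keypoint.
    Returns: (K, 3) array of expected (z, y, x) coordinates in [grid_min, grid_max].
    """
    K, D, H, W = heatmap.shape
    flat = heatmap.reshape(K, -1)
    flat = flat - flat.max(axis=1, keepdims=True)      # numerical stability
    prob = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    prob = prob.reshape(K, D, H, W)

    zs = np.linspace(grid_min, grid_max, D)
    ys = np.linspace(grid_min, grid_max, H)
    xs = np.linspace(grid_min, grid_max, W)
    # Expected coordinate along each axis, via the marginal distributions.
    z = (prob.sum(axis=(2, 3)) * zs).sum(axis=1)
    y = (prob.sum(axis=(1, 3)) * ys).sum(axis=1)
    x = (prob.sum(axis=(1, 2)) * xs).sum(axis=1)
    return np.stack([z, y, x], axis=1)

def joint_length_loss(kp_t0, kp_t1, edges):
    """Penalize changes in bone length between two frames.

    kp_t0, kp_t1: (K, 3) keypoints; edges: list of (i, j) skeleton bones.
    """
    def lengths(kp):
        return np.array([np.linalg.norm(kp[i] - kp[j]) for i, j in edges])
    return np.mean((lengths(kp_t0) - lengths(kp_t1)) ** 2)
```

Because the soft-argmax is an expectation over a softmax, gradients flow from the reconstruction loss back into the heatmap, which is what lets keypoints be discovered without any keypoint labels.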
Related papers
- Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
Monocular 3D detection aims for precise 3D object localization from a single-view image.
We propose SKD-WM3D, a weakly supervised monocular 3D detection framework.
We show that SKD-WM3D clearly surpasses the state of the art and is even on par with many fully supervised methods.
arXiv Detail & Related papers (2024-02-29T13:26:47Z)
- 3D Implicit Transporter for Temporally Consistent Keypoint Discovery [45.152790256675964]
Keypoint-based representation has proven advantageous in various visual and robotic tasks.
The Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information.
We propose the first 3D version of the Transporter, which leverages hybrid 3D representation, cross attention, and implicit reconstruction.
arXiv Detail & Related papers (2023-09-10T17:59:48Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo 3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- Unsupervised 3D Keypoint Discovery with Multi-View Geometry [104.76006413355485]
We propose an algorithm that learns to discover 3D keypoints on human bodies from multiple-view images without supervision or labels.
Our approach discovers more interpretable and accurate 3D keypoints compared to other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2022-11-23T10:25:12Z)
- TANDEM3D: Active Tactile Exploration for 3D Object Recognition [16.548376556543015]
We propose TANDEM3D, a method that applies a co-training framework for 3D object recognition with tactile signals.
TANDEM3D is based on a novel encoder that builds 3D object representation from contact positions and normals using PointNet++.
Our method is trained entirely in simulation and validated with real-world experiments.
arXiv Detail & Related papers (2022-09-19T05:54:26Z)
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving [26.216609821525676]
Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component.
Our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship between the 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space.
Our method is the first real-time system for monocular 3D detection, while achieving state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2020-01-10T08:29:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.