Understanding 3D Object Articulation in Internet Videos
- URL: http://arxiv.org/abs/2203.16531v1
- Date: Wed, 30 Mar 2022 17:59:46 GMT
- Title: Understanding 3D Object Articulation in Internet Videos
- Authors: Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey
- Abstract summary: We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos.
While seemingly easy for humans, this problem poses many challenges for computers.
We show that this system can be trained on a combination of videos and 3D scan datasets.
- Score: 16.457168338946566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to investigate detecting and characterizing the 3D planar
articulation of objects from ordinary videos. While seemingly easy for humans,
this problem poses many challenges for computers. We propose to approach this
problem by combining a top-down detection system that finds planes that can be
articulated along with an optimization approach that solves for a 3D plane that
can explain a sequence of observed articulations. We show that this system can
be trained on a combination of videos and 3D scan datasets. When tested on a
dataset of challenging Internet videos and the Charades dataset, our approach
obtains strong performance. Project site:
https://jasonqsy.github.io/Articulation3D
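The optimization step the abstract describes, solving for a 3D plane whose motion explains a sequence of observed articulations, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the helper name fit_rotation_axis is hypothetical, and it assumes some upstream detector already provides a unit plane normal per frame. It recovers the hinge axis of a rotational articulation (e.g., a door or laptop lid) as the direction least aligned with the observed normals, i.e., the smallest eigenvector of their scatter matrix.

```python
# Illustrative sketch (not the paper's code): for a plane rotating about a
# fixed hinge axis a, every observed plane normal n_t stays perpendicular to
# a, so a minimizes sum_t (n_t . a)^2 subject to ||a|| = 1 -- the smallest
# eigenvector of the normals' 3x3 scatter matrix.
import numpy as np

def fit_rotation_axis(normals: np.ndarray) -> np.ndarray:
    """normals: (T, 3) array of unit plane normals observed across T frames.
    Returns a unit 3-vector estimate of the articulation (hinge) axis."""
    scatter = normals.T @ normals               # 3x3 scatter matrix of the normals
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    return eigvecs[:, 0]                        # direction least aligned with the normals

# Toy usage: a plane swinging about the z-axis.
angles = np.linspace(0.0, 1.2, 10)
normals = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
axis = fit_rotation_axis(normals)
print(axis)  # approximately +/- [0, 0, 1]
```

A full system along the lines of the abstract would additionally distinguish rotational from translational articulations and jointly optimize plane offsets and rotation angles across frames; this sketch covers only the axis-fitting core.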
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
Key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous methods rely on NeRF for geometry reasoning.
We propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z)
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos [7.616167860385134]
It has long been challenging to recover the underlying dynamic 3D scene representations from a monocular RGB video.
We introduce a new framework, called OSN, to learn all plausible 3D scene configurations that match the input video.
Our method demonstrates a clear advantage in learning fine-grained 3D scene geometry.
arXiv Detail & Related papers (2024-07-08T05:03:46Z) - CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding [23.885017062031217]
3D visual grounding is the ability to localize objects in 3D scenes conditioned by utterances.
Most existing methods devote the referring head to localizing the referred object directly, which fails in complex scenarios.
We formulate the 3D visual grounding problem as a sequence-to-sequence (Seq2Seq) task by first predicting a chain of anchors and then the final target.
arXiv Detail & Related papers (2023-10-10T00:07:25Z) - BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown
Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z) - Articulated 3D Human-Object Interactions from RGB Videos: An Empirical
Analysis of Approaches and Challenges [19.21834600205309]
We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video.
We use five families of methods for this task: 3D plane estimation, 3D cuboid estimation, CAD model fitting, implicit field fitting, and free-form mesh fitting.
Our experiments show that all methods struggle to obtain high-accuracy results even when provided with ground truth information.
arXiv Detail & Related papers (2022-09-12T21:03:25Z) - Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z) - D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z) - Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z) - Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, but these annotations can be expensive to obtain and are only available in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.