Revealing Occlusions with 4D Neural Fields
- URL: http://arxiv.org/abs/2204.10916v1
- Date: Fri, 22 Apr 2022 20:14:42 GMT
- Title: Revealing Occlusions with 4D Neural Fields
- Authors: Basile Van Hoorick, Purva Tendulkar, Didac Suris, Dennis Park, Simon Stent, Carl Vondrick
- Abstract summary: For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.
We introduce a framework for learning to estimate 4D visual representations from monocular RGB-D, which is able to persist objects even once they become obstructed by occlusions.
- Score: 19.71277637485384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For computer vision systems to operate in dynamic situations, they need to be
able to represent and reason about object permanence. We introduce a framework
for learning to estimate 4D visual representations from monocular RGB-D, which
is able to persist objects, even once they become obstructed by occlusions.
Unlike traditional video representations, we encode point clouds into a
continuous representation, which permits the model to attend across the
spatiotemporal context to resolve occlusions. On two large video datasets that
we release along with this paper, our experiments show that the representation
is able to successfully reveal occlusions for several tasks, without any
architectural changes. Visualizations show that the attention mechanism
automatically learns to follow occluded objects. Since our approach can be
trained end-to-end and is easily adaptable, we believe it will be useful for
handling occlusions in many video understanding tasks. Data, code, and models
are available at https://occlusions.cs.columbia.edu/.
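The abstract outlines the core mechanism: input point clouds are encoded into a continuous spatiotemporal representation, and queries attend across that context to recover occluded content. Below is a minimal, illustrative sketch of that idea (not the authors' implementation), assuming a simple per-point MLP encoder, a single cross-attention layer, and an occupancy head; all module names, feature dimensions, and input formats are assumptions for illustration only.

```python
# Hypothetical sketch: query a continuous 4D field at (x, y, z, t) coordinates via
# cross-attention over encoded point-cloud tokens, so queries behind an occluder can
# still gather evidence from other times and viewpoints. Not the paper's architecture.
import torch
import torch.nn as nn


class Continuous4DField(nn.Module):
    def __init__(self, feat_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Per-point encoder: lifts (x, y, z, t, r, g, b) observations to feature tokens.
        self.point_encoder = nn.Sequential(
            nn.Linear(7, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # Query encoder: lifts continuous (x, y, z, t) query coordinates.
        self.query_encoder = nn.Sequential(
            nn.Linear(4, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )
        # Cross-attention lets each query attend across the full spatiotemporal context,
        # which is what would allow reasoning about occluded regions.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Decoder head: one occupancy logit per query (could equally be color, semantics, ...).
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, points: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # points:  (B, N, 7) observed point cloud with time and color channels
        # queries: (B, Q, 4) continuous spacetime coordinates to evaluate
        tokens = self.point_encoder(points)          # (B, N, D)
        q = self.query_encoder(queries)              # (B, Q, D)
        ctx, _ = self.cross_attn(q, tokens, tokens)  # attend over all observations
        return self.head(ctx)                        # (B, Q, 1) occupancy logits


# Usage example with random data: one scene, 2048 observed points, 512 spacetime queries.
field = Continuous4DField()
pts = torch.randn(1, 2048, 7)
qry = torch.rand(1, 512, 4)
occ = field(pts, qry)
print(occ.shape)  # torch.Size([1, 512, 1])
```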
Related papers
- Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases [69.46487306858789]
Conditional Autoregressive Slot Attention (CA-SA) is a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks.
We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks.
arXiv Detail & Related papers (2024-10-21T07:44:44Z) - One-shot Video Imitation via Parameterized Symbolic Abstraction Graphs [8.872100864022675]
We propose to interpret video demonstrations through Parameterized Symbolic Abstraction Graphs (PSAG).
We further ground geometric constraints through simulation to estimate non-geometric, visually imperceptible attributes.
Our approach has been validated across a range of tasks, such as Cutting Avocado, Cutting Vegetable, Pouring Liquid, Rolling Dough, and Slicing Pizza.
arXiv Detail & Related papers (2024-08-22T18:26:47Z) - Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z) - Linking vision and motion for self-supervised object-centric perception [16.821130222597155]
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features.
Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization.
We adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs.
arXiv Detail & Related papers (2023-07-14T04:21:05Z) - Factored Neural Representation for Scene Understanding [39.66967677639173]
We introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural representations.
We evaluate our representation against a set of neural approaches on both synthetic and real data to demonstrate that it is efficient, interpretable, and editable.
arXiv Detail & Related papers (2023-04-21T13:40:30Z) - BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown
Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z) - Neural Groundplans: Persistent Neural Scene Representations from a
Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z) - Stochastic Coherence Over Attention Trajectory For Continuous Learning
In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z) - Blocks World Revisited: The Effect of Self-Occlusion on Classification
by Convolutional Neural Networks [17.58979205709865]
TEOS (The Effect of Self-Occlusion) is a 3D blocks world dataset that focuses on the geometric shape of 3D objects.
In the real world, self-occlusion of 3D objects still presents significant challenges for deep learning approaches.
We provide 738 uniformly sampled views of each object, their mask, object and camera position, orientation, amount of self-occlusion, as well as the CAD model of each object.
arXiv Detail & Related papers (2021-02-25T15:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.