Learning Object Depth from Camera Motion and Video Object Segmentation
- URL: http://arxiv.org/abs/2007.05676v3
- Date: Fri, 18 Dec 2020 17:43:07 GMT
- Title: Learning Object Depth from Camera Motion and Video Object Segmentation
- Authors: Brent A. Griffin and Jason J. Corso
- Abstract summary: This paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion.
We create artificial object segmentations that are scaled for changes in distance between the camera and object, and our network learns to estimate object depth even with segmentation errors.
We demonstrate our approach across domains using a robot camera to locate objects from the YCB dataset and a vehicle camera to locate obstacles while driving.
- Score: 43.81711115175958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video object segmentation, i.e., the separation of a target object from
background in video, has made significant progress on real and challenging
videos in recent years. To leverage this progress in 3D applications, this
paper addresses the problem of learning to estimate the depth of segmented
objects given some measurement of camera motion (e.g., from robot kinematics or
vehicle odometry). We achieve this by, first, introducing a diverse, extensible
dataset and, second, designing a novel deep network that estimates the depth of
objects using only segmentation masks and uncalibrated camera movement. Our
data-generation framework creates artificial object segmentations that are
scaled for changes in distance between the camera and object, and our network
learns to estimate object depth even with segmentation errors. We demonstrate
our approach across domains using a robot camera to locate objects from the YCB
dataset and a vehicle camera to locate obstacles while driving.
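To make the geometric cue concrete: under a pinhole camera model, an object's apparent size scales inversely with its depth, so the area of its segmentation mask scales with the inverse square of depth. Two masks of the same object, captured before and after a known forward camera translation, therefore determine its depth. The sketch below illustrates this scaling relation only; it is not the paper's network, and the function name and the assumption of pure forward motion along the optical axis are mine.

```python
import numpy as np

def depth_from_mask_scale(mask_before: np.ndarray,
                          mask_after: np.ndarray,
                          forward_motion: float) -> float:
    """Estimate object depth at the first frame from two binary
    segmentation masks and a known forward camera translation d.

    Pinhole model: apparent width scales as 1/z, so mask area scales
    as 1/z^2. With areas A1, A2 at depths z and z - d:
        A2 / A1 = (z / (z - d))^2  =>  z = d / (1 - sqrt(A1 / A2))
    Assumes pure forward motion toward the object with the object
    fully in view in both frames.
    """
    a1 = float(mask_before.sum())
    a2 = float(mask_after.sum())
    if a2 <= a1:
        raise ValueError("mask must grow as the camera approaches")
    return forward_motion / (1.0 - np.sqrt(a1 / a2))

# Toy check: a mask that grows from 50x50 to 100x100 pixels after the
# camera advances 0.5 m implies the object started 1.0 m away.
before = np.zeros((480, 640)); before[:50, :50] = 1
after = np.zeros((480, 640)); after[:100, :100] = 1
print(depth_from_mask_scale(before, after, forward_motion=0.5))  # 1.0
```

In practice this closed form is brittle to segmentation noise, which is why the paper instead trains a network on artificial, distance-scaled segmentations so that depth estimates remain robust to mask errors.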
Related papers
- DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker [4.65004369765875]
Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms.
In this paper, we propose DepthMOT, which (i) detects objects and estimates the scene depth map end-to-end, and (ii) compensates for irregular camera motion via camera pose estimation.
arXiv Detail & Related papers (2024-04-08T13:39:12Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- Semi-Weakly Supervised Object Kinematic Motion Prediction [56.282759127180306]
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters.
We propose a graph neural network that learns the mapping between hierarchical part-level segmentation and mobile-part parameters.
The network predictions yield a large-scale set of 3D objects with pseudo-labeled mobility information.
arXiv Detail & Related papers (2023-03-31T02:37:36Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study the instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
- Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction [23.572104156617844]
We introduce a novel robotic system for improving unseen object instance segmentation in the real world by leveraging long-term robot interaction with objects.
Our system defers the decision on segmenting objects until after a sequence of robot pushing actions.
We demonstrate the usefulness of our system by fine-tuning segmentation networks trained on synthetic data with real-world data collected by our system.
arXiv Detail & Related papers (2023-02-07T23:11:29Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of these effects.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- 3D Object Segmentation for Shelf Bin Picking by Humanoid with Deep Learning and Occupancy Voxel Grid Map [27.312696750923926]
We develop a method to segment target objects in 3D using multiple camera angles and a voxel grid map.
We evaluate the method in a picking-task experiment with target objects in narrow shelf bins.
arXiv Detail & Related papers (2020-01-15T16:20:46Z)