Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance
- URL: http://arxiv.org/abs/2304.05716v1
- Date: Wed, 12 Apr 2023 09:18:38 GMT
- Title: Impact of Pseudo Depth on Open World Object Segmentation with Minimal User Guidance
- Authors: Robin Schön, Katja Ludwig, Rainer Lienhart
- Abstract summary: Pseudo depth maps are depth map predictions which are used as ground truth during training.
In this paper we leverage pseudo depth maps in order to segment objects of classes that have never been seen during training.
- Score: 18.176606453818557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pseudo depth maps are depth map predictions which are used as ground truth
during training. In this paper we leverage pseudo depth maps in order to
segment objects of classes that have never been seen during training. This
renders our object segmentation task an open world task. The pseudo depth maps
are generated using pretrained networks, which have either been trained with
the full intention to generalize to downstream tasks (LeRes and MiDaS), or
which have been trained in an unsupervised fashion on video sequences
(MonodepthV2). In order to tell our network which object to segment, we provide
the network with a single click on the object's surface on the pseudo depth map
of the image as input. We test our approach in two scenarios: one without the
RGB image and one where the RGB image is part of the input. Our results
demonstrate considerably better generalization from seen to unseen object
types when depth is used. On the Semantic Boundaries Dataset we improve the
IoU score on unseen classes from $61.57$ to $69.79$ when training on only half
of the classes and segmenting on depth maps alone.
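The input construction described in the abstract can be illustrated with a short sketch. Below is a minimal, hedged version of that setup: a pretrained MiDaS model (genuinely distributed via torch.hub) supplies the pseudo depth map, and the single user click is encoded as an extra input channel. The Gaussian click encoding, the depth normalization, and the channel layout are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the authors' code): build the click-guided input.
import torch
import torch.nn.functional as F

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")  # pseudo depth source
midas.eval()

def click_map(h, w, y, x, sigma=10.0):
    """Encode the single user click as a Gaussian heatmap channel."""
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1)
    return torch.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))

@torch.no_grad()
def build_input(rgb, click_yx, use_rgb=True):
    """rgb: (3, H, W) tensor; click_yx: (y, x) pixel on the object's surface.

    In practice the image should first pass through the official MiDaS
    transforms (resizing/normalization); omitted here for brevity.
    """
    _, h, w = rgb.shape
    depth = midas(rgb.unsqueeze(0))                        # (1, h', w')
    depth = F.interpolate(depth.unsqueeze(1), size=(h, w),
                          mode="bilinear", align_corners=False)[0]
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    click = click_map(h, w, *click_yx).unsqueeze(0)        # (1, H, W)
    channels = [rgb, depth, click] if use_rgb else [depth, click]
    return torch.cat(channels, dim=0)   # input to the segmentation network
```

The two channel layouts correspond to the paper's two scenarios: depth plus click only, or RGB together with depth and click.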
Related papers
- Background Prompting for Improved Object Depth [70.25467510077706]
Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications.
We propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background; a minimal sketch of this idea appears after the list.
Results on multiple synthetic and real datasets demonstrate consistent improvements in real object depths for a variety of existing depth networks.
arXiv Detail & Related papers (2023-06-08T17:59:59Z) - Source-free Depth for Object Pop-out [113.24407776545652]
Modern learning-based methods offer promising depth maps by inference in the wild.
We adapt such depth inference models for object segmentation using the objects' "pop-out" prior in 3D.
Our experiments on eight datasets consistently demonstrate the benefit of our method in terms of both performance and generalizability.
arXiv Detail & Related papers (2022-12-10T21:57:11Z) - Depth Is All You Need for Monocular 3D Detection [29.403235118234747]
We propose to align the depth representation with the target domain in an unsupervised fashion.
Our methods leverage commonly available LiDAR or RGB videos during training time to fine-tune the depth representation, which leads to improved 3D detectors.
arXiv Detail & Related papers (2022-10-05T18:12:30Z) - Learning to segment from object sizes [0.0]
We propose an algorithm for training a deep segmentation network from a dataset of a few pixel-wise annotated images and many images with known object sizes.
The algorithm minimizes a discrete (non-differentiable) loss function defined over the object sizes by sampling the gradient and then using the standard back-propagation algorithm; see the sketch after this list.
arXiv Detail & Related papers (2022-07-01T09:34:44Z) - Least Square Estimation Network for Depth Completion [11.840223815711004]
In this paper, we propose an effective image representation method for depth completion tasks.
The inputs to our system are a monocular camera frame and the synchronized sparse depth map.
Experiments show that our results beat the state of the art on the NYU-Depth-V2 dataset in both accuracy and runtime.
arXiv Detail & Related papers (2022-03-07T11:52:57Z) - Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
arXiv Detail & Related papers (2021-11-28T14:51:00Z) - DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z) - SGTBN: Generating Dense Depth Maps from Single-Line LiDAR [13.58227120045849]
Current depth completion methods use extremely expensive 64-line LiDAR to obtain sparse depth maps.
Compared with the 64-line LiDAR, the single-line LiDAR is much less expensive and much more robust.
A single-line depth completion dataset is proposed based on the existing 64-line depth completion dataset.
arXiv Detail & Related papers (2021-06-24T13:08:35Z) - Sparse Auxiliary Networks for Unified Monocular Depth Prediction and
Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z) - Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.