Related papers: DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features

DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features

URL: http://arxiv.org/abs/2310.11092v2
Date: Thu, 19 Oct 2023 14:16:49 GMT
Title: DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features
Authors: Jun Wu, Sicheng Li, Sihui Ji, Yue Wang, Rong Xiong, and Yiyi Liao
Abstract summary: We propose a Decomposed Object Reconstruction network based on neural implicit representations. Our key idea is to transfer 2D self-supervised features into masks of two levels of granularity to supervise the decomposition. Experimental results show the superiority of DORec in segmenting and reconstructing the foreground object on various datasets.
Score: 28.446955045371737
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Decomposing a target object from a complex background while reconstructing is challenging. Most approaches acquire the perception for object instances through the use of manual labels, but the annotation procedure is costly. The recent advancements in 2D self-supervised learning have brought new prospects to object-aware representation, yet it remains unclear how to leverage such noisy 2D features for clean decomposition. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to transfer 2D self-supervised features into masks of two levels of granularity to supervise the decomposition, including a binary mask to indicate the foreground regions and a K-cluster mask to indicate the semantically similar regions. These two masks are complementary to each other and lead to robust decomposition. Experimental results show the superiority of DORec in segmenting and reconstructing the foreground object on various datasets.

Related papers

Object Learning and Robust 3D Reconstruction [7.092348056331202]
We discuss architectural designs and training methods for a neural network to dissect an image into objects of interest without supervision. FlowCapsules uses motion as a cue for the objects of interest in 2D scenarios. We leverage the geometric consistency of scenes in 3D to detect the inconsistent dynamic objects.
arXiv Detail & Related papers (2025-04-22T21:48:31Z)
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision [7.511342491529451]
We study the hard problem of 3D object segmentation in complex point clouds without requiring human labels of 3D scenes for supervision. By relying on the similarity of pretrained 2D features or external signals such as motion to group 3D points as objects, existing unsupervised methods are usually limited to identifying simple objects like cars or their segmented objects are often inferior due to the lack of objectness in pretrained features.
arXiv Detail & Related papers (2025-04-16T04:13:53Z)
PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes. Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects. Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery [46.711276257688326]
NeRFs have become a powerful tool for modeling 3D scenes from multiple images. Previous approaches to 3D segmentation of NeRFs either require user interaction to isolate a single object, or they rely on 2D semantic masks with a limited number of classes for supervision. We propose a method that is robust to inconsistent segmentations and successfully decomposes the scene into a set of objects of any class.
arXiv Detail & Related papers (2024-08-19T12:07:24Z)
MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders [93.87585467898252]
We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders. MonoMAE consists of two novel designs. The first is depth-aware masking that selectively masks certain parts of non-occluded object queries. The second is lightweight query completion that works with the depth-aware masking to learn to reconstruct and complete the masked object queries.
arXiv Detail & Related papers (2024-05-13T12:32:45Z)
SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR. SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds. We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
Iterative Superquadric Recomposition of 3D Objects from Multiple Views [77.53142165205283]
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views. Our framework iteratively adds new superquadrics wherever the reconstruction error is high. It provides consistently more accurate 3D reconstructions, even from images in the wild.
arXiv Detail & Related papers (2023-09-05T10:21:37Z)
AutoRecon: Automated 3D Object Discovery and Reconstruction [41.60050228813979]
We propose a novel framework named AutoRecon for the automated discovery and reconstruction of an object from multi-view images. We demonstrate that foreground objects can be robustly located and segmented from SfM point clouds by leveraging self-supervised 2D vision transformer features. Experiments on the DTU, BlendedMVS and CO3D-V2 datasets demonstrate the effectiveness and robustness of AutoRecon.
arXiv Detail & Related papers (2023-05-15T17:16:46Z)
CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation [2.861848675707602]
We present a new single-stage architecture called CASAPose. It determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects.
arXiv Detail & Related papers (2022-10-11T10:20:01Z)
OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds [4.709764624933227]
We propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass. We extensively evaluate our method on five datasets, demonstrating the superior performance for object part instance segmentation.
arXiv Detail & Related papers (2022-10-10T07:01:08Z)
Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances. We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
AutoSweep: Recovering 3D Editable Objectsfrom a Single Photograph [54.701098964773756]
We aim to recover 3D objects with semantic parts and can be directly edited. Our work makes an attempt towards recovering two types of primitive-shaped objects, namely, generalized cuboids and generalized cylinders. Our algorithm can recover high quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.
arXiv Detail & Related papers (2020-05-27T12:16:24Z)
SDOD:Real-time Segmenting and Detecting 3D Object by Depth [5.97602869680438]
This paper proposes a real-time framework that segmenting and detecting 3D objects by depth. We discretize the objects' depth into depth categories and transform the instance segmentation task into a pixel-level classification task. Experiments on the challenging KITTI dataset show that our approach outperforms LklNet about 1.8 times on the speed of segmentation and 3D detection.
arXiv Detail & Related papers (2020-01-26T09:06:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.