Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image
- URL: http://arxiv.org/abs/2306.05145v1
- Date: Thu, 8 Jun 2023 12:12:02 GMT
- Title: Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image
- Authors: Kun Wang, Zhiqiang Yan, Zhenyu Zhang, Xiang Li, Jun Li, and Jian Yang
- Abstract summary: We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
- Score: 27.290232027686237
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reconstructing category-specific objects from a single image is a challenging
task that requires inferring the geometry and appearance of an object from a
limited viewpoint. Existing methods typically rely on local feature retrieval
based on re-projection with known camera intrinsics, which is slow and prone to
distortion at viewpoints distant from the input image. In this paper, we
present Variable Radiance Field (VRF), a novel framework that can efficiently
reconstruct category-specific objects from a single image without known camera
parameters. Our key contributions are: (1) We parameterize the geometry and
appearance of the object using a multi-scale global feature extractor, which
avoids frequent point-wise feature retrieval and camera dependency. We also
propose a contrastive learning-based pretraining strategy to improve the
feature extractor. (2) We reduce the geometric complexity of the object by
learning a category template, and use hypernetworks to generate a small neural
radiance field for fast and instance-specific rendering. (3) We align each
training instance to the template space using a learned similarity
transformation, which enables semantic-consistent learning across different
objects. We evaluate our method on the CO3D dataset and show that it
outperforms existing methods in terms of quality and speed. We also demonstrate
its applicability to shape interpolation and object placement tasks.
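
To make contribution (1) concrete, below is a minimal PyTorch sketch of a multi-scale global feature extractor that pools several backbone stages into a single per-image code, so rendering never needs per-point (re-projection) feature retrieval. The ResNet-18 backbone, stage choice, and 256-d latent are illustrative assumptions, not the paper's reported architecture.

```python
# Hedged sketch (not the authors' code) of a multi-scale global encoder.
import torch
import torch.nn as nn
import torchvision.models as tvm

class MultiScaleGlobalEncoder(nn.Module):
    """Pools feature maps from several backbone stages into one global
    code, avoiding per-point feature retrieval and camera dependency."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        backbone = tvm.resnet18(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # Four residual stages, each producing features at a new scale.
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # resnet18 stage widths: 64 + 128 + 256 + 512 channels.
        self.proj = nn.Linear(64 + 128 + 256 + 512, latent_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x, scales = self.stem(image), []
        for stage in self.stages:
            x = stage(x)
            scales.append(self.pool(x).flatten(1))  # global pool per scale
        return self.proj(torch.cat(scales, dim=1))  # one code per image

# z = MultiScaleGlobalEncoder()(torch.randn(2, 3, 224, 224))  # -> (2, 256)
```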
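The abstract says only that the pretraining strategy is contrastive; a standard InfoNCE objective over two views of the same object is one plausible instantiation. The positive-pair definition and temperature below are our assumptions.

```python
# Hedged sketch of a contrastive pretraining loss for the encoder.
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07):
    """z_a[i], z_b[i]: (B, D) codes of two views of object i; every other
    pairing in the batch acts as a negative (assumed recipe)."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                 # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)      # match each view to its pair
```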
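Contribution (2) uses a hypernetwork to emit the weights of a small, instance-specific radiance field. A minimal sketch follows, assuming a three-layer field queried at points already expressed in the learned template's space; all layer widths are made up for illustration.

```python
# Hedged sketch of a hypernetwork-generated tiny radiance field.
import torch
import torch.nn as nn

class HyperNeRF(nn.Module):
    def __init__(self, latent_dim: int = 256, hidden: int = 64):
        super().__init__()
        # (out, in) shapes of the tiny field: 3 -> hidden -> hidden -> 4.
        self.shapes = [(hidden, 3), (hidden, hidden), (4, hidden)]
        n_params = sum(o * i + o for o, i in self.shapes)
        self.hyper = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                   nn.Linear(512, n_params))

    def forward(self, z: torch.Tensor, pts: torch.Tensor):
        """z: (latent_dim,) image code; pts: (N, 3) template-space points."""
        theta, x, i = self.hyper(z), pts, 0
        for k, (o, inp) in enumerate(self.shapes):
            W = theta[i:i + o * inp].view(o, inp); i += o * inp
            b = theta[i:i + o]; i += o
            x = x @ W.t() + b
            if k < len(self.shapes) - 1:   # ReLU on all but the last layer
                x = torch.relu(x)
        sigma, rgb = x[:, :1], torch.sigmoid(x[:, 1:])  # density + colour
        return sigma, rgb
```

Because the generated field is small, querying it per instance is cheap, which is consistent with the claimed speed advantage over point-wise feature retrieval.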
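Contribution (3) aligns each training instance to the template space with a learned similarity transform (scale, rotation, translation). A sketch under the assumption of an axis-angle rotation parameterization, which may differ from the paper's choice:

```python
# Hedged sketch of a learned per-instance similarity alignment.
import torch
import torch.nn as nn

def axis_angle_to_matrix(v: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: 3-vector (axis * angle) -> 3x3 rotation."""
    theta = v.norm().clamp(min=1e-8)
    k = v / theta
    zero = torch.zeros((), device=v.device)
    K = torch.stack([torch.stack([zero, -k[2], k[1]]),
                     torch.stack([k[2], zero, -k[0]]),
                     torch.stack([-k[1], k[0], zero])])
    return torch.eye(3, device=v.device) + torch.sin(theta) * K \
        + (1 - torch.cos(theta)) * (K @ K)

class SimilarityAlign(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(1))  # log-scale keeps s > 0
        self.rot = nn.Parameter(torch.zeros(3))    # axis-angle rotation
        self.t = nn.Parameter(torch.zeros(3))      # translation

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        """Maps (N, 3) instance-space points into template space."""
        R = axis_angle_to_matrix(self.rot)
        return self.log_s.exp() * (pts @ R.t()) + self.t
```

Mapping every instance into one canonical frame is what lets the template learn semantically consistent geometry across objects.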
Related papers
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-02T11:14:23Z)
- SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding [5.715548995729382]
We propose an effective technique for image augmentation by injecting contextually meaningful knowledge into the scenes.
Our method of semantically meaningful image augmentation for object detection via language grounding, SemAug, starts by identifying semantically appropriate new objects to place into the scene.
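
One way to read "identifying semantically appropriate new objects" is a word-embedding lookup: score candidate object labels against the scene's existing labels and pick the best contextual fit. A hedged sketch; the embedding table and selection rule are assumptions, with pretrained vectors such as GloVe standing in for the paper's grounding.

```python
# Hedged sketch of contextual object selection via word embeddings.
import numpy as np

def pick_contextual_object(scene_labels, candidates, embed):
    """embed: dict label -> np.ndarray word vector (e.g. GloVe)."""
    ctx = np.mean([embed[l] for l in scene_labels], axis=0)  # scene context
    def score(c):
        v = embed[c]  # cosine similarity between candidate and context
        return ctx @ v / (np.linalg.norm(ctx) * np.linalg.norm(v))
    # Best-fitting category not already present; paste an instance of it.
    return max((c for c in candidates if c not in scene_labels), key=score)
```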
arXiv Detail & Related papers (2022-08-15T19:00:56Z)
- Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency [59.427074701985795]
Single-view reconstruction methods typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervisions and hypotheses by leveraging explicitly the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
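
A hedged sketch of the swap-reconstruction idea: render an instance with a similar neighbour's texture code and still require it to reconstruct the original image. The `render` callable and code shapes are placeholders, not the authors' API.

```python
# Hedged sketch of a swap-reconstruction consistency loss.
import torch
import torch.nn.functional as F

def swap_reconstruction_loss(shape, texture, images, render):
    """shape/texture: (B, D) per-instance codes; images: (B, 3, H, W)."""
    with torch.no_grad():
        sim = F.normalize(texture, dim=1) @ F.normalize(texture, dim=1).t()
        sim.fill_diagonal_(-1)           # exclude pairing with oneself
        j = sim.argmax(dim=1)            # nearest neighbour in texture space
    swapped = render(shape, texture[j])  # keep shape, borrow neighbour texture
    return F.mse_loss(swapped, images)   # must still match the original image
```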
arXiv Detail & Related papers (2022-04-21T17:47:35Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Unsupervised Layered Image Decomposition into Object Prototypes [39.20333694585477]
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.
We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks.
We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
arXiv Detail & Related papers (2021-04-29T18:02:01Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach improves the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) in relative terms.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
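
An object-centric adaptation presumably computes standard FID statistics over per-object crops rather than whole images. A sketch of the cropping step under that assumption; the resulting crops from real and generated sets would then be fed to any standard FID implementation.

```python
# Hedged sketch of per-object cropping for an object-centric FID.
import torch
import torch.nn.functional as F

def object_crops(images, boxes, size=299):
    """images: (B, 3, H, W); boxes: list of (N_i, 4) xyxy pixel boxes."""
    crops = []
    for img, bxs in zip(images, boxes):
        for x0, y0, x1, y1 in bxs.tolist():
            crop = img[:, int(y0):int(y1), int(x0):int(x1)]
            crops.append(F.interpolate(crop[None], size=(size, size),
                                       mode="bilinear", align_corners=False))
    return torch.cat(crops)  # score these crops, not the full images
```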
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.