Self-Supervised Image Representation Learning with Geometric Set Consistency
- URL: http://arxiv.org/abs/2203.15361v1
- Date: Tue, 29 Mar 2022 08:57:33 GMT
- Title: Self-Supervised Image Representation Learning with Geometric Set Consistency
- Authors: Nenglun Chen, Lei Chu, Hao Pan, Yan Lu and Wenping Wang
- Abstract summary: We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
- Score: 50.12720780102395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for self-supervised image representation learning under
the guidance of 3D geometric consistency. Our intuition is that 3D geometric
consistency priors such as smooth regions and surface discontinuities may imply
consistent semantics or object boundaries, and can act as strong cues to guide
the learning of 2D image representations without semantic labels. Specifically,
we introduce 3D geometric consistency into a contrastive learning framework to
enforce the feature consistency within image views. We propose to use geometric
consistency sets as constraints and adapt the InfoNCE loss accordingly. We show
that our learned image representations are general. By fine-tuning our
pre-trained representations for various 2D image-based downstream tasks,
including semantic segmentation, object detection, and instance segmentation on
real-world indoor scene datasets, we achieve superior performance compared with
state-of-the-art methods.
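The abstract states that geometric consistency sets are used as constraints and that the InfoNCE loss is adapted accordingly. A minimal sketch of such a set-adapted InfoNCE is shown below; this is an illustrative reconstruction under assumed conventions (features sharing a consistency-set id are treated as positives, all others as negatives), not the paper's exact formulation.

```python
import numpy as np

def set_infonce(features, set_ids, tau=0.07):
    """Set-adapted InfoNCE (hypothetical sketch): for each anchor, features
    with the same consistency-set id are positives; all other features in
    the batch serve as negatives."""
    # L2-normalize so the dot product is cosine similarity
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / tau  # temperature-scaled pairwise similarities
    n = len(set_ids)
    losses = []
    for i in range(n):
        # positives: same consistency set, excluding the anchor itself
        pos = (set_ids == set_ids[i]) & (np.arange(n) != i)
        if not pos.any():
            continue  # anchor has no positive partner in this batch
        not_self = np.arange(n) != i
        # -log( sum_pos exp(sim) / sum_{j != i} exp(sim) )
        log_denom = np.log(np.exp(sim[i][not_self]).sum())
        log_num = np.log(np.exp(sim[i][pos]).sum())
        losses.append(log_denom - log_num)
    return float(np.mean(losses))
```

As expected of a contrastive objective, features that cluster by consistency set yield a lower loss than features whose set assignments cut across the clusters.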
Related papers
- MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
arXiv Detail & Related papers (2024-09-21T05:12:13Z)
- 3D Congealing: 3D-Aware Image Alignment in the Wild [44.254247801001675]
3D Congealing is a problem of 3D-aware alignment for 2D images capturing semantically similar objects.
We introduce a general framework that tackles the task without assuming shape templates, poses, or any camera parameters.
Our framework can be used for various tasks such as correspondence matching, pose estimation, and image editing.
arXiv Detail & Related papers (2024-04-02T17:32:12Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves the state-of-the-art performance of semantic scene completion on two large-scale benchmark datasets MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- Pri3D: Can 3D Priors Help 2D Representation Learning? [37.35721274841419]
We introduce an approach to learn view-invariant, geometry-aware representations for network pre-training.
We employ contrastive learning under both multi-view image constraints and image-geometry constraints to encode 3D priors into learned 2D representations.
arXiv Detail & Related papers (2021-04-22T17:59:30Z)
- Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
In addition, we obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
- Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations [92.89846887298852]
We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
arXiv Detail & Related papers (2020-03-22T22:44:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.