Semantic Correspondence via 2D-3D-2D Cycle
- URL: http://arxiv.org/abs/2004.09061v2
- Date: Tue, 13 Apr 2021 06:20:57 GMT
- Title: Semantic Correspondence via 2D-3D-2D Cycle
- Authors: Yang You, Chengkun Li, Yujing Lou, Zhoujun Cheng, Lizhuang Ma, Cewu
Lu, Weiming Wang
- Abstract summary: We propose a new method for predicting semantic correspondences by lifting the task to the 3D domain.
We show that our method gives comparable or even superior results on standard semantic benchmarks.
- Score: 58.023058561837686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual semantic correspondence is an important topic in computer vision and
could help machines understand objects in our daily life. However, most previous
methods train directly on correspondences in 2D images, which is end-to-end but
discards plenty of information available in 3D space. In this paper, we propose a new
method for predicting semantic correspondences by lifting the task to the 3D domain
and then projecting the corresponding 3D models, together with their semantic labels,
back to the 2D domain. Our method leverages the advantages of 3D vision and can
explicitly reason about objects' self-occlusion and visibility. We show that our method
gives comparable or even superior results on standard semantic benchmarks. We
also conduct thorough and detailed experiments to analyze our network
components. The code and experiments are publicly available at
https://github.com/qq456cvb/SemanticTransfer.
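As a rough illustration of the kind of 2D-3D-2D reasoning the abstract describes, the sketch below projects semantically labelled points of an aligned 3D model into a target view and uses a coarse z-buffer to drop self-occluded points before transferring their labels. This is a minimal sketch under assumed inputs (an already aligned model, known camera intrinsics and pose), not the authors' pipeline; all function and parameter names are illustrative.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world-space points with a pinhole camera (K, R, t).

    Returns Nx2 pixel coordinates and per-point camera-space depth.
    """
    cam = points_3d @ R.T + t          # world -> camera coordinates
    depth = cam[:, 2]
    uv = cam @ K.T                     # camera -> homogeneous pixel coordinates
    uv = uv[:, :2] / depth[:, None]
    return uv, depth

def transfer_labels_to_view(model_points, point_labels, K, R, t,
                            image_hw, occlusion_tol=0.01):
    """Project labelled 3D model points into a target view, keeping only visible ones.

    Visibility is approximated with a coarse per-pixel z-buffer over the projected
    points, so self-occluded parts of the model do not transfer their labels.
    """
    h, w = image_hw
    uv, depth = project_points(model_points, K, R, t)
    px = np.round(uv).astype(int)

    # Coarse z-buffer: nearest projected model point per pixel.
    zbuf = np.full((h, w), np.inf)
    for (u, v), z in zip(px, depth):
        if 0 <= v < h and 0 <= u < w and z > 0:
            zbuf[v, u] = min(zbuf[v, u], z)

    # Keep a labelled point only if it is (nearly) the closest point in its pixel.
    visible = []
    for (u, v), z, lbl in zip(px, depth, point_labels):
        if 0 <= v < h and 0 <= u < w and z > 0 and z <= zbuf[v, u] + occlusion_tol:
            visible.append((u, v, lbl))
    return visible  # list of (u, v, semantic_label) in the target image
```

In the paper's setting the 3D model and camera pose would themselves be estimated from the images; here they are simply taken as given so the example stays self-contained.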
Related papers
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding [23.672405624011873]
We propose a module that consolidates the 3D visual stream with 2D clues synthesized from point clouds.
We empirically show their aptitude to boost the quality of the learned visual representations.
Our proposed module, dubbed Look Around and Refer (LAR), significantly outperforms the state-of-the-art 3D visual grounding techniques on three benchmarks.
arXiv Detail & Related papers (2022-11-25T17:12:08Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Understanding Pixel-level 2D Image Semantics with 3D Keypoint Knowledge Engine [56.09471066808409]
We propose a new method that predicts the semantics corresponding to an image in the 3D domain and then projects them back onto 2D images to achieve pixel-level understanding.
We build a large scale keypoint knowledge engine called KeypointNet, which contains 103,450 keypoints and 8,234 3D models from 16 object categories.
arXiv Detail & Related papers (2021-11-21T13:25:20Z)
- SAT: 2D Semantics Assisted Training for 3D Visual Grounding [95.84637054325039]
3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.
Point clouds are sparse, noisy, and contain limited semantic information compared with 2D images.
We propose 2D Semantics Assisted Training (SAT) that utilizes 2D image semantics in the training stage to ease point-cloud-language joint representation learning.
arXiv Detail & Related papers (2021-05-24T17:58:36Z)
- Bidirectional Projection Network for Cross Dimension Scene Understanding [69.29443390126805]
We present a bidirectional projection network (BPNet) for joint 2D and 3D reasoning in an end-to-end manner.
Via the BPM, complementary 2D and 3D information can interact with each other at multiple architectural levels.
Our BPNet achieves top performance on the ScanNetV2 benchmark for both 2D and 3D semantic segmentation (see the sketch after this list).
arXiv Detail & Related papers (2021-03-26T08:31:39Z)
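The bidirectional 2D-3D interaction described for BPNet above can be pictured with a much-simplified sketch: given the pixel each 3D point projects to, image features are gathered onto the points, and point features are scattered (here, averaged) back onto the image plane. This is an illustrative assumption of how such a link could be wired, not the paper's BPM; the function names and the averaging scheme are hypothetical.

```python
import torch

def lift_2d_to_3d(feat_2d, pixel_index):
    """Gather a 2D image feature for each 3D point.

    feat_2d:     (C, H, W) image feature map
    pixel_index: (N, 2) long tensor of (row, col) pixels the points project to
    returns:     (N, C) per-point features
    """
    C, H, W = feat_2d.shape
    flat = feat_2d.reshape(C, H * W)                 # (C, H*W)
    idx = pixel_index[:, 0] * W + pixel_index[:, 1]  # (N,)
    return flat[:, idx].T                            # (N, C)

def splat_3d_to_2d(feat_3d, pixel_index, hw):
    """Scatter per-point 3D features back onto a 2D feature map.

    Points landing in the same pixel are averaged; empty pixels stay zero.
    """
    N, C = feat_3d.shape
    H, W = hw
    idx = pixel_index[:, 0] * W + pixel_index[:, 1]  # (N,)
    canvas = torch.zeros(C, H * W)
    count = torch.zeros(1, H * W)
    canvas.index_add_(1, idx, feat_3d.T)             # accumulate point features
    count.index_add_(1, idx, torch.ones(1, N))       # count points per pixel
    return (canvas / count.clamp(min=1)).reshape(C, H, W)
```

In an actual network these two directions would run at several feature scales and be learned end-to-end; the sketch only shows the plumbing of the projection link.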
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.