Contrastive Spatial Reasoning on Multi-View Line Drawings
- URL: http://arxiv.org/abs/2104.13433v1
- Date: Tue, 27 Apr 2021 19:05:27 GMT
- Title: Contrastive Spatial Reasoning on Multi-View Line Drawings
- Authors: Siyuan Xiang, Anbang Yang, Yanfei Xue, Yaoqing Yang, Chen Feng
- Abstract summary: State-of-the-art supervised deep networks show puzzlingly low performance on the SPARE3D dataset.
We propose a simple contrastive learning approach along with other network modifications to improve the baseline performance.
Our approach uses a self-supervised binary classification network to compare the line drawing differences between various views of any two similar 3D objects.
- Score: 11.102238863932255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial reasoning on multi-view line drawings by state-of-the-art
supervised deep networks has recently been shown to achieve puzzlingly low
performance on the SPARE3D dataset. To study the reason behind this low
performance and to further our understanding of these tasks, we design
controlled experiments on both input data and network designs. Guided by the
insights from these experiments, we propose a simple contrastive learning
approach along with other network modifications to improve the baseline
performance. Our approach uses a self-supervised binary classification network
to compare the line drawing differences between various views of any two
similar 3D objects. It enables deep networks to effectively learn
detail-sensitive yet view-invariant line drawing representations of 3D
objects. Experiments show that our method can significantly improve the
baseline performance on SPARE3D, while some popular self-supervised learning
methods cannot.
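To make the self-supervised setup concrete, below is a minimal PyTorch sketch of the kind of binary pair classifier the abstract describes: a shared encoder embeds two line-drawing views, and a small head predicts whether they depict the same 3D object. The class name, encoder architecture, and input size are illustrative assumptions, not the authors' actual network.

```python
# Minimal sketch (not the authors' exact architecture): a self-supervised
# binary classifier over pairs of line-drawing views of 3D objects.
import torch
import torch.nn as nn

class PairwiseViewClassifier(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Shared CNN encoder over single-channel line drawings (e.g. 1x128x128).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Binary head over the concatenated view embeddings: same object or not.
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
        za, zb = self.encoder(view_a), self.encoder(view_b)
        return self.head(torch.cat([za, zb], dim=1))  # logit: same object?

# Self-supervised labels come for free: views rendered from the same CAD model
# are positives; views from a similar-but-different model are negatives.
model = PairwiseViewClassifier()
criterion = nn.BCEWithLogitsLoss()
a, b = torch.rand(4, 1, 128, 128), torch.rand(4, 1, 128, 128)
labels = torch.tensor([[1.], [0.], [1.], [0.]])
loss = criterion(model(a, b), labels)
loss.backward()
```

Because positive and negative pairs are generated directly from rendered views, no manual labels are needed, which is what pushes the shared encoder toward detail-sensitive yet view-invariant representations.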
Related papers
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Fine-Tuned but Zero-Shot 3D Shape Sketch View Similarity and Retrieval [8.540349872620993]
We show that in a zero-shot setting, the more abstract the sketch, the higher the likelihood of incorrect image matches.
One of the key findings of our research is that meticulous fine-tuning on one class of 3D shapes can lead to improved performance on other shape classes.
arXiv Detail & Related papers (2023-06-14T14:40:50Z) - Designing Deep Networks for Scene Recognition [3.493180651702109]
We conduct extensive experiments to demonstrate that widely accepted principles in network design may result in dramatic performance differences when the data is altered.
This paper presents a novel network design methodology: data-oriented network design.
We propose a Deep-Narrow Network and a Dilated Pooling module, which improve scene recognition performance using less than half of the computational resources.
arXiv Detail & Related papers (2023-03-13T18:28:06Z) - FuNNscope: Visual microscope for interactively exploring the loss
landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z) - Vis2Mesh: Efficient Mesh Reconstruction from Unstructured Point Clouds
of Large Scenes with Learned Virtual View Visibility [17.929307870456416]
We present a novel framework for mesh reconstruction from unstructured point clouds.
We take advantage of the learned visibility of the 3D points in the virtual views and traditional graph-cut based mesh generation.
arXiv Detail & Related papers (2021-08-18T20:28:16Z) - Deep Contrastive Learning for Multi-View Network Embedding [20.035449838566503]
Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors.
Most contrastive learning-based methods rely on high-quality graph embeddings.
We design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME).
arXiv Detail & Related papers (2021-08-16T06:29:18Z) - Image GANs meet Differentiable Rendering for Inverse Graphics and
Interpretable 3D Neural Rendering [101.56891506498755]
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks.
We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets.
arXiv Detail & Related papers (2020-10-18T22:29:07Z) - Improving Point Cloud Semantic Segmentation by Learning 3D Object
Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud
Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z) - PVSNet: Pixelwise Visibility-Aware Multi-View Stereo Network [32.41293572426403]
A Pixelwise Visibility-aware multi-view Stereo Network (PVSNet) is proposed for robust dense 3D reconstruction.
PVSNet is the first deep learning framework that is able to capture the visibility information of different neighboring views.
Experiments show that PVSNet achieves the state-of-the-art performance on different datasets.
arXiv Detail & Related papers (2020-07-15T14:39:49Z)