Weak Multi-View Supervision for Surface Mapping Estimation
- URL: http://arxiv.org/abs/2105.01388v1
- Date: Tue, 4 May 2021 09:46:26 GMT
- Title: Weak Multi-View Supervision for Surface Mapping Estimation
- Authors: Nishant Rai, Aidas Liaudanskas, Srinivas Rao, Rodrigo Ortiz Cayon,
Matteo Munaro, Stefan Holzer
- Abstract summary: We propose a weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations.
We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories.
- Score: 0.9367260794056769
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a weakly-supervised multi-view learning approach to learn
category-specific surface mapping without dense annotations. We learn the
underlying surface geometry of common categories, such as human faces, cars,
and airplanes, given instances from those categories. While traditional
approaches solve this problem using extensive supervision in the form of
pixel-level annotations, we take advantage of the fact that pixel-level UV and
mesh predictions can be combined with 3D reprojections to form consistency
cycles. As a result of exploiting these cycles, we can establish a dense
correspondence mapping between image pixels and the mesh acting as a
self-supervisory signal, which in turn helps improve our overall estimates. Our
approach leverages information from multiple views of the object to establish
additional consistency cycles, thus improving surface mapping understanding
without the need for explicit annotations. We also propose the use of
deformation fields for the prediction of an instance-specific mesh. Given the lack
of datasets providing multiple images of similar object instances from
different viewpoints, we generate and release a multi-view ShapeNet Cars and
Airplanes dataset created by rendering ShapeNet meshes using a 360-degree
camera trajectory around the mesh. For the human faces category, we process and
adapt an existing dataset to a multi-view setup. Through experimental
evaluations, we show that, at test time, our method can generate accurate
variations away from the mean shape, is multi-view consistent, and performs
comparably to fully supervised approaches.
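To make the consistency cycle concrete, here is a minimal Python/NumPy sketch of the pixel -> UV -> 3D -> pixel loop the abstract describes. The sphere parameterization standing in for the predicted mesh, the function names (uv_to_surface, project, cycle_consistency_loss), and the toy camera are illustrative assumptions, not the authors' implementation; in the paper the UV map and the (deformed) mesh are network predictions and the loss is backpropagated through them.

```python
# Hedged sketch of the pixel -> UV -> 3D -> pixel consistency cycle.
# All names and the sphere stand-in are assumptions for illustration.
import numpy as np

def uv_to_surface(uv, radius=1.0):
    """Map UV coords in [0,1]^2 to 3D points on a sphere. A real model
    would instead sample the predicted mesh (mean shape + deformation
    field) at the UV locations."""
    theta = uv[:, 0] * 2.0 * np.pi          # azimuth
    phi = uv[:, 1] * np.pi                  # inclination
    return radius * np.stack([np.sin(phi) * np.cos(theta),
                              np.sin(phi) * np.sin(theta),
                              np.cos(phi)], axis=-1)

def project(points, K, R, t):
    """Pinhole projection of Nx3 world points to Nx2 pixel coordinates."""
    cam = points @ R.T + t                  # world -> camera frame
    uvw = cam @ K.T                         # camera -> image plane
    return uvw[:, :2] / uvw[:, 2:3]

def cycle_consistency_loss(pixels, predicted_uv, K, R, t):
    """Lift the per-pixel UV predictions to 3D surface points, reproject
    them into the view, and penalize deviation from the source pixels."""
    surface_pts = uv_to_surface(predicted_uv)
    reprojected = project(surface_pts, K, R, t)
    return np.mean(np.sum((reprojected - pixels) ** 2, axis=-1))

# Toy usage with random predictions and a camera 4 units back.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 4.0])
pixels = np.random.rand(128, 2) * [640, 480]
pred_uv = np.random.rand(128, 2)
print(cycle_consistency_loss(pixels, pred_uv, K, R, t))
```

With multiple views of the same instance, one such term per view (shared surface points, per-view cameras) yields the additional cross-view cycles the abstract mentions.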
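The dataset generation step can be sketched in the same spirit: the snippet below builds look-at camera extrinsics at evenly spaced azimuths on a circle around a mesh at the origin, one plausible reading of the 360-degree camera trajectory mentioned above. The look_at construction, the OpenCV axis convention, and the radius/height defaults are assumptions for illustration, not the released rendering pipeline.

```python
# Hedged sketch of a 360-degree camera trajectory for multi-view rendering.
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build world-to-camera rotation R and translation t so the camera
    at `eye` looks at `target` (OpenCV convention: +z into the scene)."""
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])    # rows: camera axes in world frame
    t = -R @ eye
    return R, t

def circular_trajectory(num_views=36, radius=2.0, height=0.5):
    """Yield (R, t) extrinsics on a circle around a mesh at the origin."""
    for k in range(num_views):
        angle = 2.0 * np.pi * k / num_views
        eye = np.array([radius * np.cos(angle),
                        radius * np.sin(angle), height])
        yield look_at(eye)

# Each (R, t), paired with fixed intrinsics, defines one rendered view;
# 36 views gives a 10-degree azimuth spacing around the object.
poses = list(circular_trajectory(num_views=36))
```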
Related papers
- Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching [2.400446821380503]
We introduce an efficient framework to learn descriptors for both RGB images and point clouds.
It takes the visual state space model (VMamba) as its backbone and employs a pixel-view-scene joint training strategy.
A visible 3D-point overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision.
arXiv Detail & Related papers (2024-10-08T18:31:41Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular image(s) is a challenging and long-standing problem in computer vision.
Recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space.
We propose a learnable keypoint sampling method that scatters pseudo surface points in 3D space to preserve data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Efficient Textured Mesh Recovery from Multiple Views with Differentiable Rendering [8.264851594332677]
We propose an efficient coarse-to-fine approach to recover the textured mesh from multi-view images.
We optimize the shape geometry by minimizing the difference between the rendered mesh depth and the depth predicted by a learning-based multi-view stereo algorithm.
In contrast to the implicit neural representation on shape and color, we introduce a physically based inverse rendering scheme to jointly estimate the lighting and reflectance of the objects.
arXiv Detail & Related papers (2022-05-25T03:33:55Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Localization and Mapping using Instance-specific Mesh Models [12.235379548921061]
This paper focuses on building semantic maps, containing object poses and shapes, using a monocular camera.
Our contribution is an instance-specific mesh model of object shape that can be optimized online based on semantic information extracted from camera images.
arXiv Detail & Related papers (2021-03-08T00:24:23Z)
- Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images [64.53227129573293]
We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views.
We design neural networks capable of generating high-quality parametric 3D surfaces which are consistent between views.
Our method is supervised and trained on a public dataset of shapes from common object categories.
arXiv Detail & Related papers (2020-08-18T06:33:40Z)
- Implicit Mesh Reconstruction from Unannotated Image Collections [48.85604987196472]
We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image.
We represent the shape as an image-conditioned implicit function that transforms the surface of a sphere to that of the predicted mesh, while additionally predicting the corresponding texture.
arXiv Detail & Related papers (2020-07-16T17:55:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.