TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose
Estimation
- URL: http://arxiv.org/abs/2212.12902v1
- Date: Sun, 25 Dec 2022 13:36:32 GMT
- Title: TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose
Estimation
- Authors: Hanzhi Chen, Fabian Manhardt, Nassir Navab, Benjamin Busam
- Abstract summary: We introduce neural texture learning for 6D object pose estimation from synthetic data and a few unlabelled real images.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
- Score: 55.94900327396771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce neural texture learning for 6D object pose
estimation from synthetic data and a few unlabelled real images. Our major
contribution is a novel learning scheme which removes the drawbacks of previous
works, namely the strong dependency on co-modalities or additional refinement.
These have been previously necessary to provide training signals for
convergence. We formulate such a scheme as two sub-optimisation problems on
texture learning and pose learning. We separately learn to predict realistic
texture of objects from real image collections and learn pose estimation from
pixel-perfect synthetic data. Combining these two capabilities then allows us to
synthesise photorealistic novel views to supervise the pose estimator with
accurate geometry. To alleviate pose noise and segmentation imperfection
present during the texture learning phase, we propose a surfel-based
adversarial training loss together with texture regularisation from synthetic
data. We demonstrate that the proposed approach significantly outperforms the
recent state-of-the-art methods without ground-truth pose annotations and
demonstrates substantial generalisation improvements towards unseen scenes.
Remarkably, our scheme improves the adopted pose estimators substantially even
when initialised with much inferior performance.
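For readers unfamiliar with the term, a 6D object pose is a rigid transform with three rotational and three translational degrees of freedom that maps points from the object's model frame into the camera frame. The sketch below is only an illustration of that definition and of how a pose turns model geometry into pixel locations when rendering a view; it is not the paper's texture-learning or pose-learning scheme. The intrinsics, rotation, translation, and point values are hypothetical.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation about the z-axis (one of the three rotational degrees of freedom)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, p):
    """Map a model-frame point p into the camera frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(K, p_cam):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical camera intrinsics (fx, fy, cx, cy) and pose, chosen for illustration.
K = (500.0, 500.0, 320.0, 240.0)
R = rotation_z(math.pi / 2)      # rotation part of the 6D pose
t = [0.0, 0.0, 1.0]              # translation part: object 1 unit in front of the camera

# Two hypothetical points on the object model, 0.1 units from its origin.
model_points = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]
pixels = [project(K, transform(R, t, p)) for p in model_points]

for p, uv in zip(model_points, pixels):
    print(p, "->", uv)
```

A pose estimator predicts `R` and `t` from an image; an error in either shifts where the model geometry lands in the image, which is why accurately posed synthetic views can serve as a supervision signal.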
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
(2024-03-28T22:25:05Z)
- GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence [5.500735640045456]
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics.
We propose to utilize both geometric and semantic features obtained from a pre-trained foundation model.
This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance.
(2023-11-23T02:35:38Z)
- Sim2Real Instance-Level Style Transfer for 6D Pose Estimation [0.4893345190925177]
We introduce a simulation to reality (sim2real) instance-level style transfer for 6D pose estimation network training.
Our approach transfers the style of target objects individually, from synthetic to real, without human intervention.
(2022-03-03T23:46:47Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
(2021-05-31T07:07:44Z)
- Multi-View Consistency Loss for Improved Single-Image 3D Reconstruction of Clothed People [36.30755368202957]
We present a novel method to improve the accuracy of the 3D reconstruction of clothed human shape from a single image.
The accuracy and completeness of clothed-people reconstruction are limited by the large variation in shape resulting from clothing, hair, body size, pose and camera viewpoint.
(2020-09-29T17:18:00Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
(2020-06-29T12:53:58Z)
- Pose Proposal Critic: Robust Pose Refinement by Learning Reprojection Errors [17.918364675642998]
We focus our attention on pose refinement, and show how to push the state-of-the-art further in the case of partial occlusions.
The proposed pose refinement method leverages a simplified learning task, where a CNN is trained to estimate the reprojection error between an observed and a rendered image.
Current state-of-the-art results are outperformed for two out of three metrics on the Occlusion LINEMOD benchmark, while performing on-par for the final metric.
(2020-05-13T11:46:04Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
(2020-04-28T12:03:14Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
(2020-04-01T12:56:13Z)
- Introducing Pose Consistency and Warp-Alignment for Self-Supervised 6D Object Pose Estimation in Color Images [38.9238085806793]
Most successful approaches to estimate the 6D pose of an object typically train a neural network by supervising the learning with annotated poses in real world images.
A two-stage 6D object pose estimator framework that can be applied on top of existing neural-network-based approaches is proposed.
(2020-03-27T11:53:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.