Unsupervised Domain Adaptation with Temporal-Consistent Self-Training
for 3D Hand-Object Joint Reconstruction
- URL: http://arxiv.org/abs/2012.11260v1
- Date: Mon, 21 Dec 2020 11:27:56 GMT
- Authors: Mengshi Qi, Edoardo Remelli, Mathieu Salzmann, Pascal Fua
- Abstract summary: We introduce an effective approach to addressing this challenge by exploiting 3D geometric constraints within a cycle generative adversarial network (CycleGAN).
In contrast to most existing works, we propose to enforce short- and long-term temporal consistency to fine-tune the domain-adapted model in a self-supervised fashion.
We will demonstrate that our approach outperforms state-of-the-art 3D hand-object joint reconstruction methods on three widely-used benchmarks.
- Score: 131.34795312667026
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning solutions for hand-object 3D pose and shape estimation are now
very effective when an annotated dataset is available to train them to handle
the scenarios and lighting conditions they will encounter at test time.
Unfortunately, this is not always the case, and one often has to resort to
training them on synthetic data, which does not guarantee that they will work
well in real situations. In this paper, we introduce an effective approach to
addressing this challenge by exploiting 3D geometric constraints within a cycle
generative adversarial network (CycleGAN) to perform domain adaptation.
Furthermore, in contrast to most existing works, which fail to leverage the
rich temporal information available in unlabeled real videos as a source of
supervision, we propose to enforce short- and long-term temporal consistency to
fine-tune the domain-adapted model in a self-supervised fashion. We will
demonstrate that our approach outperforms state-of-the-art 3D hand-object joint
reconstruction methods on three widely-used benchmarks and will make our code
publicly available.
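The abstract describes fine-tuning with short- and long-term temporal consistency on unlabeled video. The paper does not spell out the loss here, but a minimal sketch of one plausible formulation (these function names and the exact penalty terms are illustrative assumptions, not the authors' implementation) could look like this: a short-term term penalizing acceleration between consecutive frames, and a long-term term penalizing drift of each frame from the midpoint of a wider window.

```python
import numpy as np

def temporal_consistency_loss(joints, k=5):
    """Sketch of a temporal-consistency objective for per-frame predictions.

    joints: (T, J, 3) array of predicted 3D joint positions over T frames.
    k: stride (in frames) for the long-term term. Hypothetical parameter.
    """
    # Short-term: second finite difference between consecutive frames
    # (an acceleration penalty), encouraging locally smooth motion.
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    short = np.mean(np.sum(accel ** 2, axis=-1))
    # Long-term: frame t should stay close to the midpoint of frames
    # t-k and t+k, discouraging drift accumulated over longer windows.
    mid = 0.5 * (joints[2 * k:] + joints[:-2 * k])
    long_term = np.mean(np.sum((joints[k:-k] - mid) ** 2, axis=-1))
    return short + long_term
```

Under this formulation, any constant-velocity trajectory incurs zero loss, so the penalty only discourages jitter and drift rather than motion itself.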
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects [37.175069234979645]
This work addresses the challenging task of 3D object recognition without the reliance on real-world 3D labeled data.
Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference.
arXiv Detail & Related papers (2023-10-19T17:59:09Z) - 3D Adversarial Augmentations for Robust Out-of-Domain Predictions [115.74319739738571]
We focus on improving the generalization to out-of-domain data.
We learn a set of vectors that deform the objects in an adversarial fashion.
We perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model.
arXiv Detail & Related papers (2023-08-29T17:58:55Z) - S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation
with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that governs hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z) - Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z) - Ano-Graph: Learning Normal Scene Contextual Graphs to Detect Video
Anomalies [11.935112157324122]
Video anomaly detection has proved to be a challenging task owing to its unsupervised training procedure and the high spatio-temporal complexity of real-world scenarios.
We propose a novel yet efficient method named Ano-Graph for learning and modeling the interaction of normal objects.
Our method is data-efficient, significantly more robust against common real-world variations such as illumination, and surpasses the state of the art by a large margin on the challenging ADOC and Street Scene datasets.
arXiv Detail & Related papers (2021-03-18T20:08:53Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
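Several entries above (e.g. S$^2$Contact) generate pseudo-labels by keeping only predictions that satisfy consistency constraints. As a generic illustration of that idea (the function name, threshold, and agreement criterion are assumptions for this sketch, not any paper's exact procedure), one can retain a sample only when predictions from two views or augmentations of it agree, and use their average as the pseudo-label:

```python
import numpy as np

def select_pseudo_labels(preds_a, preds_b, thresh=0.01):
    """Consistency-based pseudo-label selection (illustrative sketch).

    preds_a, preds_b: (N, J, 3) predictions for two views/augmentations
    of the same N unlabeled samples.
    thresh: maximum mean per-joint distance for a sample to be kept.
    """
    # Mean per-joint Euclidean distance between the two predictions.
    dist = np.linalg.norm(preds_a - preds_b, axis=-1).mean(axis=-1)
    keep = dist < thresh
    # Averaged prediction serves as the pseudo-label for retained samples.
    labels = 0.5 * (preds_a[keep] + preds_b[keep])
    return keep, labels
```

The retained pseudo-labels can then be fed back into training, which is the general self-training loop these papers build on.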
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.