Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach
- URL: http://arxiv.org/abs/2501.16146v1
- Date: Mon, 27 Jan 2025 15:39:39 GMT
- Title: Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach
- Authors: Hoosang Lee, Jeha Ryu,
- Abstract summary: Performance degradation caused by domain gaps between source and target domains remains a major challenge to generalization.
We propose a novel canonical domain approach that maps both the source and target domains into a unified canonical domain.
The proposed method substantially improves generalization capability across datasets while using the same data volume.
- Score: 0.0
- License:
- Abstract: Recent advancements in deep learning methods have significantly improved the performance of 3D Human Pose Estimation (HPE). However, performance degradation caused by domain gaps between source and target domains remains a major challenge to generalization, necessitating extensive data augmentation and/or fine-tuning for each specific target domain. To address this issue more efficiently, we propose a novel canonical domain approach that maps both the source and target domains into a unified canonical domain, alleviating the need for additional fine-tuning in the target domain. To construct the canonical domain, we introduce a canonicalization process to generate a novel canonical 2D-3D pose mapping that ensures 2D-3D pose consistency and simplifies 2D-3D pose patterns, enabling more efficient training of lifting networks. The canonicalization of both domains is achieved through the following steps: (1) in the source domain, the lifting network is trained within the canonical domain; (2) in the target domain, input 2D poses are canonicalized prior to inference by leveraging the properties of perspective projection and known camera intrinsics. Consequently, the trained network can be directly applied to the target domain without requiring additional fine-tuning. Experiments conducted with various lifting networks and publicly available datasets (e.g., Human3.6M, Fit3D, MPI-INF-3DHP) demonstrate that the proposed method substantially improves generalization capability across datasets while using the same data volume.
Related papers
- Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection [50.448520056844885]
We propose a novel framework for syn-to-real unsupervised domain adaptation in indoor 3D object detection.
Our adaptation results from synthetic dataset 3D-FRONT to real-world datasets ScanNetV2 and SUN RGB-D demonstrate remarkable mAP25 improvements of 9.7% and 9.1% over Source-Only baselines.
arXiv Detail & Related papers (2024-06-17T08:18:41Z) - Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation [17.875516787157018]
We study how to harness the knowledge priors learned by 2D visual foundation models to produce more accurate labels for unlabeled target domains.
Our method is evaluated on various autonomous driving datasets and the results demonstrate a significant improvement for 3D segmentation task.
arXiv Detail & Related papers (2024-03-15T03:58:17Z) - CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D
Object Detection [14.063365469339812]
LiDAR-based 3D Object Detection methods often do not generalize well to target domains outside the source (or training) data distribution.
We introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which leverages visual semantic cues from an image modality.
We also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features.
arXiv Detail & Related papers (2024-03-06T14:12:38Z) - Source-Free and Image-Only Unsupervised Domain Adaptation for Category
Level Object Pose Estimation [18.011044932979143]
3DUDA is a method capable of adapting to a nuisance-ridden target domain without 3D or depth data.
We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations.
We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions.
arXiv Detail & Related papers (2024-01-19T17:48:05Z) - BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain
Generalization of 3D Semantic Segmentation [59.99683295806698]
Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the complementarity of 2D-3D data to overcome the lack of annotation in a new domain.
We propose cross-modal learning under bird's-eye view for Domain Generalization (DG) of 3D semantic segmentation, called BEV-DG.
arXiv Detail & Related papers (2023-08-12T11:09:17Z) - Global Adaptation meets Local Generalization: Unsupervised Domain
Adaptation for 3D Human Pose Estimation [31.178656420040692]
textitPoseDA achieves 61.3 mm of MPJPE on MPI-INF-3DHP under a cross-dataset evaluation setup, improving upon the previous state-of-the-art method by 10.2%.
arXiv Detail & Related papers (2023-03-29T04:54:42Z) - Geometry-Aware Network for Domain Adaptive Semantic Segmentation [64.00345743710653]
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-labels refinement in the target domain.
Our model outperforms state-of-the-arts on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z) - Unsupervised Domain Adaptation for Monocular 3D Object Detection via
Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z) - Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency [90.71745178767203]
Deep learning-based 3D object detection has achieved unprecedented success with the advent of large-scale autonomous driving datasets.
Existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.
We study a more realistic setting, unsupervised 3D domain adaptive detection, which only utilizes source domain annotations.
arXiv Detail & Related papers (2021-07-23T17:19:23Z) - Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
arXiv Detail & Related papers (2020-05-14T04:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.