Pose Adaptive Dual Mixup for Few-Shot Single-View 3D Reconstruction
- URL: http://arxiv.org/abs/2112.12484v1
- Date: Thu, 23 Dec 2021 12:22:08 GMT
- Title: Pose Adaptive Dual Mixup for Few-Shot Single-View 3D Reconstruction
- Authors: Ta-Ying Cheng, Hsuan-Ru Yang, Niki Trigoni, Hwann-Tzong Chen, Tyng-Luh Liu
- Abstract summary: We present a pose adaptive few-shot learning procedure and a two-stage data regularization, termed PADMix, for single-image 3D reconstruction.
PADMix significantly outperforms previous literature on few-shot settings over the ShapeNet dataset and sets new benchmarks on the more challenging real-world Pix3D dataset.
- Score: 35.30827580375749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a pose adaptive few-shot learning procedure and a two-stage data
interpolation regularization, termed Pose Adaptive Dual Mixup (PADMix), for
single-image 3D reconstruction. While augmentations via interpolating
feature-label pairs are effective in classification tasks, they fall short in
shape predictions potentially due to inconsistencies between interpolated
products of two images and volumes when rendering viewpoints are unknown.
PADMix targets this issue with two sets of mixup procedures performed
sequentially. We first perform an input mixup which, combined with a pose
adaptive learning procedure, is helpful in learning 2D feature extraction and
pose adaptive latent encoding. The stagewise training allows us to build upon
the pose invariant representations to perform a follow-up latent mixup under
one-to-one correspondences between features and ground-truth volumes. PADMix
significantly outperforms previous literature on few-shot settings over the
ShapeNet dataset and sets new benchmarks on the more challenging real-world
Pix3D dataset.
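The two-stage procedure described above can be illustrated with a minimal NumPy sketch. This is a hedged, generic illustration of mixup-style interpolation, not the authors' implementation: the encoder here is a placeholder linear map, and the image/volume shapes and Beta prior are assumptions for demonstration. The key point the sketch shows is that stage two interpolates latent features and ground-truth volumes with the same coefficient, relying on a one-to-one feature-volume correspondence.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, x2, lam):
    """Convex combination of two tensors with coefficient lam in [0, 1]."""
    return lam * x1 + (1.0 - lam) * x2

# --- Stage 1: input mixup on two RGB images (hypothetical 3x32x32 tensors) ---
img_a = rng.random((3, 32, 32))
img_b = rng.random((3, 32, 32))
lam = rng.beta(0.4, 0.4)              # mixing coefficient from a Beta prior
mixed_img = mixup(img_a, img_b, lam)  # fed to the 2D feature extractor

# --- Stage 2: latent mixup under one-to-one feature/volume correspondence ---
# A stand-in linear "encoder"; PADMix would use the pose-invariant encoder
# trained in stage 1.
W = rng.random((128, 3 * 32 * 32))
feat_a = W @ img_a.ravel()
feat_b = W @ img_b.ravel()
vol_a = rng.random((32, 32, 32))      # placeholder ground-truth occupancy volumes
vol_b = rng.random((32, 32, 32))

mixed_feat = mixup(feat_a, feat_b, lam)
mixed_vol = mixup(vol_a, vol_b, lam)  # volume label mixed with the SAME lam
```

Because the latent mixup happens after pose-invariant encoding, the interpolated feature and the interpolated volume stay consistent even when the two source images were rendered from unknown, different viewpoints.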
Related papers
- BIFRÖST: 3D-Aware Image compositing with Language Instructions [27.484947109237964]
Bifröst is a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition.
Bifröst addresses these issues by training an MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process.
arXiv Detail & Related papers (2024-10-24T18:35:12Z) - Adaptive Mix for Semi-Supervised Medical Image Segmentation [22.69909762038458]
We propose an Adaptive Mix algorithm (AdaMix) for image mix-up in a self-paced learning manner.
We develop three frameworks with our AdaMix, i.e., AdaMix-ST, AdaMix-MT, and AdaMix-CT, for semi-supervised medical image segmentation.
arXiv Detail & Related papers (2024-07-31T13:19:39Z) - Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method that elaborates (i) a 2.5D shape from an RGB-D reference, (ii) an off-the-shelf differentiable renderer, and (iii) semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding [64.65145700121442]
We introduce MM-Mixing, a multi-modal mixing alignment framework for 3D understanding.
Our proposed two-stage training pipeline combines feature-level and input-level mixing to optimize the 3D encoder.
We demonstrate that MM-Mixing significantly improves baseline performance across various learning scenarios.
arXiv Detail & Related papers (2024-05-28T18:44:15Z) - MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation [8.840744039764092]
We propose a single-shot approach to determining 6-DoF pose of an object with available 3D computer-aided design (CAD) model from a single RGB image.
Our method, dubbed MRC-Net, comprises two stages. The first performs pose classification and renders the 3D object in the classified pose.
The second stage performs regression to predict fine-grained residual pose within class.
arXiv Detail & Related papers (2024-03-12T18:36:59Z) - AdaptivePose++: A Powerful Single-Stage Network for Multi-Person Pose
Regression [66.39539141222524]
We propose to represent the human parts as adaptive points and introduce a fine-grained body representation method.
With the proposed body representation, we deliver a compact single-stage multi-person pose regression network, termed as AdaptivePose.
We employ AdaptivePose for both 2D/3D multi-person pose estimation tasks to verify the effectiveness of AdaptivePose.
arXiv Detail & Related papers (2022-10-08T12:54:20Z) - (Fusionformer):Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation [1.52292571922932]
Many previous methods lack an understanding of local joint information; prior work considers the temporal relationship of only a single joint.
Our proposed Fusionformer method introduces a global-temporal self-trajectory module and a cross-temporal self-trajectory module.
The results show an improvement of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
arXiv Detail & Related papers (2022-10-08T12:22:10Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to model inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online
Adaptation [87.85851771425325]
We consider a new problem of adapting a human mesh reconstruction model to out-of-domain streaming videos.
We tackle this problem through online adaptation, gradually correcting the model bias during testing.
We propose the Dynamic Bilevel Online Adaptation algorithm (DynaBOA)
arXiv Detail & Related papers (2021-11-07T07:23:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.