Related papers: Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis

URL: http://arxiv.org/abs/2411.08603v1
Date: Wed, 13 Nov 2024 13:40:27 GMT
Title: Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis
Authors: Dominik Borer, Jakob Buhmann, Martin Guay
Abstract summary: We develop a more expressive intermediate skeleton representation capable of capturing the semantics of the pose. We extend the analysis-by-synthesis framework with a training protocol based on synthetic data. Our approach outperforms previous models trained with analysis-by-synthesis on standard benchmarks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern pose estimation models are trained on large, manually-labelled datasets which are costly and may not cover the full extent of human poses and appearances in the real world. With advances in neural rendering, analysis-by-synthesis and the ability to not only predict, but also render the pose, is becoming an appealing framework, which could alleviate the need for large-scale manual labelling efforts. While recent work has shown the feasibility of this approach, the predictions admit many flips due to a simplistic intermediate skeleton representation, resulting in low precision and inhibiting the acquisition of any downstream knowledge such as three-dimensional positioning. We solve this problem with a more expressive intermediate skeleton representation capable of capturing the semantics of the pose (left and right), which significantly reduces flips. To successfully train this new representation, we extend the analysis-by-synthesis framework with a training protocol based on synthetic data. We show that our representation results in fewer flips and more accurate predictions. Our approach outperforms previous models trained with analysis-by-synthesis on standard benchmarks.

Related papers

Robust Human Trajectory Prediction via Self-Supervised Skeleton Representation Learning [12.961180148172199]
We propose a robust trajectory prediction method that incorporates a self-supervised skeleton representation model pretrained with masked autoencoding. Experimental results show that our method improves robustness to missing skeletal data without sacrificing prediction accuracy, and consistently outperforms baseline models in clean-to-moderate missingness regimes.
arXiv Detail & Related papers (2026-02-26T09:25:52Z)
A Minimal Task Reveals Emergent Path Integration and Object-Location Binding in a Predictive Sequence Model [0.0]
We show that action-conditioned sequential prediction suffices for learning "world models". We train a recurrent neural network to predict the upcoming token from current input and a saccade-like displacement. Decoding analyses reveal path integration and dynamic binding of token identity to position.
arXiv Detail & Related papers (2026-02-03T13:08:27Z)
Scriboora: Rethinking Human Pose Forecasting [44.79834103607383]
This paper evaluates a wide range of pose forecasting algorithms in the task of absolute pose forecasting. Recent speech models can be efficiently adapted to the task of pose forecasting, and improve current state-of-the-art performance.
arXiv Detail & Related papers (2025-11-19T15:58:33Z)
Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation [3.126179109712709]
We propose a mesh represented recycle learning strategy for 3D hand pose and mesh estimation. To be specific, a hand pose and mesh estimation model first predicts parametric 3D hand annotations. Second, synthetic hand images are generated with self-estimated hand mesh representations. Third, the synthetic hand images are fed into the same model again.
arXiv Detail & Related papers (2023-10-18T09:50:09Z)
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data. We learn to predict realistic texture of objects from real image collections. We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations [51.552870594221865]
We show that last layer retraining can match or outperform state-of-the-art approaches on spurious correlation benchmarks. We also show that last layer retraining on large ImageNet-trained models can significantly reduce reliance on background and texture information.
arXiv Detail & Related papers (2022-04-06T16:55:41Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training [52.93808218720784]
Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks. Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models. We observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data.
arXiv Detail & Related papers (2021-08-25T02:29:28Z)
MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction [34.565986275769745]
We propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for the human pose prediction task. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset.
arXiv Detail & Related papers (2021-08-16T15:26:23Z)
Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning. It aims to extract both the common information and the complementary information in an adversarial setting. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering [13.219688351773422]
We propose a test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner. Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape. We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches.
arXiv Detail & Related papers (2021-02-11T18:58:31Z)
Masked Linear Regression for Learning Local Receptive Fields for Facial Expression Synthesis [10.28711904929932]
We propose a constrained version of ridge regression that exploits the local and sparse structure of facial expressions. In contrast to the existing approaches, our proposed model can be efficiently trained on larger image sizes. The proposed algorithm is also compared with state-of-the-art GANs including Pix2Pix, CycleGAN, StarGAN and GANimation.
arXiv Detail & Related papers (2020-11-18T06:04:24Z)
Monocular Human Pose and Shape Reconstruction using Part Differentiable Rendering [53.16864661460889]
Recent works succeed in regression-based methods which estimate parametric models directly through a deep neural network supervised by 3D ground truth. In this paper, we introduce body segmentation as critical supervision. To improve the reconstruction with part segmentation, we propose a part-level differentiable renderer that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z)