Related papers: SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery

URL: http://arxiv.org/abs/2511.20157v3
Date: Thu, 27 Nov 2025 03:22:40 GMT
Title: SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery
Authors: Da Li, Jiping Jin, Xuanlong Yu, Wei Liu, Xiaodong Cun, Kai Chen, Rui Fan, Jiangang Kong, Xi Shen
Abstract summary: We introduce SKEL-CF, a coarse-to-fine framework for SKEL parameter estimation. SKEL-CF employs a transformer-based encoder-decoder architecture, where the encoder predicts coarse camera and SKEL parameters. Our results establish SKEL-CF as a scalable and anatomically faithful framework for human motion analysis.
Score: 33.63204394371213
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Parametric 3D human models such as SMPL have driven significant advances in human pose and shape estimation, yet their simplified kinematics limit biomechanical realism. The recently proposed SKEL model addresses this limitation by re-rigging SMPL with an anatomically accurate skeleton. However, estimating SKEL parameters directly remains challenging due to limited training data, perspective ambiguities, and the inherent complexity of human articulation. We introduce SKEL-CF, a coarse-to-fine framework for SKEL parameter estimation. SKEL-CF employs a transformer-based encoder-decoder architecture, where the encoder predicts coarse camera and SKEL parameters, and the decoder progressively refines them in successive layers. To ensure anatomically consistent supervision, we convert the existing SMPL-based dataset 4DHuman into a SKEL-aligned version, 4DHuman-SKEL, providing high-quality training data for SKEL estimation. In addition, to mitigate depth and scale ambiguities, we explicitly incorporate camera modeling into the SKEL-CF pipeline and demonstrate its importance across diverse viewpoints. Extensive experiments validate the effectiveness of the proposed design. On the challenging MOYO dataset, SKEL-CF achieves 85.0 MPJPE / 51.4 PA-MPJPE, significantly outperforming the previous SKEL-based state-of-the-art HSMR (104.5 / 79.6). These results establish SKEL-CF as a scalable and anatomically faithful framework for human motion analysis, bridging the gap between computer vision and biomechanics. Our implementation is available on the project page: https://pokerman8.github.io/SKEL-CF/.
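The two numbers quoted in the abstract, MPJPE and PA-MPJPE, are standard joint-error metrics, commonly reported in millimetres. The sketch below shows how they are usually computed with NumPy; it is an illustration under stated assumptions, not the authors' evaluation code. The function names are hypothetical, joints are assumed to be arrays of shape (J, 3), and any root-joint alignment or joint subset used in the paper's protocol is not reproduced here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best similarity transform.

    Solves for scale, rotation, and translation in the least-squares sense
    (Kabsch/Umeyama), then returns the transformed prediction.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Cross-covariance between centred point sets; its SVD gives the rotation.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the chosen rotation.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: error that remains after similarity alignment."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it isolates articulated pose accuracy, which is why it is never larger than the best achievable MPJPE under a similarity transform and why the two numbers are usually reported side by side.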

Related papers

UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass [83.7071371474926]
UniSH is a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. Our framework bridges strong, disparate priors from scene reconstruction and HMR. Our model achieves state-of-the-art performance on human-centric scene reconstruction.
arXiv Detail & Related papers (2026-01-03T16:06:27Z)
From Skin to Skeleton: Towards Biomechanically Accurate 3D Digital Humans [50.014530130312714]
We develop SKEL, which re-rigs the SMPL body model with a biomechanics skeleton. We show that SKEL has more biomechanically accurate joint locations than SMPL, and the bones fit inside the body surface better than previous methods.
arXiv Detail & Related papers (2025-09-08T12:24:27Z)
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Current state-of-the-art methods focus on training innovative architectural designs on confined datasets. We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
EA-RAS: Towards Efficient and Accurate End-to-End Reconstruction of Anatomical Skeleton [28.290019864619605]
EA-RAS is a single-stage, lightweight, and plug-and-play anatomical skeleton estimator. It can provide real-time, accurate, and anatomically realistic skeletons in arbitrary poses using only a single RGB image as input. Our regression method is over 800 times faster than existing methods, meeting real-time requirements.
arXiv Detail & Related papers (2024-09-03T02:46:28Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences [3.650839294933459]
General human motion encoders trained on large-scale human motion datasets are applied to analyzing gait patterns in PD patients. We evaluate six pre-trained state-of-the-art human motion encoder models on their ability to predict the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS-III) gait scores from motion capture data.
arXiv Detail & Related papers (2024-05-28T04:29:10Z)
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE. Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z)
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts. Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
