Related papers: Scriboora: Rethinking Human Pose Forecasting

URL: http://arxiv.org/abs/2511.15565v1
Date: Wed, 19 Nov 2025 15:58:33 GMT
Title: Scriboora: Rethinking Human Pose Forecasting
Authors: Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
Abstract summary: This paper evaluates a wide range of pose forecasting algorithms in the task of absolute pose forecasting. Recent speech models can be efficiently adapted to the task of pose forecasting, and improve current state-of-the-art performance.
Score: 44.79834103607383
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Human pose forecasting predicts future poses from past observations and has many significant applications in areas such as action recognition, autonomous driving, and human-robot interaction. This paper evaluates a wide range of pose forecasting algorithms on the task of absolute pose forecasting, reveals many reproducibility issues, and provides a unified training and evaluation pipeline. After drawing a high-level analogy to the task of speech understanding, it shows that recent speech models can be efficiently adapted to pose forecasting and improve on current state-of-the-art performance. Finally, the robustness of the models is evaluated using noisy joint coordinates obtained from a pose estimator model, which reflect a realistic type of noise that is closer to real-world applications. For this purpose a new dataset variation is introduced; the results show that estimated poses cause a substantial performance degradation, and how much of it can be recovered by unsupervised finetuning.
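
As a rough illustration of what "absolute" pose forecasting adds to evaluation, the following is a minimal sketch assuming the mean per-joint position error (MPJPE), a metric commonly used in the pose forecasting literature; the paper's exact protocol may differ. The helper names `mpjpe` and `root_relative`, the 17-joint layout, and the root joint at index 0 are hypothetical choices for illustration. Scoring absolute coordinates penalizes a forecast that gets the body shape right but drifts globally, whereas root-relative scoring hides that drift.

```python
import numpy as np


def mpjpe(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over all frames and joints.

    pred, target: arrays of shape (frames, joints, 3).
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Distance per (frame, joint), then mean over everything.
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def root_relative(poses: np.ndarray, root: int = 0) -> np.ndarray:
    """Subtract the root joint (hypothetically index 0, e.g. the pelvis) from every joint."""
    # Slicing root:root + 1 keeps the joint axis so the subtraction broadcasts.
    return poses - poses[:, root:root + 1, :]


# Toy example (hypothetical data): the forecast has the correct body shape
# but is translated by a constant 0.1 along x in every frame.
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 17, 3))
pred = target + np.array([0.1, 0.0, 0.0])

print(mpjpe(pred, target))                                # absolute error: about 0.1
print(mpjpe(root_relative(pred), root_relative(target)))  # root-relative error: about 0
```

The constant offset cancels when the root joint is subtracted, so the root-relative error is zero up to floating-point rounding, while the absolute error exposes the global drift.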

Related papers

Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments [1.411614392022118]
Existing data-driven agent-based models struggle in low-data environments. This paper investigates whether large language models, pre-trained on broad human knowledge, can fill this gap.
arXiv Detail & Related papers (2026-01-20T20:58:17Z)
Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation [23.0839810713682]
Occlusions are a significant challenge to human pose estimation algorithms. We propose OR-POSE: Unsupervised Domain Adaptation for Occlusion Resilient Human POSE Estimation.
arXiv Detail & Related papers (2025-01-06T05:30:37Z)
Gait Recognition from Highly Compressed Videos [3.1049440318608568]
A common mitigation strategy involves fine-tuning pose estimation models on noisy data to improve robustness. We propose a processing pipeline that incorporates a task-targeted artifact correction model designed to pre-process and enhance surveillance footage. Our experiments show a clear enhancement in gait analysis performance, supporting the viability of the proposed method.
arXiv Detail & Related papers (2024-04-18T13:46:16Z)
Fine-grained Forecasting Models Via Gaussian Process Blurring Effect [6.472434306724611]
Time series forecasting is a challenging task due to the existence of complex and dynamic temporal dependencies. Using more training data is one way to improve the accuracy, but this source is often limited. We are building on successful denoising approaches for image generation by advocating for an end-to-end forecasting and denoising paradigm.
arXiv Detail & Related papers (2023-12-21T20:25:16Z)
DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation [16.32910684198013]
We present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem. We show two unique characteristics of DiffPose on the pose estimation task: (i) the ability to combine multiple sets of pose estimates to improve prediction accuracy, particularly for challenging joints, and (ii) the ability to adjust the number of iterative steps for feature refinement without retraining the model.
arXiv Detail & Related papers (2023-07-31T14:00:23Z)
Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network in which some layers are reused iteratively to refine its previous estimates. We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model. Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence. We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task. The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame. Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information. Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
Confidence Adaptive Anytime Pixel-Level Recognition [86.75784498879354]
Anytime inference requires a model to make a progression of predictions which might be halted at any time. We propose the first unified and end-to-end model approach for anytime pixel-level recognition.
arXiv Detail & Related papers (2021-04-01T20:01:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.