CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
- URL: http://arxiv.org/abs/2505.13140v1
- Date: Mon, 19 May 2025 14:09:45 GMT
- Title: CacheFlow: Fast Human Motion Prediction by Cached Normalizing Flow
- Authors: Takahiro Maeda, Jinkun Cao, Norimichi Ukita, Kris Kitani,
- Abstract summary: We introduce a novel flow-based method for human motion prediction called CacheFlow.<n>Unlike previous conditional generative models that suffer from time efficiency, CacheFlow takes advantage of an unconditional flow-based generative model.<n>Our method demonstrates improved density estimation accuracy and comparable prediction accuracy to a SOTA method on Human3.6M.
- Score: 38.74038236608644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many density estimation techniques for 3D human motion prediction require a significant amount of inference time, often exceeding the duration of the predicted time horizon. To address the need for faster density estimation for 3D human motion prediction, we introduce a novel flow-based method for human motion prediction called CacheFlow. Unlike previous conditional generative models that suffer from time efficiency, CacheFlow takes advantage of an unconditional flow-based generative model that transforms a Gaussian mixture into the density of future motions. The results of the computation of the flow-based generative model can be precomputed and cached. Then, for conditional prediction, we seek a mapping from historical trajectories to samples in the Gaussian mixture. This mapping can be done by a much more lightweight model, thus saving significant computation overhead compared to a typical conditional flow model. In such a two-stage fashion and by caching results from the slow flow model computation, we build our CacheFlow without loss of prediction accuracy and model expressiveness. This inference process is completed in approximately one millisecond, making it 4 times faster than previous VAE methods and 30 times faster than previous diffusion-based methods on standard benchmarks such as Human3.6M and AMASS datasets. Furthermore, our method demonstrates improved density estimation accuracy and comparable prediction accuracy to a SOTA method on Human3.6M. Our code and models will be publicly available.
Related papers
- Mean Flows for One-step Generative Modeling [64.4997821467102]
We propose a principled and effective framework for one-step generative modeling.<n>A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training.<n>Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning.
arXiv Detail & Related papers (2025-05-19T17:59:42Z) - Gaussian Mixture Flow Matching Models [51.976452482535954]
Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models re parameterize the Gaussian mean as flow velocity.<n>They underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG)<n>We introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality.
arXiv Detail & Related papers (2025-04-07T17:59:42Z) - Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing [10.542645300983878]
We propose an inference-time scaling approach for pretrained flow models.<n>We show that SDE-based generation, particularly variance-preserving (VP) interpolant-based generation, improves particle sampling methods for inference-time scaling in flow models.
arXiv Detail & Related papers (2025-03-25T06:30:45Z) - OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries.
OPUS incorporates a suite of non-trivial strategies to enhance model performance.
Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a nuanced-temporal modeling that estimates the throughput of transportation services like buses, taxis and ride-driven models.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML)
We develop a atized physics-guided network (PN), and propose a data-aware framework Physics-guided Active Sample Reweighting (P-GASR)
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - GDTS: Goal-Guided Diffusion Model with Tree Sampling for Multi-Modal Pedestrian Trajectory Prediction [15.731398013255179]
We propose a novel Goal-Guided Diffusion Model with Tree Sampling for multi-modal trajectory prediction.<n>A two-stage tree sampling algorithm is presented, which leverages common features to reduce the inference time and improve accuracy for multi-modal prediction.<n> Experimental results demonstrate that our proposed framework achieves comparable state-of-the-art performance with real-time inference speed in public datasets.
arXiv Detail & Related papers (2023-11-25T03:55:06Z) - Fast Inference and Update of Probabilistic Density Estimation on
Trajectory Prediction [19.240717471864723]
Safety-critical applications such as autonomous vehicles and social robots require fast computation and accurate probability density estimation.
This paper presents a new normalizing flow-based trajectory prediction model named FlowChain.
FlowChain is faster than the generative models that need additional approximations such as kernel density estimation.
arXiv Detail & Related papers (2023-08-17T07:16:21Z) - Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution.
We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z) - FloMo: Tractable Motion Prediction with Normalizing Flows [0.0]
We model motion prediction as a density estimation problem with a normalizing flow between a noise sample and the future motion distribution.
Our model, named FloMo, allows likelihoods to be computed in a single network pass and can be trained directly with maximum likelihood estimation.
Our method achieves state-of-the-art performance on three popular prediction datasets, with a significant gap to most competing models.
arXiv Detail & Related papers (2021-03-05T11:35:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.