FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
- URL: http://arxiv.org/abs/2510.10868v1
- Date: Mon, 13 Oct 2025 00:23:17 GMT
- Title: FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
- Authors: Soroush Mehraban, Andrea Iaboni, Babak Taati
- Abstract summary: We introduce two HMR-specific merging strategies: Error-Constrained Layer Merging (ECLM) and Mask-guided Token Merging (Mask-ToMe). Experiments across multiple benchmarks demonstrate that our method achieves up to 2.3x speed-up while slightly improving performance over the baseline.
- Score: 2.309307613420651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent transformer-based models for 3D Human Mesh Recovery (HMR) have achieved strong performance but often suffer from high computational cost and complexity due to deep transformer architectures and redundant tokens. In this paper, we introduce two HMR-specific merging strategies: Error-Constrained Layer Merging (ECLM) and Mask-guided Token Merging (Mask-ToMe). ECLM selectively merges transformer layers that have minimal impact on the Mean Per Joint Position Error (MPJPE), while Mask-ToMe focuses on merging background tokens that contribute little to the final prediction. To further address the potential performance drop caused by merging, we propose a diffusion-based decoder that incorporates temporal context and leverages pose priors learned from large-scale motion capture datasets. Experiments across multiple benchmarks demonstrate that our method achieves up to 2.3x speed-up while slightly improving performance over the baseline.
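The Mask-ToMe idea described in the abstract, merging background tokens that contribute little to the prediction, can be illustrated with a toy sketch. This is a hypothetical simplification, not the authors' implementation: the boolean background mask and the greedy similarity-based pairing below are assumptions for illustration only.

```python
import numpy as np

def mask_guided_token_merge(tokens, background_mask, r):
    """Toy sketch of mask-guided token merging.

    tokens:          (N, D) array of token embeddings
    background_mask: (N,) boolean array, True = background token
    r:               number of background token pairs to merge
    """
    bg_idx = np.flatnonzero(background_mask)
    fg_idx = np.flatnonzero(~background_mask)
    bg = tokens[bg_idx]

    # Cosine similarity between background tokens only.
    normed = bg / np.linalg.norm(bg, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)

    merged, used = [], set()
    # Greedily average the r most similar background pairs.
    for _ in range(min(r, len(bg_idx) // 2)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged.append((bg[i] + bg[j]) / 2)
        sim[[i, j], :] = -np.inf
        sim[:, [i, j]] = -np.inf
        used.update((i, j))

    keep = [bg[k] for k in range(len(bg_idx)) if k not in used]
    # Foreground tokens pass through unchanged.
    return np.concatenate([tokens[fg_idx], np.array(keep + merged)])
```

Each merge removes one token from the sequence, so downstream attention cost drops while foreground (person) tokens are preserved exactly.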
Related papers
- URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding [55.45331924836242]
We present URaG, a framework that Unifies Retrieval and Generation within a single MLLM. We show that URaG achieves state-of-the-art performance while reducing computational overhead by 44-56%.
arXiv Detail & Related papers (2025-11-13T17:54:09Z) - Flow-Matching Guided Deep Unfolding for Hyperspectral Image Reconstruction [53.26903617819014]
The Flow-Matching-guided Unfolding network (FMU) is the first to integrate flow matching into HSI reconstruction. To further strengthen the learned dynamics, we introduce a mean velocity loss. Experiments on both simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality.
arXiv Detail & Related papers (2025-10-02T11:32:00Z) - ToMA: Token Merge with Attention for Diffusion Models [8.079656935981193]
Diffusion models excel at high-fidelity image generation but face scalability limits due to transformers' quadratic attention complexity. We propose Token Merge with Attention (ToMA), an off-the-shelf method that reformulates token reduction for GPU-aligned efficiency. ToMA reduces SDXL/Flux generation latency by 24%/23%, respectively (with DINO $\Delta$ 0.07), outperforming prior methods.
arXiv Detail & Related papers (2025-09-13T17:35:00Z) - Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradation on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z) - High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution [87.56382172827526]
High-frequency regions are most critical for reconstruction. We propose a training-free adaptive masking module for acceleration. Our method reduces FLOPs by 24--43% for state-of-the-art models.
arXiv Detail & Related papers (2025-05-11T13:18:03Z) - M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference [8.792650582656913]
We introduce Mixture of Multi-rate Residuals (M2R2), a framework that dynamically modulates residual velocity to improve early alignment. M2R2 surpasses state-of-the-art distance-based strategies, balancing generation quality and speedup. In a self-speculative decoding setup, M2R2 achieves up to 2.8x speedups on MT-Bench.
arXiv Detail & Related papers (2025-02-04T06:13:52Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) remain challenging to apply to large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery
with Transformers [17.22112222736234]
Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction.
Due to the large memory overhead and slow inference speed, it is difficult to deploy such models for practical use.
We propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO.
arXiv Detail & Related papers (2022-07-27T22:54:09Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - Efficient Two-Stream Network for Violence Detection Using Separable
Convolutional LSTM [0.0]
We propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet.
SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution.
Our model surpasses prior accuracy on the larger and more challenging RWF-2000 dataset by more than a 2% margin.
arXiv Detail & Related papers (2021-02-21T12:01:48Z)
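The SepConvLSTM entry above describes replacing the convolution at each ConvLSTM gate with a depthwise separable convolution. A minimal sketch of why this helps, comparing parameter counts for a standard versus a depthwise separable convolution (the channel and kernel sizes below are illustrative assumptions, not values from the paper):

```python
def standard_conv_params(c_in, c_out, k):
    # One k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution mixing channels.
    return c_in * k * k + c_in * c_out

print(standard_conv_params(64, 64, 3))        # 36864
print(depthwise_separable_params(64, 64, 3))  # 4672
```

For 64-channel 3x3 gates, the separable form needs roughly 8x fewer parameters, which is where the architecture's efficiency comes from.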
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.