MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
- URL: http://arxiv.org/abs/2507.15212v1
- Date: Mon, 21 Jul 2025 03:24:30 GMT
- Title: MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction
- Authors: Yusuke Yoshiyasu, Leyuan Sun, Ryusuke Sagawa
- Abstract summary: We introduce MeshMamba, a neural network model for learning 3D articulated mesh models. MambaDiff3D can generate dense 3D human meshes in clothing, with grasping hands, etc. Mamba-HMR extends the capabilities of previous non-parametric human mesh recovery approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce MeshMamba, a neural network model for learning 3D articulated mesh models by employing the recently proposed Mamba State Space Models (Mamba-SSMs). MeshMamba is efficient and scalable in handling a large number of input tokens, enabling the generation and reconstruction of body mesh models with more than 10,000 vertices that capture clothing and hand geometries. The key to effectively learning MeshMamba is the serialization of mesh vertices into orderings that are easily processed by Mamba. This is achieved by sorting the vertices based on body-part annotations or on the 3D vertex locations of a template mesh, so that the ordering respects the structure of articulated shapes. Based on MeshMamba, we design 1) MambaDiff3D, a denoising diffusion model for generating 3D articulated meshes, and 2) Mamba-HMR, a 3D human mesh recovery model that reconstructs human body shape and pose from a single image. Experimental results show that MambaDiff3D can generate dense 3D human meshes with clothing and grasping hands, and that it outperforms previous approaches on the 3D human shape generation task. Additionally, Mamba-HMR extends previous non-parametric human mesh recovery approaches, which were limited to body-only poses with around 500 vertex tokens, to the whole-body setting with face and hands, while achieving competitive performance in (near) real-time.
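The serialization idea is concrete enough to sketch. Below is a minimal Python/NumPy illustration of the two orderings the abstract describes: grouping vertices by body-part annotation, and sorting them by their 3D position on a shared template mesh. The function names, the stable-sort choice, and the z-then-y-then-x sort keys are illustrative assumptions; the paper's actual implementation may order vertices differently.
```python
# Minimal sketch (assumed, not the authors' code) of serializing mesh
# vertices into a 1D token order that a Mamba SSM can process.
import numpy as np

def serialize_by_body_part(part_labels: np.ndarray) -> np.ndarray:
    """Permutation grouping vertices by body-part annotation.

    part_labels: (N,) integer part label per vertex
                 (e.g. torso=0, head=1, left_arm=2, ...; labels assumed).
    """
    # A stable sort keeps the original intra-part vertex order, so each
    # body part occupies a contiguous run of tokens in the sequence.
    return np.argsort(part_labels, kind="stable")

def serialize_by_template(template_vertices: np.ndarray) -> np.ndarray:
    """Permutation sorting vertices by template position (z, then y, then x).

    Because every articulated mesh shares the template's topology, this
    ordering is computed once on the template and reused for all meshes.
    """
    x, y, z = template_vertices.T
    # np.lexsort sorts by the last key first: primary key z, ties broken
    # by y, then x. The axis priority here is an illustrative choice.
    return np.lexsort((x, y, z))

# Usage: permute vertices before tokenization; invert to restore the mesh.
# order = serialize_by_template(template)   # (N,) permutation
# tokens = vertices[order]                  # serialized input to MeshMamba
# restored = tokens[np.argsort(order)]      # back to the original indexing
```
Since all meshes share one topology, either permutation is a fixed reindexing computed once, which keeps the serialization cost negligible relative to the network itself.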
Related papers
- MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs [79.45006864728893]
MeshCraft is a framework for efficient and controllable mesh generation. It uses continuous spatial diffusion to generate discrete triangle faces. It can generate an 800-face mesh in just 3.2 seconds.
arXiv Detail & Related papers (2025-03-29T09:21:50Z)
- MeshXL: Neural Coordinate Field for Generative 3D Foundation Models [51.1972329762843]
We present a family of generative pre-trained auto-regressive models that addresses 3D mesh generation with modern large language model approaches.
MeshXL generates high-quality 3D meshes and can also serve as a foundation model for various downstream applications.
arXiv Detail & Related papers (2024-05-31T14:35:35Z)
- PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance [66.40153183581894]
We introduce PivotMesh, a generic and scalable mesh generation framework.
It makes an initial attempt to extend native mesh generation to large-scale datasets.
We show that PivotMesh can generate compact and sharp 3D meshes across various categories.
arXiv Detail & Related papers (2024-05-27T07:13:13Z)
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction [153.52406455209538]
Gamba is an end-to-end model for 3D reconstruction from a single-view image.
It completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU.
arXiv Detail & Related papers (2024-03-27T17:40:14Z)
- Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z)
- Detailed Avatar Recovery from Single Image [50.82102098057822]
This paper presents a novel framework to recover a detailed avatar from a single image.
We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation framework.
Our method can restore detailed human body shapes with complete textures beyond skinned models.
arXiv Detail & Related papers (2021-08-06T03:51:26Z)
- End-to-End Human Pose and Mesh Reconstruction with Transformers [17.75480888764098]
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image.
METRO does not rely on any parametric mesh model such as SMPL, so it can easily be extended to other objects such as hands.
We demonstrate the generalizability of METRO to 3D hand reconstruction in the wild, outperforming existing state-of-the-art methods on the FreiHAND dataset.
arXiv Detail & Related papers (2020-12-17T17:17:29Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose; a minimal sketch of this idea appears after this entry.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
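As a rough illustration of the GraphCNN idea behind Pose2Mesh, the sketch below shows a single graph-convolution layer in PyTorch that propagates per-vertex features over the mesh graph. The class name, layer sizes, and the precomputed normalized-adjacency scheme are illustrative assumptions, not the paper's exact architecture.
```python
# Minimal sketch (assumed) of a graph-convolution layer over mesh vertices.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: X' = act(A_hat @ X @ W)."""

    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor, act: bool = True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.act = nn.ReLU() if act else nn.Identity()
        # adj is a precomputed (N, N) normalized mesh adjacency (with
        # self-loops); a buffer so it moves with the model to any device.
        self.register_buffer("adj", adj)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, in_dim); the (N, N) adjacency broadcasts over batch.
        return self.act(self.adj @ self.linear(x))

# Usage sketch (names are illustrative): given per-vertex features x of
# shape (batch, N, 64) lifted from the 2D pose, stack layers and regress
# 3D coordinates with no activation on the final layer:
# net = nn.Sequential(GraphConv(64, 64, adj), GraphConv(64, 3, adj, act=False))
# coords = net(x)  # (batch, N, 3) estimated 3D vertex positions
```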