Part Segmentation of Human Meshes via Multi-View Human Parsing
- URL: http://arxiv.org/abs/2507.18655v2
- Date: Mon, 28 Jul 2025 01:06:32 GMT
- Title: Part Segmentation of Human Meshes via Multi-View Human Parsing
- Authors: James Dickens, Kamyar Hamad
- Abstract summary: In parallel, the field of human parsing focuses on predicting body part and clothing/accessory labels from images. This work aims to bridge these two domains by enabling per-vertex semantic segmentation of large-scale human meshes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in point cloud deep learning have led to models that achieve high per-part labeling accuracy on large-scale point clouds, using only the raw geometry of unordered point sets. In parallel, the field of human parsing focuses on predicting body part and clothing/accessory labels from images. This work aims to bridge these two domains by enabling per-vertex semantic segmentation of large-scale human meshes. To achieve this, a pseudo-ground truth labeling pipeline is developed for the Thuman2.1 dataset: meshes are first aligned to a canonical pose, segmented from multiple viewpoints, and the resulting point-level labels are then backprojected onto the original mesh to produce per-point pseudo ground truth annotations. Subsequently, a novel, memory-efficient sampling strategy is introduced, a windowed iterative farthest point sampling (FPS) with space-filling curve-based serialization to effectively downsample the point clouds. This is followed by a purely geometric segmentation using PointTransformer, enabling semantic parsing of human meshes without relying on texture information. Experimental results confirm the effectiveness and accuracy of the proposed approach.
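The core sampling idea — serialize the cloud along a space-filling curve, then run farthest point sampling inside fixed-size windows of that order — can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: Morton (z-order) codes stand in for whichever space-filling curve the paper uses, and the function names are hypothetical.

```python
import numpy as np

def morton_codes(points, bits=10):
    """Serialize 3D points along a z-order (Morton) space-filling curve."""
    span = np.ptp(points, axis=0)
    span[span == 0] = 1.0
    q = ((points - points.min(axis=0)) / span * (2**bits - 1)).astype(np.uint64)
    codes = np.zeros(len(points), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            bit = (q[:, axis] >> np.uint64(b)) & np.uint64(1)
            codes |= bit << np.uint64(3 * b + axis)
    return codes

def fps(points, k):
    """Plain iterative farthest point sampling; returns indices into `points`."""
    n = len(points)
    k = min(k, n)
    chosen = np.zeros(k, dtype=np.int64)  # start from point 0
    dist = np.full(n, np.inf)
    for i in range(1, k):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(dist.argmax())
    return chosen

def windowed_fps(points, n_samples, window=1024):
    """Sort points by Morton code, then run FPS inside fixed-size windows.

    Full-cloud FPS is O(n*k) in time and needs an n-length distance buffer;
    restricting it to contiguous windows of the serialized order bounds the
    per-window cost while the curve keeps each window spatially coherent.
    """
    order = np.argsort(morton_codes(points))
    n_windows = -(-len(points) // window)      # ceil division
    per_window = -(-n_samples // n_windows)
    picked = []
    for w in range(n_windows):
        win = order[w * window:(w + 1) * window]
        picked.append(win[fps(points[win], per_window)])
    return np.concatenate(picked)[:n_samples]
```

Because the windows partition the serialized order, the sampled indices are unique across windows, and the spatial coherence of the curve keeps each window's FPS result close to what global FPS would pick locally.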
Related papers
- High-quality Pseudo-labeling for Point Cloud Segmentation with Scene-level Annotation [32.03087826213936]
This paper investigates indoor point cloud semantic segmentation under scene-level annotation. Current methods first generate point-level pseudo-labels, which are then used to train segmentation models. To enhance accuracy, this paper proposes a high-quality pseudo-label generation framework.
arXiv Detail & Related papers (2025-06-29T13:17:12Z)
- You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping [119.41166438439313]
YOEO is a single-stage method that outputs instance segmentation and NPCS representations in an end-to-end manner. We use a unified network to generate point-wise semantic labels and centroid offsets, allowing points from the same part instance to vote for the same centroid. We also deploy our synthetically-trained model in a real-world setting, providing real-time visual feedback at 200 Hz.
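The centroid-voting idea can be sketched in a few lines. This is a toy illustration, not YOEO's network: the offsets here are given rather than predicted, and votes are grouped by coarse grid quantization instead of the paper's clustering; all names are hypothetical.

```python
import numpy as np

def vote_instances(points, offsets, cell=0.5):
    """Group points into instances by clustering their voted centroids.

    Each point votes for its instance centroid (point + predicted offset);
    votes landing in the same coarse grid cell share an instance label.
    """
    votes = points + offsets
    keys = np.floor(votes / cell).astype(np.int64)
    _, labels = np.unique(keys, axis=0, return_inverse=True)
    return labels
```

Since all points of one part instance vote for (approximately) the same centroid, their votes collapse into a tight cluster that a coarse grid, or any off-the-shelf clustering step, can separate into instances.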
arXiv Detail & Related papers (2025-06-06T03:49:20Z)
- Robust Human Registration with Body Part Segmentation on Noisy Point Clouds [73.00876572870787]
We introduce a hybrid approach that incorporates body-part segmentation into the mesh fitting process. Our method first assigns body part labels to individual points, which then guide a two-step SMPL-X fitting. We demonstrate that the fitted human mesh can refine body part labels, leading to improved segmentation.
arXiv Detail & Related papers (2025-04-04T17:17:33Z)
- Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation [65.78246406460305]
Compositional semantic mixing represents the first unsupervised domain adaptation technique for point cloud segmentation.
We present a two-branch symmetric network architecture capable of concurrently processing point clouds from a source domain (e.g. synthetic) and point clouds from a target domain (e.g. real-world).
arXiv Detail & Related papers (2023-08-28T14:43:36Z)
- Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration [22.308070598885532]
We treat the point cloud registration problem as a semantic instance matching and registration task.
We propose a deep semantic graph matching method (DeepSGM) for large-scale outdoor point cloud registration.
Experimental results conducted on the KITTI Odometry dataset demonstrate that the proposed method improves the registration performance.
arXiv Detail & Related papers (2023-08-10T03:07:28Z)
- GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training [16.825524577372473]
We introduce a point cloud representation learning framework, based on geometric feature reconstruction.
We identify three self-supervised learning objectives peculiar to point clouds, namely centroid prediction, normal estimation, and curvature prediction.
Our pipeline is conceptually simple and it consists of two major steps: first, it randomly masks out groups of points, followed by a Transformer-based point cloud encoder.
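The three geometric targets can be computed for each masked point group with local PCA. A rough sketch under assumptions — the helper below is hypothetical and simplified relative to the paper's formulation:

```python
import numpy as np

def geometric_targets(group):
    """Compute the three self-supervised targets for one group of points:
    centroid, surface normal (least-variance PCA direction), and a
    curvature proxy (smallest eigenvalue over the eigenvalue sum)."""
    centroid = group.mean(axis=0)
    centered = group - centroid
    cov = centered.T @ centered / len(group)
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = evecs[:, 0]                 # direction of least variance
    curvature = evals[0] / max(evals.sum(), 1e-12)
    return centroid, normal, curvature
```

A masked group on a flat surface yields a near-zero curvature target and a normal perpendicular to the surface, which is what makes these quantities useful as purely geometric reconstruction objectives.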
arXiv Detail & Related papers (2023-05-15T17:14:55Z)
- Position-Guided Point Cloud Panoptic Segmentation Transformer [118.17651196656178]
This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in sparse point clouds are relatively small with respect to the whole scene and often share similar geometry while lacking the distinctive appearance cues available in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on Semantic KITTI and nuScenes benchmark, respectively.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- Semantic Segmentation of Urban Textured Meshes Through Point Sampling [0.0]
We study the influence of different parameters such as the sampling method, the density of the extracted cloud, the features selected and the number of points used at each training period.
Our result outperforms the state-of-the-art on the SUM dataset, gaining about 4 points in OA and 18 points in mIoU.
arXiv Detail & Related papers (2023-02-21T12:49:31Z)
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
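Random sampling is trivially cheap compared to farthest point sampling, which is the point of the design. A minimal sketch with illustrative names (this is not RandLA-Net's API, and the paper's local feature aggregation, which compensates for the coverage lost to random selection, is not shown):

```python
import numpy as np

def random_downsample(points, k, seed=None):
    """O(k) uniform random downsampling of a point cloud.

    Unlike FPS, the cost does not grow with the product of cloud size and
    sample count, which is what makes million-point clouds tractable.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx], idx
```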
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.