Related papers: Assembler: Scalable 3D Part Assembly via Anchor Point Diffusion

Assembler: Scalable 3D Part Assembly via Anchor Point Diffusion

URL: http://arxiv.org/abs/2506.17074v1
Date: Fri, 20 Jun 2025 15:25:20 GMT
Title: Assembler: Scalable 3D Part Assembly via Anchor Point Diffusion
Authors: Wang Zhao, Yan-Pei Cao, Jiale Xu, Yuejiang Dong, Ying Shan,
Abstract summary: We present Assembler, a scalable and generalizable framework for 3D part assembly.<n>It handles diverse, in-the-wild objects with varying part counts, geometries, and structures.<n>It achieves state-of-the-art performance on PartNet and is the first to demonstrate high-quality assembly for complex, real-world objects.
Score: 39.08891847512135
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Assembler, a scalable and generalizable framework for 3D part assembly that reconstructs complete objects from input part meshes and a reference image. Unlike prior approaches that mostly rely on deterministic part pose prediction and category-specific training, Assembler is designed to handle diverse, in-the-wild objects with varying part counts, geometries, and structures. It addresses the core challenges of scaling to general 3D part assembly through innovations in task formulation, representation, and data. First, Assembler casts part assembly as a generative problem and employs diffusion models to sample plausible configurations, effectively capturing ambiguities arising from symmetry, repeated parts, and multiple valid assemblies. Second, we introduce a novel shape-centric representation based on sparse anchor point clouds, enabling scalable generation in Euclidean space rather than SE(3) pose prediction. Third, we construct a large-scale dataset of over 320K diverse part-object assemblies using a synthesis and filtering pipeline built on existing 3D shape repositories. Assembler achieves state-of-the-art performance on PartNet and is the first to demonstrate high-quality assembly for complex, real-world objects. Based on Assembler, we further introduce an interesting part-aware 3D modeling system that generates high-resolution, editable objects from images, demonstrating potential for interactive and compositional design. Project page: https://assembler3d.github.io

Related papers

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers [29.52313100024294]
We introduce PartCrafter, the first structured 3D generative model that jointly synthesizes multiple semantically meaningful and geometrically distinct 3D meshes from a single RGB image.<n>PartCrafter simultaneously denoises multiple 3D parts, enabling end-to-end part-aware generation of both individual objects and complex multi-object scenes.<n> Experiments show that PartCrafter outperforms existing approaches in generating decomposable 3D meshes.
arXiv Detail & Related papers (2025-06-05T20:30:28Z)
Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations [112.29763628638112]
Object-X is a versatile multi-modal 3D representation framework.<n>It can encoding rich object embeddings and decoding them back into geometric and visual reconstructions.<n>It supports a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization.
arXiv Detail & Related papers (2025-06-05T09:14:42Z)
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction.<n>We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images.<n>We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models [63.1432721793683]
We introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object.<n>We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin.
arXiv Detail & Related papers (2024-12-24T18:59:43Z)
3D Part Segmentation via Geometric Aggregation of 2D Visual Features [57.20161517451834]
Supervised 3D part segmentation models are tailored for a fixed set of objects and parts, limiting their transferability to open-set, real-world scenarios.<n>Recent works have explored vision-language models (VLMs) as a promising alternative, using multi-view rendering and textual prompting to identify object parts.<n>To address these limitations, we propose COPS, a COmprehensive model for Parts that blends semantics extracted from visual concepts and 3D geometry to effectively identify object parts.
arXiv Detail & Related papers (2024-12-05T15:27:58Z)
Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image. We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space. A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images [24.10809783713574]
This paper introduces a novel task: translating multi-view images of a structural 3D model into a detailed sequence of assembly instructions. We propose an end-to-end model known as the Neural Assembler.
arXiv Detail & Related papers (2024-04-25T08:53:23Z)
Generative 3D Part Assembly via Dynamic Graph Learning [34.108515032411695]
Part assembly is a challenging yet crucial task in 3D computer vision and robotics. We propose an assembly-oriented dynamic graph learning framework that leverages an iterative graph neural network as a backbone.
arXiv Detail & Related papers (2020-06-14T04:26:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.