Related papers: SPAFormer: Sequential 3D Part Assembly with Transformers

URL: http://arxiv.org/abs/2403.05874v2
Date: Mon, 3 Jun 2024 07:37:23 GMT
Title: SPAFormer: Sequential 3D Part Assembly with Transformers
Authors: Boshen Xu, Sipeng Zheng, Qin Jin
Abstract summary: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly task. It addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information.
Score: 52.980803808373516
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy of 3D-PA. SPAFormer addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. Since assembly part sequences convey construction rules similar to sentences being structured through words, our model explores both parallel and autoregressive generation. It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information, enabling it to capture the inherent assembly pattern and relationships among sequentially ordered parts. We also construct a more challenging benchmark named PartNet-Assembly covering 21 varied categories to more comprehensively validate the effectiveness of SPAFormer. Extensive experiments demonstrate the superior generalization capabilities of SPAFormer, particularly with multi-tasking and in scenarios requiring long-horizon assembly. Codes and model weights will be released at https://github.com/xuboshen/SPAFormer.

Related papers

SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting [85.87902260102652]
We introduce the novel task of Sequential 3D Gaussian Affordance Reasoning. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. Our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.
arXiv Detail & Related papers (2025-07-31T17:56:55Z)
From One to More: Contextual Part Latents for 3D Generation [33.43336981984443]
CoPart is a part-aware diffusion framework that decomposes 3D objects into contextual part latents for coherent multi-part generation. We construct a novel 3D part dataset derived from articulated mesh segmentation and human-verified annotations. Experiments demonstrate CoPart's superior capabilities in part-level editing, object generation, and scene composition with unprecedented controllability.
arXiv Detail & Related papers (2025-07-11T17:33:18Z)
PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation [79.46526296655776]
PRISM is a novel approach for 3D shape generation that integrates categorical diffusion models with Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM). Our method employs compositional SSMs to capture part-level geometric variations and uses GMM to represent part semantics in a continuous space. Our approach significantly outperforms previous methods in both quality and controllability of part-level operations.
arXiv Detail & Related papers (2025-04-06T11:48:08Z)
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs [15.118234858274679]
We propose Generative Visual Puzzles (GenVP) to model the entire RPM generation process. Our model's capability spans from generating multiple solutions for one specific problem prompt to creating complete new puzzles out of the desired set of rules.
arXiv Detail & Related papers (2025-03-30T21:35:26Z)
Jigsaw++: Imagining Complete Shape Priors for Object Reassembly [35.16793557538698]
Jigsaw++ is a novel generative method designed to tackle the multifaceted challenges of reconstruction for the reassembly problem. It distinguishes itself by learning a category-agnostic shape prior to complete objects. Jigsaw++ has demonstrated its effectiveness, reducing reconstruction errors and enhancing the precision of shape reconstruction.
arXiv Detail & Related papers (2024-10-15T17:45:37Z)
TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images. A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice. We mitigate this problem by leveraging synthetic-to-real transfer learning.
arXiv Detail & Related papers (2024-07-22T14:05:27Z)
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective [26.479602180023125]
The Linear Complexity Sequence Model (LCSM) unites various sequence modeling techniques with linear complexity. We segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink. We perform experiments to analyze the impact of different stage settings on language modeling and retrieval tasks.
arXiv Detail & Related papers (2024-05-27T17:38:55Z)
Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection [74.40109927350856]
We present the Part Assembly Sequence Transformer (PAST) to infer assembly sequences from a target blueprint. We then use a motion planner and optimization to generate part movements and contacts. Experimental results show that our approach generalizes better than prior methods.
arXiv Detail & Related papers (2023-12-17T00:47:13Z)
Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space. We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation [22.648409352844997]
We propose Hierarchical Spatial-Temporal transFormers (HSTFormer) to capture multi-level joints' spatial-temporal correlations from local to global gradually for accurate 3D human pose estimation. HSTFormer consists of four transformer encoders (TEs) and a fusion module. To the best of our knowledge, HSTFormer is the first to study hierarchical TEs with multi-level fusion. It surpasses recent SOTAs on the challenging MPI-INF-3DHP dataset and small-scale HumanEva dataset, with a highly generalized systematic approach.
arXiv Detail & Related papers (2023-01-18T05:54:02Z)
Combinatorial 3D Shape Generation via Sequential Assembly [40.2815083025929]
Sequential assembly with geometric primitives has drawn attention in robotics and 3D vision since it yields a practical blueprint to construct a target shape. We propose a 3D shape generation framework to alleviate the consequences of a huge number of feasible combinations. Experimental results demonstrate that our method successfully generates 3D shapes and simulates more realistic generation processes.
arXiv Detail & Related papers (2020-04-16T01:23:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.