SPAFormer: Sequential 3D Part Assembly with Transformers
- URL: http://arxiv.org/abs/2403.05874v3
- Date: Sun, 09 Feb 2025 11:56:08 GMT
- Title: SPAFormer: Sequential 3D Part Assembly with Transformers
- Authors: Boshen Xu, Sipeng Zheng, Qin Jin
- Abstract summary: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task.
As the number of parts increases, the possible assembly combinations increase exponentially, leading to an explosion that severely hinders the efficacy of 3D-PA.
Since the sequence of parts conveys construction rules similar to sentences structured through words, our model explores both parallel and autoregressive generation.
- Score: 52.980803808373516
- License:
- Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose in sequential steps. As the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy of 3D-PA. SPAFormer addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. Since the sequence of parts conveys construction rules similar to sentences structured through words, our model explores both parallel and autoregressive generation. We further strengthen SPAFormer through knowledge enhancement strategies that utilize the attributes of parts and their sequence information, enabling it to capture the inherent assembly pattern and relationships among sequentially ordered parts. We also construct a more challenging benchmark named PartNet-Assembly covering 21 varied categories to more comprehensively validate the effectiveness of SPAFormer. Extensive experiments demonstrate the superior generalization capabilities of SPAFormer, particularly with multi-tasking and in scenarios requiring long-horizon assembly. Code is available at https://github.com/xuboshen/SPAFormer.
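To make the parallel versus autoregressive generation concrete, here is a minimal illustrative sketch (not the authors' released code; the class and parameter names are hypothetical). Per-part features ordered by the assembly sequence receive an order embedding, pass through a transformer encoder, and are regressed to a 7-D pose (3-D translation plus a quaternion); a causal mask toggles autoregressive decoding.

```python
import torch
import torch.nn as nn

class PartAssemblyTransformer(nn.Module):
    def __init__(self, feat_dim=256, num_heads=8, num_layers=6, max_parts=64):
        super().__init__()
        # Embedding of a part's position in the assembly sequence.
        self.order_emb = nn.Embedding(max_parts, feat_dim)
        layer = nn.TransformerEncoderLayer(feat_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # 6-DoF pose per part: 3-D translation + unit quaternion (4 values).
        self.pose_head = nn.Linear(feat_dim, 7)

    def forward(self, part_feats, autoregressive=False):
        # part_feats: (batch, num_parts, feat_dim), ordered by assembly sequence.
        num_parts = part_feats.size(1)
        order = torch.arange(num_parts, device=part_feats.device)
        x = part_feats + self.order_emb(order)
        mask = None
        if autoregressive:
            # Causal mask: part i attends only to parts placed at or before step i.
            mask = torch.triu(torch.ones(num_parts, num_parts, dtype=torch.bool,
                                         device=part_feats.device), diagonal=1)
        x = self.encoder(x, mask=mask)
        return self.pose_head(x)  # (batch, num_parts, 7)

# Example: predict poses for 10 parts of 2 shapes from precomputed part features.
# poses = PartAssemblyTransformer()(torch.randn(2, 10, 256), autoregressive=True)
```

In parallel mode every part attends to every other part, while the causal mask limits each part to parts placed earlier in the sequence, which is how sequence order can act as the weak constraint described above.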
Related papers
- Jigsaw++: Imagining Complete Shape Priors for Object Reassembly [35.16793557538698]
Jigsaw++ is a novel generative method designed to tackle the multifaceted challenges of reconstruction for the reassembly problem.
It distinguishes itself by learning a category-agnostic shape prior of complete objects.
Jigsaw++ has demonstrated its effectiveness, reducing reconstruction errors and enhancing the precision of shape reconstruction.
arXiv Detail & Related papers (2024-10-15T17:45:37Z) - Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
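As a rough illustration of the idea (a simplified stand-in, not the paper's implementation: the real SPARSEK operator is a differentiable top-k mask, whereas this sketch uses a plain hard top-k), a small scoring network ranks the key/value pairs and each query attends only to the selected k of them, so attention cost scales with k rather than with sequence length.

```python
import torch
import torch.nn as nn

def topk_sparse_attention(q, k, v, score_net, top_k=64):
    # q, k, v: (batch, seq_len, dim); score_net maps each key to a scalar score.
    scores = score_net(k).squeeze(-1)                        # (batch, seq_len)
    top_k = min(top_k, k.size(1))
    idx = scores.topk(top_k, dim=-1).indices                 # (batch, top_k)
    gather = idx.unsqueeze(-1).expand(-1, -1, k.size(-1))
    k_sel = torch.gather(k, 1, gather)                       # (batch, top_k, dim)
    v_sel = torch.gather(v, 1, gather)
    attn = torch.softmax(q @ k_sel.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
    return attn @ v_sel                                      # (batch, seq_len, dim)

# Example with placeholder shapes: 512 tokens, each query attends to 64 KV pairs.
score_net = nn.Linear(128, 1)
q = k = v = torch.randn(2, 512, 128)
out = topk_sparse_attention(q, k, v, score_net, top_k=64)
```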
arXiv Detail & Related papers (2024-06-24T15:55:59Z) - Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective [26.479602180023125]
The Linear Complexity Sequence Model (LCSM) unites various sequence modeling techniques with linear complexity.
We segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink.
We perform experiments to analyze the impact of different stage settings on language modeling and retrieval tasks.
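A toy sketch of that three-stage view, written as a plain linear recurrence (the specific maps below are illustrative, not taken from the paper):

```python
import torch

def lcsm_step(x_t, state, w_expand, decay, w_shrink):
    """One step of a toy linear-complexity recurrence (illustrative maps only)."""
    m_t = x_t @ w_expand            # Expand: lift the token into memory space.
    state = decay * state + m_t     # Oscillation: recurrent memory update with decay.
    y_t = state @ w_shrink          # Shrink: project memory back to the output space.
    return y_t, state

# Toy run over a sequence; total cost is O(seq_len), not O(seq_len^2).
batch, seq_len, in_dim, mem_dim, out_dim = 2, 16, 8, 32, 8
x = torch.randn(batch, seq_len, in_dim)
w_expand = torch.randn(in_dim, mem_dim) / in_dim ** 0.5
w_shrink = torch.randn(mem_dim, out_dim) / mem_dim ** 0.5
decay = torch.full((mem_dim,), 0.9)
state = torch.zeros(batch, mem_dim)
outputs = []
for t in range(seq_len):
    y_t, state = lcsm_step(x[:, t], state, w_expand, decay, w_shrink)
    outputs.append(y_t)
y = torch.stack(outputs, dim=1)     # (batch, seq_len, out_dim)
```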
arXiv Detail & Related papers (2024-05-27T17:38:55Z) - ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z) - Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection [74.40109927350856]
We present the Part Assembly Sequence Transformer (PAST) to infer assembly sequences from a target blueprint.
We then use a motion planner and optimization to generate part movements and contacts.
Experimental results show that our approach generalizes better than prior methods.
arXiv Detail & Related papers (2023-12-17T00:47:13Z) - Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z) - Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
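A generic sketch of this pattern under assumed components (not the ImAM implementation; it presumes a separately trained vector-quantized encoder and uses placeholder sizes): the shape is represented as a short sequence of discrete codebook indices, and a causal transformer models that token sequence instead of a dense voxel grid.

```python
import torch
import torch.nn as nn

class LatentShapePrior(nn.Module):
    """Causal transformer over discrete shape-code tokens (placeholder sizes)."""
    def __init__(self, codebook_size=1024, max_len=256, dim=512, heads=8, layers=6):
        super().__init__()
        self.tok_emb = nn.Embedding(codebook_size, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc_layer, layers)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, codes):
        # codes: (batch, seq_len) indices produced by a separately trained VQ encoder.
        seq_len = codes.size(1)
        pos = torch.arange(seq_len, device=codes.device)
        x = self.tok_emb(codes) + self.pos_emb(pos)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=codes.device), diagonal=1)
        return self.head(self.backbone(x, mask=causal))  # next-token logits

# Example: logits = LatentShapePrior()(torch.randint(0, 1024, (2, 256)))
```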
arXiv Detail & Related papers (2023-03-26T12:03:18Z) - HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation [22.648409352844997]
We propose Hierarchical Spatial-Temporal transFormers (HSTFormer) to gradually capture multi-level spatial-temporal correlations among joints, from local to global, for accurate 3D human pose estimation.
HSTFormer consists of four transformer encoders (TEs) and a fusion module. To the best of our knowledge, HSTFormer is the first to study hierarchical TEs with multi-level fusion.
It surpasses recent SOTAs on the challenging MPI-INF-3DHP dataset and the small-scale HumanEva dataset, demonstrating strong generalization.
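A generic sketch of the encoders-plus-fusion pattern (the specific four-encoder hierarchy of HSTFormer is not reproduced; dimensions are placeholders): one transformer encoder attends across joints within each frame, another across frames for each joint, and a fusion layer combines the two streams before regressing 3D joint coordinates.

```python
import torch
import torch.nn as nn

def make_encoder(dim, heads, layers):
    layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
    return nn.TransformerEncoder(layer, layers)

class SpatialTemporalFusion(nn.Module):
    def __init__(self, dim=128, heads=4, layers=2):
        super().__init__()
        self.spatial = make_encoder(dim, heads, layers)    # joints within a frame
        self.temporal = make_encoder(dim, heads, layers)   # one joint across frames
        self.fuse = nn.Linear(2 * dim, dim)
        self.head = nn.Linear(dim, 3)                      # 3D joint coordinates

    def forward(self, x):
        # x: (batch, frames, joints, dim) features lifted from 2D poses.
        b, t, j, d = x.shape
        s = self.spatial(x.reshape(b * t, j, d)).reshape(b, t, j, d)
        tm = self.temporal(x.permute(0, 2, 1, 3).reshape(b * j, t, d))
        tm = tm.reshape(b, j, t, d).permute(0, 2, 1, 3)
        return self.head(self.fuse(torch.cat([s, tm], dim=-1)))  # (b, t, j, 3)

# Example: joints3d = SpatialTemporalFusion()(torch.randn(2, 27, 17, 128))
```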
arXiv Detail & Related papers (2023-01-18T05:54:02Z)