SPAFormer: Sequential 3D Part Assembly with Transformers
- URL: http://arxiv.org/abs/2403.05874v2
- Date: Mon, 3 Jun 2024 07:37:23 GMT
- Title: SPAFormer: Sequential 3D Part Assembly with Transformers
- Authors: Boshen Xu, Sipeng Zheng, Qin Jin
- Abstract summary: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly task.
It addresses this problem by leveraging constraints from assembly sequences, effectively reducing the solution space's complexity.
It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information.
- Score: 52.980803808373516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy of 3D-PA. SPAFormer addresses this problem by leveraging weak constraints from assembly sequences, effectively reducing the solution space's complexity. Since assembly part sequences convey construction rules similar to sentences being structured through words, our model explores both parallel and autoregressive generation. It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information, enabling it to capture the inherent assembly pattern and relationships among sequentially ordered parts. We also construct a more challenging benchmark named PartNet-Assembly covering 21 varied categories to more comprehensively validate the effectiveness of SPAFormer. Extensive experiments demonstrate the superior generalization capabilities of SPAFormer, particularly with multi-tasking and in scenarios requiring long-horizon assembly. Codes and model weights will be released at https://github.com/xuboshen/SPAFormer.
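The exponential growth described in the abstract can be illustrated with a toy count. The sketch below assumes, purely for illustration, that each part independently picks one of a fixed number of discretized candidate poses. The function name `joint_pose_count` and the discretization are hypothetical and are not taken from the paper.

```python
def joint_pose_count(num_parts: int, poses_per_part: int) -> int:
    """Count joint pose assignments when every part chooses one of
    poses_per_part candidate poses (an illustrative assumption)."""
    return poses_per_part ** num_parts


# The count multiplies by poses_per_part with every additional part.
for num_parts in (2, 4, 8, 16):
    print(num_parts, joint_pose_count(num_parts, poses_per_part=10))
```

Under this simplification, each added part multiplies the search space by the number of candidate poses, which is the kind of growth that motivates constraining the problem with assembly sequences.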
Related papers
- Jigsaw++: Imagining Complete Shape Priors for Object Reassembly [35.16793557538698]
Jigsaw++ is a novel generative method designed to tackle the multifaceted challenges of reconstruction for the reassembly problem.
It distinguishes itself by learning a category-agnostic shape prior to complete objects.
Jigsaw++ has demonstrated its effectiveness, reducing reconstruction errors and enhancing the precision of shape reconstruction.
arXiv Detail & Related papers (2024-10-15T17:45:37Z)
- TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images.
A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice.
We mitigate this problem by leveraging synthetic-to-real transfer learning.
arXiv Detail & Related papers (2024-07-22T14:05:27Z)
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
arXiv Detail & Related papers (2024-06-24T15:55:59Z)
- Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective [26.479602180023125]
The Linear Complexity Sequence Model (LCSM) unites various sequence modeling techniques with linear complexity.
We segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink.
We perform experiments to analyze the impact of different stage settings on language modeling and retrieval tasks.
arXiv Detail & Related papers (2024-05-27T17:38:55Z)
- Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection [74.40109927350856]
We present the Part Assembly Sequence Transformer (PAST) to infer assembly sequences from a target blueprint.
We then use a motion planner and optimization to generate part movements and contacts.
Experimental results show that our approach generalizes better than prior methods.
arXiv Detail & Related papers (2023-12-17T00:47:13Z)
- Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
- Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z)
- Combinatorial 3D Shape Generation via Sequential Assembly [40.2815083025929]
Sequential assembly with geometric primitives has drawn attention in robotics and 3D vision since it yields a practical blueprint to construct a target shape.
We propose a 3D shape generation framework that alleviates the difficulty caused by the huge number of feasible combinations.
Experimental results demonstrate that our method successfully generates 3D shapes and simulates more realistic generation processes.
arXiv Detail & Related papers (2020-04-16T01:23:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.