Multi-level Reasoning for Robotic Assembly: From Sequence Inference to
Contact Selection
- URL: http://arxiv.org/abs/2312.10571v1
- Date: Sun, 17 Dec 2023 00:47:13 GMT
- Title: Multi-level Reasoning for Robotic Assembly: From Sequence Inference to
Contact Selection
- Authors: Xinghao Zhu, Devesh K. Jha, Diego Romeres, Lingfeng Sun, Masayoshi
Tomizuka, Anoop Cherian
- Abstract summary: We present the Part Assembly Sequence Transformer (PAST) to infer assembly sequences from a target blueprint.
We then use a motion planner and optimization to generate part movements and contacts.
Experimental results show that our approach generalizes better than prior methods.
- Score: 74.40109927350856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automating the assembly of objects from their parts is a complex problem with
innumerable applications in manufacturing, maintenance, and recycling. Unlike
existing research, which is limited to target segmentation, pose regression, or
using fixed target blueprints, our work presents a holistic multi-level
framework for part assembly planning consisting of part assembly sequence
inference, part motion planning, and robot contact optimization. We present the
Part Assembly Sequence Transformer (PAST) -- a sequence-to-sequence neural
network -- to infer assembly sequences recursively from a target blueprint. We
then use a motion planner and optimization to generate part movements and
contacts. To train PAST, we introduce D4PAS, a large-scale Dataset for Part
Assembly Sequences consisting of physically valid sequences for
industrial objects. Experimental results show that our approach generalizes
better than prior methods while needing significantly less computational time
for inference.
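The recursive sequence inference described above can be illustrated with a minimal sketch. This is a hypothetical greedy decoder, not the paper's trained transformer: `score_next_part` stands in for PAST's learned scoring of candidate parts given the parts assembled so far, and the height-based toy scorer below mimics a bottom-up, physically valid ordering.

```python
def infer_assembly_sequence(parts, score_next_part):
    """Greedily decode an assembly order: at each step, pick the
    highest-scoring remaining part given the parts placed so far."""
    placed, remaining = [], list(parts)
    while remaining:
        best = max(remaining, key=lambda p: score_next_part(placed, p))
        placed.append(best)
        remaining.remove(best)
    return placed

# Toy stand-in scorer: prefer parts lower in the assembly,
# mimicking a bottom-up physically valid ordering.
part_heights = {"base": 0, "frame": 1, "axle": 2, "wheel": 3}
order = infer_assembly_sequence(
    part_heights, lambda placed, p: -part_heights[p]
)
print(order)  # → ['base', 'frame', 'axle', 'wheel']
```

In the actual framework, each emitted part would then be handed to the motion planner and contact optimizer; the sketch only covers the sequence-inference level.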
Related papers
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E leverages Segment Anything (SAM), a visual foundation model, for generalizable scene understanding and sequence imitation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- SPAFormer: Sequential 3D Part Assembly with Transformers [52.980803808373516]
We introduce SPAFormer, an innovative model designed to overcome the explosion challenge in the 3D Part Assembly task.
It addresses this problem by leveraging constraints from assembly sequences, effectively reducing the solution space's complexity.
It further enhances assembly through knowledge enhancement strategies that utilize the attributes of parts and their sequence information.
arXiv Detail & Related papers (2024-03-09T10:53:11Z)
- ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility [27.424678100675163]
We present ASAP, a physics-based planning approach for automatically generating a sequence for general-shaped assemblies.
The search can be guided either by geometric heuristics or by graph neural networks trained on data with simulation labels.
We show the superior performance of ASAP at generating physically realistic assembly sequence plans on a large dataset of hundreds of complex product assemblies.
arXiv Detail & Related papers (2023-09-29T00:27:40Z)
- ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation [62.98078087018469]
We introduce MSDeAOT, a variant of the AOT framework that incorporates transformers at multiple feature scales.
MSDeAOT efficiently propagates object masks from previous frames to the current frame using a feature scale with a stride of 16.
We also employ GPM in a more refined feature scale with a stride of 8, leading to improved accuracy in detecting and tracking small objects.
arXiv Detail & Related papers (2023-07-05T03:43:15Z)
- RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration [73.69415797389195]
We propose an end-to-end transformer network (RegFormer) for large-scale point cloud alignment.
Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers.
Our transformer has linear complexity, which guarantees high efficiency even for large-scale scenes.
arXiv Detail & Related papers (2023-03-22T08:47:37Z)
- Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning [22.447462847331312]
We propose a holistic graphical approach including a graph representation called Assembly Graph for product assemblies.
With the proposed model, GRACE, we are able to extract meaningful information from the graph input and predict assembly sequences step by step.
In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles.
arXiv Detail & Related papers (2023-03-17T17:23:14Z)
- 3D Part Assembly Generation with Instance Encoded Transformer [22.330218525999857]
We propose a multi-layer transformer-based framework that involves geometric and relational reasoning between parts to update the part poses iteratively.
We extend our framework to a new task called in-process part assembly.
Our method achieves improvements of more than 10% over the current state of the art on multiple metrics on the public PartNet dataset.
arXiv Detail & Related papers (2022-07-05T02:40:57Z)
- Efficient and Robust Training of Dense Object Nets for Multi-Object Robot Manipulation [8.321536457963655]
We propose a framework for robust and efficient training of Dense Object Nets (DON).
We focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme.
We demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.
arXiv Detail & Related papers (2022-06-24T08:24:42Z)
- Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery [34.25379651790627]
We tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robotic manipulator.
Our novel hierarchical approach aims at efficiently decomposing the overall task into three feasible levels that benefit mutually from each other.
arXiv Detail & Related papers (2022-03-08T14:44:51Z)
- Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on Youtube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.