ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
- URL: http://arxiv.org/abs/2303.12700v2
- Date: Fri, 15 Sep 2023 03:14:03 GMT
- Title: ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
- Authors: Utkarsh A. Mishra and Yongxin Chen
- Abstract summary: The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications.
We propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach.
The proposed method is evaluated using a set of YCB objects and a suction gripper, demonstrating a success rate of 95.2% in simulation.
- Score: 18.95498618397922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to manipulate objects into desired configurations is a
fundamental requirement for robots to complete various practical applications.
While certain goals can be achieved by picking and placing the objects of
interest directly, object reorientation is needed for precise placement in most
tasks. In such scenarios, the object must be reoriented and
re-positioned into intermediate poses that facilitate accurate placement at the
target pose. To this end, we propose a reorientation planning method,
ReorientDiff, that utilizes a diffusion model-based approach. The proposed
method employs both visual inputs from the scene and goal-specific language
prompts to plan intermediate reorientation poses. Specifically, the scene and
language-task information are mapped into a joint scene-task representation
feature space, which is subsequently leveraged to condition the diffusion
model. The diffusion model samples intermediate poses based on the
representation using classifier-free guidance and then uses gradients of
learned feasibility-score models for implicit, iterative pose refinement. The
proposed method is evaluated using a set of YCB objects and a suction gripper,
demonstrating a success rate of 95.2% in simulation. Overall, our study
presents a promising approach to address the reorientation challenge in
manipulation by learning a conditional distribution, which is an effective way
to move towards more generalizable object manipulation. For more results,
check out our website: https://utkarshmishra04.github.io/ReorientDiff.
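As a rough illustration of the sampling scheme the abstract describes, the sketch below combines classifier-free guidance with gradient-based feasibility refinement. It is a minimal approximation, not the authors' code: the `denoiser`, its `ddim_step` helper, the `feasibility` network, and the 7-D position-plus-quaternion pose encoding are all assumptions.

```python
import torch

@torch.no_grad()
def sample_poses(denoiser, context, steps=50, guidance_scale=4.0, batch=64):
    """Sample candidate reorientation poses conditioned on the joint
    scene-task embedding `context` via classifier-free guidance."""
    x = torch.randn(batch, 7)  # hypothetical pose encoding: xyz + quaternion
    for t in reversed(range(steps)):
        eps_cond = denoiser(x, t, context)   # conditional noise prediction
        eps_uncond = denoiser(x, t, None)    # unconditional (null context)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        x = denoiser.ddim_step(x, eps, t)    # assumed reverse-diffusion step
    return x

def refine_poses(poses, feasibility, context, iters=10, lr=1e-2):
    """Implicit iterative refinement: ascend the gradient of a learned
    feasibility score so that sampled poses become pick/place feasible."""
    poses = poses.clone().requires_grad_(True)
    opt = torch.optim.Adam([poses], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        (-feasibility(poses, context).sum()).backward()  # maximize score
        opt.step()
        with torch.no_grad():  # re-normalize the quaternion component
            poses[:, 3:] /= poses[:, 3:].norm(dim=-1, keepdim=True)
    return poses.detach()
```

In this reading, classifier-free guidance steers samples toward the scene-task condition, while the feasibility gradients play the role of the iterative pose refinement mentioned above.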
Related papers
- Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation [68.81887041766373]
We introduce a diffusion-based paradigm for domain-generalized 9-DoF object pose estimation.
We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective.
We show that our method achieves state-of-the-art domain generalization performance.
arXiv Detail & Related papers (2025-02-04T17:46:34Z)
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image.
By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations.
Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z)
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty and determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model, and re-estimates the object pose to rectify misalignments introduced during the interaction (see the ensemble-uncertainty sketch after this list).
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- TarGF: Learning Target Gradient Field for Object Rearrangement [8.49306925839127]
We focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution.
It is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations.
We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction for each object that increases the likelihood of the target distribution (see the score-matching sketch after this list).
arXiv Detail & Related papers (2022-09-02T07:20:34Z)
- Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
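The NeRF-ensemble entry above selects the next viewing or re-orientation action from model uncertainty. Below is a minimal sketch of one plausible reading, scoring each candidate view by per-pixel variance across ensemble renders; the `render` method and candidate-pose interface are assumptions, not the paper's API.

```python
import torch

def view_uncertainty(ensemble, camera_pose):
    """Mean per-pixel variance across ensemble renders of one view."""
    renders = torch.stack([m.render(camera_pose) for m in ensemble])  # (M, H, W, 3)
    return renders.var(dim=0).mean()

def pick_next_view(ensemble, candidate_poses):
    """Choose the candidate view the ensemble disagrees on most."""
    scores = torch.tensor([view_uncertainty(ensemble, p) for p in candidate_poses])
    return candidate_poses[int(scores.argmax())]
```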
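Likewise, the TarGF entry trains a score model with denoising score matching so that the learned field points each object toward higher target-distribution likelihood. A hedged sketch under a single fixed noise scale follows; the `score_net` signature and flat layout encoding are assumptions.

```python
import torch

def dsm_loss(score_net, target_layouts, sigma=0.1):
    """Denoising score matching: for Gaussian perturbations of scale sigma,
    the regression target for the score is -noise / sigma**2."""
    noise = torch.randn_like(target_layouts) * sigma
    pred = score_net(target_layouts + noise, sigma)
    return ((pred + noise / sigma**2) ** 2).sum(dim=-1).mean()

def rearrange_step(score_net, layout, step_size=1e-3, sigma=0.1):
    """Move each object along the learned gradient field, i.e., in the
    direction that increases target-distribution likelihood."""
    with torch.no_grad():
        return layout + step_size * score_net(layout, sigma)
```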
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.