ReorientDiff: Diffusion Model based Reorientation for Object
Manipulation
- URL: http://arxiv.org/abs/2303.12700v2
- Date: Fri, 15 Sep 2023 03:14:03 GMT
- Title: ReorientDiff: Diffusion Model based Reorientation for Object
Manipulation
- Authors: Utkarsh A. Mishra and Yongxin Chen
- Abstract summary: The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical tasks.
We propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach.
The proposed method is evaluated using a set of YCB-objects and a suction gripper, demonstrating a success rate of 95.2% in simulation.
- Score: 18.95498618397922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to manipulate objects into desired configurations is a
fundamental requirement for robots performing practical tasks. While certain
goals can be achieved by picking and placing the objects of interest directly,
most tasks require object reorientation for precise placement. In such
scenarios, the object must be reoriented and
re-positioned into intermediate poses that facilitate accurate placement at the
target pose. To this end, we propose a reorientation planning method,
ReorientDiff, that utilizes a diffusion model-based approach. The proposed
method employs both visual inputs from the scene, and goal-specific language
prompts to plan intermediate reorientation poses. Specifically, the scene and
language-task information are mapped into a joint scene-task representation
feature space, which is subsequently leveraged to condition the diffusion
model. The diffusion model samples intermediate poses based on the
representation using classifier-free guidance and then uses gradients of
learned feasibility-score models for implicit iterative pose-refinement. The
proposed method is evaluated using a set of YCB-objects and a suction gripper,
demonstrating a success rate of 95.2% in simulation. Overall, our study
presents a promising approach to address the reorientation challenge in
manipulation by learning a conditional distribution, which is an effective way
to move towards more generalizable object manipulation. For more results,
check out our website: https://utkarshmishra04.github.io/ReorientDiff.
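The sampling procedure sketched in the abstract — classifier-free guidance over a conditioned diffusion model, followed by implicit refinement using gradients of a learned feasibility score — can be illustrated in a few lines. The sketch below is a toy stand-in, not the paper's implementation: the denoiser, feasibility gradient, 3-DoF pose parameterization, and schedule are all illustrative assumptions.

```python
import numpy as np

def toy_denoiser(pose, t, cond):
    """Stand-in for the learned noise predictor; cond=None gives the
    unconditional prediction used by classifier-free guidance."""
    drift = pose - (cond if cond is not None else 0.0)
    return 0.1 * t * drift  # predicted noise at timestep t

def feasibility_grad(pose):
    """Gradient of a toy feasibility score that pulls poses back
    toward the unit ball (a stand-in for learned feasibility models)."""
    return -pose * (np.linalg.norm(pose) > 1.0)

def sample_pose(cond, steps=50, guidance=2.0, refine_lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pose = rng.standard_normal(3)  # e.g. a 3-DoF reorientation pose
    for t in np.linspace(1.0, 0.0, steps):
        # classifier-free guidance: blend conditional/unconditional noise
        eps_c = toy_denoiser(pose, t, cond)
        eps_u = toy_denoiser(pose, t, None)
        eps = eps_u + guidance * (eps_c - eps_u)
        pose = pose - eps
        # implicit iterative refinement via the feasibility-score gradient
        pose = pose + refine_lr * feasibility_grad(pose)
    return pose
```

In the paper the conditioning vector would come from the joint scene-task representation; here it is just a target vector, and the guidance weight trades off fidelity to the condition against sample diversity.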
Related papers
- Diffusion as Reasoning: Enhancing Object Goal Navigation with LLM-Biased Diffusion Model [9.939998139837426]
We propose a new approach to solving the ObjectNav task, by training a diffusion model to learn the statistical distribution patterns of objects in semantic maps.
We also propose the global target bias and local LLM bias methods, where the former can constrain the diffusion model to generate the target object more effectively.
Based on the generated map in the unknown region, the agent sets the predicted location of the target as the goal and moves towards it.
arXiv Detail & Related papers (2024-10-29T08:10:06Z)
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- TarGF: Learning Target Gradient Field for Object Rearrangement [8.49306925839127]
We focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution.
It is hard to describe the target distribution (goal specification) for reward engineering or collect expert trajectories as demonstrations.
We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution.
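The score-matching idea behind TarGF can be illustrated with a minimal denoising-score-matching setup: fit a field that, at each point, regresses the direction back toward the data. Everything below — the toy target distribution (objects near the line y = x), the linear score model, and the training loop — is an illustrative assumption, not the paper's method.

```python
import numpy as np

def sample_target(rng, n):
    """Toy target distribution: objects arranged near the line y = x."""
    x = rng.uniform(-1, 1, n)
    return np.stack([x, x + 0.05 * rng.standard_normal(n)], axis=1)

def train_score_field(rng, sigma=0.2, n=2048, lr=0.1, epochs=200):
    """Fit a linear score model s(x) = W x + b by denoising score matching:
    perturb the data with Gaussian noise and regress toward -noise/sigma."""
    data = sample_target(rng, n)
    W = np.zeros((2, 2))
    b = np.zeros(2)
    for _ in range(epochs):
        noise = rng.standard_normal(data.shape)
        noisy = data + sigma * noise
        target = -noise / sigma          # DSM regression target
        pred = noisy @ W.T + b
        err = pred - target
        W -= lr * (err.T @ noisy) / n    # gradient step on mean squared error
        b -= lr * err.mean(axis=0)
    return W, b
```

The learned field s(x) = W x + b points each object toward higher likelihood under the target distribution, so it can serve as a per-object rearrangement direction, as in the TarGF setting.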
arXiv Detail & Related papers (2022-09-02T07:20:34Z)
- Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z)
- Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning [51.74463056899926]
This work proposes an optimization-based manipulation planning framework where the objectives are learned functionals of signed-distance fields that represent objects in the scene.
We show that representing objects as signed-distance fields enables learning and representing a variety of models with higher accuracy than point-cloud and occupancy-measure representations.
arXiv Detail & Related papers (2021-10-02T12:36:58Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.