ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
- URL: http://arxiv.org/abs/2303.12700v2
- Date: Fri, 15 Sep 2023 03:14:03 GMT
- Title: ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
- Authors: Utkarsh A. Mishra and Yongxin Chen
- Abstract summary: The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications.
We propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach.
The proposed method is evaluated using a set of YCB objects and a suction gripper, demonstrating a success rate of 95.2% in simulation.
- Score: 18.95498618397922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to manipulate objects into desired configurations is a
fundamental requirement for robots to complete various practical applications.
While certain goals can be achieved by picking and placing the objects of
interest directly, object reorientation is needed for precise placement in most
tasks. In such scenarios, the object must be reoriented and
re-positioned into intermediate poses that facilitate accurate placement at the
target pose. To this end, we propose a reorientation planning method,
ReorientDiff, that utilizes a diffusion model-based approach. The proposed
method employs both visual inputs from the scene and goal-specific language
prompts to plan intermediate reorientation poses. Specifically, the scene and
language-task information are mapped into a joint scene-task representation
feature space, which is subsequently leveraged to condition the diffusion
model. The diffusion model samples intermediate poses based on the
representation using classifier-free guidance and then uses gradients of
learned feasibility-score models for implicit, iterative pose refinement. The
proposed method is evaluated using a set of YCB objects and a suction gripper,
demonstrating a success rate of 95.2% in simulation. Overall, our study
presents a promising approach to address the reorientation challenge in
manipulation by learning a conditional distribution, which is an effective way
to move towards more generalizable object manipulation. For more results,
check out our website: https://utkarshmishra04.github.io/ReorientDiff.
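As a rough illustration of the sampling scheme the abstract describes, the sketch below combines classifier-free guidance with gradient-based feasibility refinement. It is a minimal approximation, not the authors' code: the `denoiser`, its `ddim_step` helper, the `feasibility` network, and the 7-D position-plus-quaternion pose encoding are all assumptions.

```python
import torch

@torch.no_grad()
def sample_poses(denoiser, context, steps=50, guidance_scale=4.0, batch=64):
    """Sample candidate reorientation poses conditioned on the joint
    scene-task embedding `context` via classifier-free guidance."""
    x = torch.randn(batch, 7)  # hypothetical pose encoding: xyz + quaternion
    for t in reversed(range(steps)):
        eps_cond = denoiser(x, t, context)   # conditional noise prediction
        eps_uncond = denoiser(x, t, None)    # unconditional (null context)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        x = denoiser.ddim_step(x, eps, t)    # assumed reverse-diffusion step
    return x

def refine_poses(poses, feasibility, context, iters=10, lr=1e-2):
    """Implicit iterative refinement: ascend the gradient of a learned
    feasibility score so that sampled poses become pick/place feasible."""
    poses = poses.clone().requires_grad_(True)
    opt = torch.optim.Adam([poses], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        (-feasibility(poses, context).sum()).backward()  # maximize score
        opt.step()
        with torch.no_grad():  # re-normalize the quaternion component
            poses[:, 3:] /= poses[:, 3:].norm(dim=-1, keepdim=True)
    return poses.detach()
```

In this reading, classifier-free guidance steers samples toward the scene-task condition, while the feasibility gradients play the role of the iterative pose refinement mentioned above.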
Related papers
- Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation [68.81887041766373]
We introduce a diffusion-based paradigm for domain-generalized 9-DoF object pose estimation.
We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective.
We show that our method achieves state-of-the-art domain generalization performance.
arXiv Detail & Related papers (2025-02-04T17:46:34Z)
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image.
By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations.
Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z)
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty and determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model, and re-estimates the object pose to rectify misalignments introduced during the interaction (see the ensemble-uncertainty sketch after this list).
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate, model-free, one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- TarGF: Learning Target Gradient Field for Object Rearrangement [8.49306925839127]
We focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution.
It is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations.
We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction for each object that increases the likelihood of the target distribution (see the score-matching sketch after this list).
arXiv Detail & Related papers (2022-09-02T07:20:34Z)
- Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding [93.82542533426766]
We propose a Suspected Object Transformation mechanism (SOT) to encourage the target object selection among the suspected ones.
SOT can be seamlessly integrated into existing CNN and Transformer-based one-stage visual grounders.
Extensive experiments demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2022-03-10T06:41:07Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
- Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries.
To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
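The NeRF-ensemble entry above selects the next viewing or re-orientation action from model uncertainty. Below is a minimal sketch of one plausible reading, scoring each candidate view by per-pixel variance across ensemble renders; the `render` method and candidate-pose interface are assumptions, not the paper's API.

```python
import torch

def view_uncertainty(ensemble, camera_pose):
    """Mean per-pixel variance across ensemble renders of one view."""
    renders = torch.stack([m.render(camera_pose) for m in ensemble])  # (M, H, W, 3)
    return renders.var(dim=0).mean()

def pick_next_view(ensemble, candidate_poses):
    """Choose the candidate view the ensemble disagrees on most."""
    scores = torch.tensor([view_uncertainty(ensemble, p) for p in candidate_poses])
    return candidate_poses[int(scores.argmax())]
```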
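Likewise, the TarGF entry trains a score model with denoising score matching so that the learned field points each object toward higher target-distribution likelihood. A hedged sketch under a single fixed noise scale follows; the `score_net` signature and flat layout encoding are assumptions.

```python
import torch

def dsm_loss(score_net, target_layouts, sigma=0.1):
    """Denoising score matching: for Gaussian perturbations of scale sigma,
    the regression target for the score is -noise / sigma**2."""
    noise = torch.randn_like(target_layouts) * sigma
    pred = score_net(target_layouts + noise, sigma)
    return ((pred + noise / sigma**2) ** 2).sum(dim=-1).mean()

def rearrange_step(score_net, layout, step_size=1e-3, sigma=0.1):
    """Move each object along the learned gradient field, i.e., in the
    direction that increases target-distribution likelihood."""
    with torch.no_grad():
        return layout + step_size * score_net(layout, sigma)
```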
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.