X-Ray: Mechanical Search for an Occluded Object by Minimizing Support of
Learned Occupancy Distributions
- URL: http://arxiv.org/abs/2004.09039v2
- Date: Sat, 10 Oct 2020 19:44:44 GMT
- Title: X-Ray: Mechanical Search for an Occluded Object by Minimizing Support of
Learned Occupancy Distributions
- Authors: Michael Danielczuk, Anelia Angelova, Vincent Vanhoucke, Ken Goldberg
- Abstract summary: We introduce X-Ray, an algorithm based on learned occupancy distributions.
X-Ray minimizes support of the learned distribution as part of a mechanical search policy in both simulated and real environments.
Results suggest that X-Ray is significantly more efficient, as it succeeds in extracting the target object 82% of the time.
- Score: 44.39286120613235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For applications in e-commerce, warehouses, healthcare, and home service,
robots are often required to search through heaps of objects to grasp a
specific target object. For mechanical search, we introduce X-Ray, an algorithm
based on learned occupancy distributions. We train a neural network using a
synthetic dataset of RGBD heap images labeled for a set of standard bounding
box targets with varying aspect ratios. X-Ray minimizes support of the learned
distribution as part of a mechanical search policy in both simulated and real
environments. We benchmark these policies against two baseline policies on
1,000 heaps of 15 objects in simulation where the target object is partially or
fully occluded. Results suggest that X-Ray is significantly more efficient, as
it succeeds in extracting the target object 82% of the time, 15% more often
than the best-performing baseline. Experiments on an ABB YuMi robot with 20
heaps of 25 household objects suggest that the learned policy transfers easily
to a physical system, where it outperforms baseline policies by 15% in success
rate with 17% fewer actions. Datasets, videos, and experiments are available at
https://sites.google.com/berkeley.edu/x-ray.
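The support-minimization idea behind X-Ray admits a compact sketch. The code below is illustrative only: it assumes the trained network returns a per-pixel occupancy probability map for the target, and it invents a forward model (`simulate_action`) that predicts the heap after each candidate action; it is not the paper's implementation.

```python
import numpy as np

def support_size(occupancy_probs, threshold=0.5):
    """Size of the support of the predicted occupancy distribution:
    the number of pixels where the target could still plausibly lie."""
    return int((occupancy_probs > threshold).sum())

def select_action(heap_rgbd, candidate_actions, predict_occupancy, simulate_action):
    """Greedy mechanical-search step: choose the grasp or push whose
    predicted post-action heap leaves the smallest occupancy support,
    i.e. the action that most reduces uncertainty about the target."""
    best_action, best_support = None, float("inf")
    for action in candidate_actions:
        next_heap = simulate_action(heap_rgbd, action)  # hypothetical forward model
        probs = predict_occupancy(next_heap)            # learned occupancy network
        s = support_size(probs)
        if s < best_support:
            best_action, best_support = action, s
    return best_action
```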
Related papers
- Superpowering Open-Vocabulary Object Detectors for X-ray Vision [53.07098133237041]
Open-vocabulary object detection (OvOD) is set to revolutionize security screening by enabling systems to recognize any item in X-ray scans.
We propose RAXO, a framework that repurposes off-the-shelf RGB OvOD detectors for robust X-ray detection.
RAXO builds high-quality X-ray class descriptors using a dual-source retrieval strategy.
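One plausible reading of the dual-source descriptor construction is sketched below with a CLIP-style image encoder; the retrieval and encoder handles are assumptions for illustration, not RAXO's actual API.

```python
import numpy as np

def build_class_descriptor(class_name, retrieve_source_a, retrieve_source_b, encode_image):
    """Hypothetical dual-source descriptor: gather example images of the class
    from two sources, embed them, and pool them into a single unit-norm vector
    that an RGB OvOD detector can score X-ray proposals against."""
    images = retrieve_source_a(class_name) + retrieve_source_b(class_name)
    feats = np.stack([encode_image(img) for img in images])  # (N, D) embeddings
    descriptor = feats.mean(axis=0)
    return descriptor / np.linalg.norm(descriptor)           # unit norm for cosine scoring
```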
arXiv Detail & Related papers (2025-03-21T11:54:16Z)
- FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation [9.292150395779332]
We propose a novel framework for learning object-centric manipulation policies in force space.
Our method simplifies the action space, reduces unnecessary exploration, and decreases simulation overhead.
Our evaluations demonstrate that the method significantly outperforms baselines.
arXiv Detail & Related papers (2025-03-17T17:49:47Z)
- AIvaluateXR: An Evaluation Framework for on-Device AI in XR with Benchmarking Results [55.33807002543901]
We present AIvaluateXR, a comprehensive evaluation framework for benchmarking large language models (LLMs) running on XR devices.
We deploy 17 selected LLMs across four XR platforms: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro, and conduct an extensive evaluation.
We propose a unified evaluation method based on the 3D Optimality theory to select the optimal device-model pairs from quality and speed objectives.
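The device-model selection can be illustrated as Pareto-front filtering, shown here as a two-objective simplification over the stated quality and speed axes; the scores are invented placeholders and this is not the paper's "3D Optimality" method.

```python
def pareto_front(scores):
    """Keep the (device, model) pairs not dominated on both objectives.
    scores: dict mapping pair -> (quality, speed); higher is better."""
    front = []
    for a, (qa, sa) in scores.items():
        dominated = any(qb >= qa and sb >= sa and (qb > qa or sb > sa)
                        for b, (qb, sb) in scores.items() if b != a)
        if not dominated:
            front.append(a)
    return front

# invented placeholder scores, purely for illustration
candidates = {
    ("Magic Leap 2", "model-A"): (0.58, 11.0),
    ("Meta Quest 3", "model-B"): (0.62, 14.0),
    ("Apple Vision Pro", "model-C"): (0.71, 9.5),
}
print(pareto_front(candidates))  # pairs on the quality/speed Pareto front
```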
arXiv Detail & Related papers (2025-02-13T20:55:48Z)
- OptiGrasp: Optimized Grasp Pose Detection Using RGB Images for Warehouse Picking Robots [27.586777997464644]
In warehouse environments, robots require robust picking capabilities to manage a wide variety of objects.
We propose an innovative approach that leverages foundation models to enhance suction grasping using only RGB images.
Our network achieves an 82.3% success rate in real-world applications.
arXiv Detail & Related papers (2024-09-29T00:20:52Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR).
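A minimal sketch of the DR idea follows: re-sample the simulator's visual properties before every episode so the policy never overfits to one rendering. The simulator handle and all of its attribute and method names are invented for illustration.

```python
import random

def randomize_domain(sim):
    """Re-sample visual properties of a hypothetical simulator before each
    training episode; `sim`, its methods, and its attributes are placeholders."""
    sim.set_light_intensity(random.uniform(0.2, 2.0))
    sim.set_camera_jitter(translation=random.uniform(0.0, 0.05))
    for obj in sim.objects:
        obj.texture = random.choice(sim.texture_bank)
        obj.color = [random.random() for _ in range(3)]

# typical use: randomize, collect a rollout, repeat
# for episode in range(num_episodes):
#     randomize_domain(sim)
#     rollout(policy, sim)
```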
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
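The hierarchical decomposition can be sketched as a two-level act step, where the high level's "action" is itself a latent state; all function handles below are placeholders, not HIQL's actual interfaces.

```python
def hierarchical_act(obs, goal, high_policy, low_policy, encode):
    """Illustrative two-level goal-conditioned step: the high-level policy
    proposes a subgoal in latent state space, and the low-level policy treats
    that subgoal as its own goal and emits a primitive action."""
    z_obs, z_goal = encode(obs), encode(goal)
    z_subgoal = high_policy(z_obs, z_goal)   # pick an intermediate latent state
    return low_policy(z_obs, z_subgoal)      # act toward the subgoal
```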
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- RREx-BoT: Remote Referring Expressions with a Bag of Tricks [19.036557405184656]
We show how a vision-language scoring model can be used to locate objects in unobserved environments.
We demonstrate our model on a real-world TurtleBot platform, highlighting the simplicity and usefulness of the approach.
Our analysis outlines a "bag of tricks" essential for accomplishing this task, from utilizing 3d coordinates and context, to generalizing vision-language models to large 3d search spaces.
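The core scoring step can be illustrated as embedding the referring expression and every candidate object view, then ranking by cosine similarity; the encoders stand in for a CLIP-style model and the candidate format is an assumption.

```python
import numpy as np

def locate(expression, candidates, encode_text, encode_image):
    """Illustrative vision-language scoring: return the candidate whose image
    crop best matches the referring expression. `candidates` is assumed to be
    a list of dicts holding an image crop plus 3D coordinates."""
    t = encode_text(expression)
    t = t / np.linalg.norm(t)
    best, best_score = None, -np.inf
    for cand in candidates:
        v = encode_image(cand["crop"])
        v = v / np.linalg.norm(v)
        score = float(t @ v)
        if score > best_score:
            best, best_score = cand, score
    return best  # carries its 3D coordinates for navigation
```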
arXiv Detail & Related papers (2023-01-30T02:19:19Z)
- Object Detection Using Sim2Real Domain Randomization for Robotic Applications [0.0]
We propose a sim2real transfer learning method based on domain randomization for object detection.
A state-of-the-art convolutional neural network, YOLOv4, is trained to detect the different types of industrial objects.
Our solution matches industrial needs as it can reliably differentiate similar classes of objects by using only 1 real image for training.
arXiv Detail & Related papers (2022-08-08T14:16:45Z) - Mechanical Search on Shelves using a Novel "Bluction" Tool [39.44966150696158]
Storage efficiency comes at the cost of reduced visibility and accessibility.
We introduce a novel bluction tool, which combines a thin pushing blade and suction cup gripper.
Using suction grasping actions improves the success rate over the highest performing push-only policy by 26% in simulation and 67% in physical environments.
arXiv Detail & Related papers (2022-01-22T05:47:30Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Few-shot Weakly-Supervised Object Detection via Directional Statistics [55.97230224399744]
We propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD).
Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps.
Our experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
arXiv Detail & Related papers (2021-03-25T22:34:16Z)
- Accelerating Grasp Exploration by Leveraging Learned Priors [24.94895421569869]
The ability of robots to grasp novel objects has industry applications in e-commerce order fulfillment and home service.
We present a Thompson sampling algorithm that learns to grasp a given object with unknown geometry using online experience.
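Thompson sampling over a fixed set of grasp candidates has a standard Beta-Bernoulli form, sketched below; the `execute` callable is a placeholder for running a grasp on the robot, and this is not the paper's prior-augmented variant.

```python
import random

def thompson_grasp(grasps, execute, trials=200):
    """Beta-Bernoulli Thompson sampling: sample a success probability for each
    grasp from its posterior, execute the argmax, and update that grasp's
    posterior with the observed outcome. `execute(g)` returns True on success."""
    posteriors = {g: [1, 1] for g in grasps}   # Beta(1, 1) priors per grasp
    for _ in range(trials):
        sampled = {g: random.betavariate(a, b) for g, (a, b) in posteriors.items()}
        g = max(sampled, key=sampled.get)
        if execute(g):
            posteriors[g][0] += 1              # success count
        else:
            posteriors[g][1] += 1              # failure count
    # return the grasp with the highest posterior mean success rate
    return max(posteriors, key=lambda g: posteriors[g][0] / sum(posteriors[g]))
```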
arXiv Detail & Related papers (2020-11-11T09:42:56Z)
- POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments [89.43830036483901]
We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup.
Our POMP method uses as input the current pose of an agent and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
arXiv Detail & Related papers (2020-09-17T08:23:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.