Related papers: SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy

SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy

URL: http://arxiv.org/abs/2510.11566v1
Date: Mon, 13 Oct 2025 16:11:34 GMT
Title: SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy
Authors: Kuanning Wang, Yongchong Gu, Yuqian Fu, Zeyu Shangguan, Sicheng He, Xiangyang Xue, Yanwei Fu, Daniel Seita,
Abstract summary: We propose SCOOP'D, which uses simulation from OmniGibson (built on NVIDIA Omniverse) to collect scooping demonstrations.<n>We apply the learned policy in diverse real-world scenarios, testing its performance on various item quantities, item characteristics, and container types.<n>SCOOP'D outperforms all baselines and ablations, suggesting that this is a promising approach to acquiring robotic scooping skills.
Score: 51.46611106470501
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scooping items with tools such as spoons and ladles is common in daily life, ranging from assistive feeding to retrieving items from environmental disaster sites. However, developing a general and autonomous robotic scooping policy is challenging since it requires reasoning about complex tool-object interactions. Furthermore, scooping often involves manipulating deformable objects, such as granular media or liquids, which is challenging due to their infinite-dimensional configuration spaces and complex dynamics. We propose a method, SCOOP'D, which uses simulation from OmniGibson (built on NVIDIA Omniverse) to collect scooping demonstrations using algorithmic procedures that rely on privileged state information. Then, we use generative policies via diffusion to imitate demonstrations from observational input. We directly apply the learned policy in diverse real-world scenarios, testing its performance on various item quantities, item characteristics, and container types. In zero-shot deployment, our method demonstrates promising results across 465 trials in diverse scenarios, including objects of different difficulty levels that we categorize as "Level 1" and "Level 2." SCOOP'D outperforms all baselines and ablations, suggesting that this is a promising approach to acquiring robotic scooping skills. Project page is at https://scoopdiff.github.io/.

Related papers

Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation [90.90219129619344]
This paper presents a novel R-prior-S, Recurrent Geometric-priormodal Policy with Spiking features.<n>To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases.<n>For the data efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network.
arXiv Detail & Related papers (2026-01-13T23:36:30Z)
HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies [83.41714103649751]
Development of embodied intelligence models depends on access to high-quality robot demonstration data.<n>We present HiMoE-VLA, a novel vision-language-action framework tailored to handle diverse robotic data with heterogeneity.<n>HiMoE-VLA demonstrates a consistent performance boost over existing VLA baselines, achieving higher accuracy and robust generalizations.
arXiv Detail & Related papers (2025-12-05T13:21:05Z)
Learning to Grasp Anything by Playing with Random Toys [65.47078295823074]
We show that robots can learn generalizable grasping using randomly assembled objects.<n>We find the key to this generalization is an object-centric visual representation induced by our proposed detection pooling mechanism.<n>We believe this work offers a promising path to scalable and generalizable learning in robotic manipulation.
arXiv Detail & Related papers (2025-10-14T17:56:10Z)
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation [22.43711565969091]
Articubot is a system that learns a policy to open diverse categories of unseen articulated objects in the real world.<n>We show that our learned policy can zero-shot transfer to three different real robot settings.
arXiv Detail & Related papers (2025-03-04T22:51:50Z)
DITTO: Demonstration Imitation by Trajectory Transformation [31.930923345163087]
In this work, we address the problem of one-shot imitation from a single human demonstration, given by an RGB-D video recording. We propose a two-stage process. In the first stage we extract the demonstration trajectory offline. This entails segmenting manipulated objects and determining their relative motion in relation to secondary objects such as containers. In the online trajectory generation stage, we first re-detect all objects, then warp the demonstration trajectory to the current scene and execute it on the robot.
arXiv Detail & Related papers (2024-03-22T13:46:51Z)
Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects [18.342569823885864]
Teacher-Augmented Policy Gradient (TAPG) is a novel two-stage learning framework that synergizes reinforcement learning and policy distillation. TAPG facilitates guided, yet adaptive, learning of a sensorimotor policy, based on object segmentation. Our trained policies adeptly grasp a wide variety of objects from cluttered scenarios in simulation and the real world based on human-understandable prompts.
arXiv Detail & Related papers (2024-03-15T10:48:16Z)
One-shot Imitation Learning via Interaction Warping [32.5466340846254]
We propose a new method, Interaction Warping, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape warping, a technique for aligning point clouds across object instances. We show successful one-shot imitation learning on three simulated and real-world object re-arrangement tasks.
arXiv Detail & Related papers (2023-06-21T17:26:11Z)
Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.<n>Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.<n>Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
Learning Generalizable Dexterous Manipulation from Human Grasp Affordance [11.060931225148936]
Dexterous manipulation with a multi-finger hand is one of the most challenging problems in robotics. Recent progress in imitation learning has largely improved the sample efficiency compared to Reinforcement Learning. We propose to learn dexterous manipulation using large-scale demonstrations with diverse 3D objects in a category.
arXiv Detail & Related papers (2022-04-05T16:26:22Z)
Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space. We improve the sample complexity of this approach by imposing a structure of the state-action function through a normalized advantage function. We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query. Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories. We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.