Related papers: AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning


URL: http://arxiv.org/abs/2512.17853v1
Date: Fri, 19 Dec 2025 17:55:48 GMT
Title: AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Authors: Ran Gong, Xiaohan Zhang, Jinghuan Shang, Maria Vittoria Minniti, Jigarkumar Patel, Valerio Pepe, Riedana Yan, Ahmet Gundogdu, Ivan Kapelyukh, Ali Abbas, Xiaoqiang Yan, Harsh Patel, Laura Herlant, Karl Schmeckpeper,
Abstract summary: Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware.
Score: 16.837846476054786
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .

Related papers

Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos [61.925837909969815]
We introduce Video2Policy, a novel framework that leverages internet RGB videos to reconstruct tasks based on everyday human behavior. Our method can successfully train RL policies on such tasks, including complex and challenging tasks such as throwing. We show that the generated simulation data can be scaled up for training a general policy, and it can be transferred back to the real robot in a Real2Sim2Real way.
arXiv Detail & Related papers (2025-02-14T03:22:03Z)
GRS: Generating Robotic Simulation Tasks from Real-World Images [21.599606995763036]
GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code.
arXiv Detail & Related papers (2024-10-20T23:33:06Z)
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [51.20656279478878]
MATRIX is a multi-agent simulator that automatically generates diverse text-based scenarios. We introduce MATRIX-Gen for controllable and highly realistic data synthesis. On AlpacaEval 2 and Arena-Hard benchmarks, Llama-3-8B-Base, post-trained on datasets synthesized by MATRIX-Gen with just 20K instruction-response pairs, outperforms Meta's Llama-3-8B-Instruct model.
arXiv Detail & Related papers (2024-10-18T08:01:39Z)
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs [38.281562732050084]
GenSim2 is a scalable framework for complex and realistic simulation task creation. The pipeline can generate data for up to 100 articulated tasks with 200 objects and reduce the required human effort. We show a promising use of GenSim2: the generated data can be used for zero-shot transfer or for co-training with real-world collected data.
arXiv Detail & Related papers (2024-10-04T17:51:33Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
DrEureka: Language Model Guided Sim-To-Real Transfer [64.14314476811806]
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball.
arXiv Detail & Related papers (2024-06-04T04:53:05Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models [17.757495961816783]
Gen2Sim is a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions. Our work contributes hundreds of simulated assets, tasks and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
arXiv Detail & Related papers (2023-10-27T17:55:32Z)
GenSim: Generating Robotic Simulation Tasks via Large Language Models [34.79613485106202]
GenSim aims to automatically generate rich simulation environments and expert demonstrations. We use GPT4 to expand the existing benchmark by ten times to over 100 tasks. With minimal sim-to-real adaptation, multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world.
arXiv Detail & Related papers (2023-10-02T17:23:48Z)
VIMA: General Robot Manipulation with Multimodal Prompts [82.01214865117637]
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts. We develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively.
arXiv Detail & Related papers (2022-10-06T17:50:11Z)
Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner. Compared with naive baselines, we show an increase in success rate from 91.6% to 98% in simulation and from 10% to 80% in the real world.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.