Related papers: Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

URL: http://arxiv.org/abs/2409.15637v1
Date: Tue, 24 Sep 2024 00:51:45 GMT
Title: Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
Authors: Tianyue Ou, Frank F. Xu, Aman Madaan, Jiarui Liu, Robert Lo, Abishek Sridhar, Sudipta Sengupta, Dan Roth, Graham Neubig, Shuyan Zhou
Abstract summary: LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives. However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. We present Synatra, an approach that effectively transforms indirect knowledge, such as online tutorials created for human consumption, into direct supervision at scale.
Score: 97.21851531607811
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. Obtaining supervised data from humans is costly, and automatic data collection through exploration or reinforcement learning relies on complex environmental and content setup, resulting in datasets that lack comprehensive coverage of various scenarios. On the other hand, there is abundant knowledge that may indirectly assist task completion, such as online tutorials that were created for human consumption. In this work, we present Synatra, an approach that effectively transforms this indirect knowledge into direct supervision at scale. We define different types of indirect knowledge, and carefully study the available sources to obtain it, methods to encode the structure of direct demonstrations, and finally methods to transform indirect knowledge into direct demonstrations. We use 100k such synthetically-created demonstrations to finetune a 7B CodeLlama, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks Mind2Web, MiniWoB++ and WebArena, as well as surpassing GPT-3.5 on WebArena and Mind2Web. In addition, while synthetic demonstrations prove to be only 3% the cost of human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains.

Related papers

ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding [31.481969919049472]
ActionArt is a fine-grained video-caption dataset designed to advance research in human-centric multimodal understanding. Our dataset comprises thousands of videos capturing a broad spectrum of human actions, human-object interactions, and diverse scenarios. We develop eight sub-tasks to evaluate the fine-grained understanding capabilities of existing large multimodal models across different dimensions.
arXiv Detail & Related papers (2025-04-25T08:05:32Z)
Imitation Learning with Precisely Labeled Human Demonstrations [0.0]
This work builds on prior studies that demonstrate the viability of using hand-held grippers for efficient data collection. We leverage the user's control over the gripper's appearance--specifically by assigning it a unique, easily segmentable color--to enable precise end-effector pose estimation. We show in simulation that precisely labeled human demonstrations on their own allow policies to reach on average 88.1% of the performance of using robot demonstrations.
arXiv Detail & Related papers (2025-04-18T17:12:00Z)
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale. We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials. Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
Information-driven Affordance Discovery for Efficient Robotic Manipulation [14.863105174430087]
We argue that well-directed interactions with the environment can mitigate this problem. We provide a theoretical justification of our approach and we empirically validate the approach both in simulation and real-world tasks. Our method, which we dub IDA, enables the efficient discovery of visual affordances for several action primitives.
arXiv Detail & Related papers (2024-05-06T21:25:51Z)
AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
arXiv Detail & Related papers (2024-04-11T01:59:29Z)
MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations [55.549956643032836]
MimicGen is a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations. We show that robot agents can be effectively trained on this generated dataset by imitation learning to achieve strong performance in long-horizon and high-precision tasks.
arXiv Detail & Related papers (2023-10-26T17:17:31Z)
Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2). The dataset is composed of both real and synthetic videos from seven gesture classes. We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
What Stops Learning-based 3D Registration from Working in the Real World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions. Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data. Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z)
Learning Feasibility to Imitate Demonstrators with Different Dynamics [23.239058855103067]
The goal of learning from demonstrations is to learn a policy for an agent (imitator) by mimicking the behavior in the demonstrations. We learn a feasibility metric that captures the likelihood of a demonstration being feasible by the imitator. Our experiments on four simulated environments and on a real robot show that the policy learned with our approach achieves a higher expected return than prior works.
arXiv Detail & Related papers (2021-10-28T14:15:47Z)
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos [47.958512470724926]
Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In this paper, we explore a method that facilitates learning object manipulation skills directly from videos.
arXiv Detail & Related papers (2020-11-13T08:53:47Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.