Learning Goals from Failure
- URL: http://arxiv.org/abs/2006.15657v2
- Date: Sun, 13 Dec 2020 01:44:08 GMT
- Title: Learning Goals from Failure
- Authors: Dave Epstein and Carl Vondrick
- Abstract summary: We introduce a framework that predicts the goals behind observable human action in video.
Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision.
- Score: 30.071336708348472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a framework that predicts the goals behind observable human
action in video. Motivated by evidence in developmental psychology, we leverage
video of unintentional action to learn video representations of goals without
direct supervision. Our approach models videos as contextual trajectories that
represent both low-level motion and high-level action features. Experiments and
visualizations show our trained model is able to predict the underlying goals
in video of unintentional action. We also propose a method to "automatically
correct" unintentional action by leveraging gradient signals of our model to
adjust latent trajectories. Although the model is trained with minimal
supervision, it is competitive with or outperforms baselines trained on large
(supervised) datasets of successfully executed goals, showing that observing
unintentional action is crucial to learning about goals in video. Project page:
https://aha.cs.columbia.edu/
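The abstract's "automatic correction" idea can be pictured with a short sketch: encode a video's clips into a latent trajectory, score how intentional the trajectory looks, and take gradient steps on the latent to raise that score. This is a minimal illustration of the general idea, not the authors' implementation; TrajectoryEncoder, IntentionalityHead, the GRU backbone, the scoring objective, and the step sizes are all hypothetical stand-ins.

```python
# Minimal sketch (not the authors' code) of gradient-based "correction" of a
# latent video trajectory, assuming a hypothetical encoder and intentionality scorer.
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Hypothetical encoder mapping T clip features to a latent trajectory."""
    def __init__(self, feat_dim=512, latent_dim=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, latent_dim, batch_first=True)

    def forward(self, clip_features):            # (B, T, feat_dim)
        latents, _ = self.rnn(clip_features)      # (B, T, latent_dim)
        return latents

class IntentionalityHead(nn.Module):
    """Hypothetical head scoring how 'intentional' each latent step looks."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 1)

    def forward(self, latents):                   # (B, T, latent_dim)
        return torch.sigmoid(self.fc(latents)).squeeze(-1)  # (B, T)

def correct_trajectory(latents, head, steps=10, lr=0.1):
    """Adjust the latent trajectory by gradient ascent on the intentionality
    score, analogous in spirit to the paper's 'automatic correction'."""
    z = latents.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -head(z).mean()                    # maximize predicted intentionality
        loss.backward()
        opt.step()
    return z.detach()

# Usage with random stand-in clip features (8 clips of one video):
enc, head = TrajectoryEncoder(), IntentionalityHead()
z = enc(torch.randn(1, 8, 512))
z_corrected = correct_trajectory(z, head)
```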
Related papers
- Grounding Video Models to Actions through Goal Conditioned Exploration [29.050431676226115]
We propose a framework that uses trajectory-level action generation in combination with video guidance to enable an agent to solve complex tasks.
We show that our approach is on par with, or even surpasses, multiple behavior cloning baselines trained on expert demonstrations.
arXiv Detail & Related papers (2024-11-11T18:43:44Z)
- Latent Action Pretraining from Videos [156.88613023078778]
We introduce Latent Action Pretraining for general Action models (LAPA)
LAPA is an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels.
We propose a method to learn from internet-scale videos that do not have robot action labels.
arXiv Detail & Related papers (2024-10-15T16:28:09Z)
- Zero-Shot Offline Imitation Learning via Optimal Transport [21.548195072895517]
Zero-shot imitation learning algorithms reproduce unseen behavior from as little as a single demonstration at test time.
Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector and a low-level goal-conditioned policy (see the sketch after this list).
We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning.
arXiv Detail & Related papers (2024-10-11T12:10:51Z)
- WANDR: Intention-guided Human Motion Generation [67.07028110459787]
We introduce WANDR, a data-driven model that takes an avatar's initial pose and a goal's 3D position and generates natural human motions that place the end effector (wrist) on the goal location.
Intention guides the agent to the goal, and interactively adapts the generation to novel situations without needing to define sub-goals or the entire motion path.
We evaluate our method extensively and demonstrate its ability to generate natural, long-term motions that reach 3D goals and generalize to unseen goal locations.
arXiv Detail & Related papers (2024-04-23T10:20:17Z)
- REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components: retrieval and self-training.
arXiv Detail & Related papers (2022-09-29T17:57:01Z)
- Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos [31.1632730473261]
W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 30 unintentional video-level activity labels collected through human annotations.
We propose a weakly supervised algorithm for localizing the goal-directed as well as unintentional temporal regions in the video.
arXiv Detail & Related papers (2022-04-28T14:56:43Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both the final performance and the sample efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
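The two-level scheme mentioned in the zero-shot imitation entry above, a high-level goal selector over the demonstration's goals driving a low-level goal-conditioned policy, can be sketched as follows. This is a generic illustration under an assumed toy 2-D environment, not that paper's method; select_goal, goal_conditioned_policy, the distance threshold, and the dynamics are all assumptions made for the example.

```python
# Hedged sketch of the hierarchical imitation pattern described above:
# a demonstration is treated as a sequence of goals; a hypothetical goal
# selector advances through them while a goal-conditioned policy acts.
import numpy as np

def select_goal(state, demo_goals, current_idx, threshold=0.5):
    """Advance to the next demonstration goal once the current one is near."""
    if np.linalg.norm(state - demo_goals[current_idx]) < threshold:
        current_idx = min(current_idx + 1, len(demo_goals) - 1)
    return current_idx

def goal_conditioned_policy(state, goal):
    """Stand-in low-level policy: step in the direction of the goal."""
    direction = goal - state
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 1e-8 else np.zeros_like(state)

# Roll out in a toy 2-D environment where the action is a displacement.
demo_goals = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
state, idx = np.zeros(2), 0
for _ in range(50):
    idx = select_goal(state, demo_goals, idx)
    state = state + 0.2 * goal_conditioned_policy(state, demo_goals[idx])
```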
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.