Learning Goals from Failure
- URL: http://arxiv.org/abs/2006.15657v2
- Date: Sun, 13 Dec 2020 01:44:08 GMT
- Title: Learning Goals from Failure
- Authors: Dave Epstein and Carl Vondrick
- Abstract summary: We introduce a framework that predicts the goals behind observable human action in video.
Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision.
- Score: 30.071336708348472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a framework that predicts the goals behind observable human
action in video. Motivated by evidence in developmental psychology, we leverage
video of unintentional action to learn video representations of goals without
direct supervision. Our approach models videos as contextual trajectories that
represent both low-level motion and high-level action features. Experiments and
visualizations show our trained model is able to predict the underlying goals
in video of unintentional action. We also propose a method to "automatically
correct" unintentional action by leveraging gradient signals of our model to
adjust latent trajectories. Although the model is trained with minimal
supervision, it is competitive with or outperforms baselines trained on large
(supervised) datasets of successfully executed goals, showing that observing
unintentional action is crucial to learning about goals in video. Project page:
https://aha.cs.columbia.edu/
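To make the "automatic correction" idea above concrete, here is a minimal Python/PyTorch sketch of gradient-based adjustment of a latent trajectory. The correct_trajectory function, the intentionality_head, and all shapes and hyperparameters are illustrative assumptions, not the paper's model or API; in the paper's setting the gradient signal would come from the trained goal model rather than the toy linear head used here.

# A minimal sketch (assumptions, not the authors' code) of gradient-based
# "automatic correction": nudge a latent video trajectory until a scoring
# head rates it as more intentional. The video encoder is omitted; we start
# from an already-encoded trajectory z of shape (T, D).
import torch
import torch.nn as nn

def correct_trajectory(z, intentionality_head, steps=50, lr=0.1):
    # z: (T, D) latent trajectory; intentionality_head: maps a trajectory to a
    # scalar logit, higher meaning the action looks more goal-consistent.
    z = z.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -intentionality_head(z).mean()  # ascend the intentionality score
        loss.backward()
        optimizer.step()
    return z.detach()

if __name__ == "__main__":
    # Toy stand-ins: a random 16-step, 128-dim trajectory and a linear head.
    toy_head = nn.Sequential(nn.Flatten(start_dim=0), nn.Linear(16 * 128, 1))
    z_unintentional = torch.randn(16, 128)
    z_corrected = correct_trajectory(z_unintentional, toy_head)
    print(z_corrected.shape)  # torch.Size([16, 128])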
Related papers
- An Empirical Study of Autoregressive Pre-training from Videos [67.15356613065542]
We treat videos as visual tokens and train transformer models to autoregressively predict future tokens.
Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens.
Our results demonstrate that, despite minimal inductive biases, autoregressive pre-training leads to competitive performance.
arXiv Detail & Related papers (2025-01-09T18:59:58Z)
- Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning [27.233232260388682]
We introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned.
Our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score.
arXiv Detail & Related papers (2024-12-07T03:10:27Z)
- Grounding Video Models to Actions through Goal Conditioned Exploration [29.050431676226115]
We propose a framework that uses trajectory level action generation in combination with video guidance to enable an agent to solve complex tasks.
We show that our approach is on par with, or even surpasses, multiple behavior cloning baselines trained on expert demonstrations.
arXiv Detail & Related papers (2024-11-11T18:43:44Z)
- WANDR: Intention-guided Human Motion Generation [67.07028110459787]
We introduce WANDR, a data-driven model that takes an avatar's initial pose and a goal's 3D position and generates natural human motions that place the end effector (wrist) on the goal location.
Intention guides the agent to the goal, and interactively adapts the generation to novel situations without needing to define sub-goals or the entire motion path.
We evaluate our method extensively and demonstrate its ability to generate natural, long-term motions that reach 3D goals and generalize to unseen goal locations.
arXiv Detail & Related papers (2024-04-23T10:20:17Z)
- REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z)
- Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos [31.1632730473261]
W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 30 unintentional video-level activity labels collected through human annotations.
We propose a weakly supervised algorithm for localizing the goal-directed as well as unintentional temporal regions in the video.
arXiv Detail & Related papers (2022-04-28T14:56:43Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.