Adversarial Imitation Learning from Video using a State Observer
- URL: http://arxiv.org/abs/2202.00243v1
- Date: Tue, 1 Feb 2022 06:46:48 GMT
- Title: Adversarial Imitation Learning from Video using a State Observer
- Authors: Haresh Karnan, Garrett Warnell, Faraz Torabi, Peter Stone
- Abstract summary: We introduce a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO).
At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer.
We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other IfO algorithms at learning from video-only demonstrations.
- Score: 50.45370139579214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The imitation learning research community has recently made significant
progress towards the goal of enabling artificial agents to imitate behaviors
from video demonstrations alone. However, current state-of-the-art approaches
developed for this problem exhibit high sample complexity due, in part, to the
high-dimensional nature of video observations. Towards addressing this issue,
we introduce here a new algorithm called Visual Generative Adversarial
Imitation from Observation using a State Observer (VGAIfO-SO). At its core,
VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised
state observer, which provides estimates of lower-dimensional proprioceptive
state representations from high-dimensional images. We show experimentally in
several continuous control environments that VGAIfO-SO is more sample efficient
than other IfO algorithms at learning from video-only demonstrations and can
sometimes even achieve performance close to the Generative Adversarial
Imitation from Observation (GAIfO) algorithm that has privileged access to the
demonstrator's proprioceptive state information.
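The abstract describes two cooperating components: a state observer that maps high-dimensional images to estimates of the low-dimensional proprioceptive state, and a GAIfO-style adversarial discriminator that operates on those estimates instead of raw pixels. Below is a minimal PyTorch sketch of that structure; the layer shapes, the 11-dimensional state, and all module names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a state observer estimates a
# low-dimensional proprioceptive state from an image, and a GAIfO-style
# discriminator scores state transitions built from those estimates.
import torch
import torch.nn as nn

class StateObserver(nn.Module):
    """CNN mapping an image to an estimated proprioceptive state."""
    def __init__(self, state_dim: int = 11):  # state_dim is an assumption
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(state_dim)  # infers input size on first call

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(img))

class TransitionDiscriminator(nn.Module):
    """MLP scoring (s_t, s_{t+1}) pairs, as in GAIfO."""
    def __init__(self, state_dim: int = 11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, s_t: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s_t, s_next], dim=-1))

observer = StateObserver()
discriminator = TransitionDiscriminator()
imgs_t = torch.randn(8, 3, 84, 84)     # batch of frames at time t
imgs_next = torch.randn(8, 3, 84, 84)  # frames at time t + 1
logits = discriminator(observer(imgs_t), observer(imgs_next))
```

Since the imitator, unlike the video-only demonstrations, has access to its own proprioceptive state during training, one natural self-supervised signal is to regress the observer's output toward those states on the imitator's own experience; that training loop is omitted here.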
Related papers
- MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation [5.0923114224599555]
This paper introduces MissionGNN, a novel hierarchical graph neural network (GNN)-based model.
Our approach circumvents the limitations of previous methods by avoiding heavy gradient computations on large multimodal models.
Our model provides a practical and efficient solution for real-time video analysis without the constraints of previous segmentation-based or multimodal approaches.
arXiv Detail & Related papers (2024-06-27T01:09:07Z)
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmarking task and find that most of the models encounter difficulties in effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z)
- An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction [2.3349787245442966]
Video Anomaly Detection (VAD) is a challenging and prominent research task within computer vision.
This paper introduces TSGAD, a novel human-centric Two-Stream Graph-improved Anomaly Detection method.
We demonstrate TSGAD's effectiveness through comprehensive experimentation on benchmark datasets.
arXiv Detail & Related papers (2024-04-29T14:25:06Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
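The OVVAD entry above describes the goal only at a high level: use pre-trained large models to detect and categorize anomalies named in free-form text. As a generic illustration of open-vocabulary scoring (not the paper's method), the sketch below ranks text prompts against a video frame by CLIP image-text similarity; the checkpoint, prompts, and dummy frame are all assumptions.

```python
# Generic open-vocabulary scoring sketch (an assumption, not the OVVAD
# method): rank free-form anomaly category prompts against a video frame
# using CLIP image-text similarity.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a normal street scene",    # "normal" reference prompt
    "a scene of fighting",      # seen anomaly category
    "a scene of an explosion",  # unseen category, described in text only
]
frame = Image.new("RGB", (224, 224))  # stand-in for a decoded video frame

inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 3)
print(dict(zip(prompts, probs[0].tolist())))
```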
- Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations [17.816344808780965]
The unsupervised video anomaly detection (VAD) problem involves classifying each frame in a video as normal or abnormal, without any access to labels.
To accomplish this, the proposed method employs conditional diffusion models, where the input data is features extracted from a pre-trained network.
Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events.
arXiv Detail & Related papers (2023-07-04T07:36:48Z)
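The summary above names two concrete ingredients: reconstruction error as the anomaly signal and a data-driven threshold. The toy sketch below illustrates only that scoring logic, with a plain autoencoder standing in for the paper's conditional diffusion model and a training-set percentile standing in for its data-driven threshold.

```python
# Illustrative scoring logic only: a toy autoencoder stands in for the
# paper's conditional diffusion model, and a percentile over training-set
# errors stands in for its data-driven threshold.
import torch
import torch.nn as nn

feat_dim = 512  # assumed size of features from a pre-trained backbone
autoencoder = nn.Sequential(
    nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim)
)

def recon_error(feats: torch.Tensor) -> torch.Tensor:
    """Per-frame mean squared reconstruction error."""
    return ((autoencoder(feats) - feats) ** 2).mean(dim=-1)

train_feats = torch.randn(1000, feat_dim)  # features of (mostly normal) frames
test_feats = torch.randn(200, feat_dim)    # features of frames to score

with torch.no_grad():
    # Data-driven threshold: here, the 95th percentile of training errors.
    threshold = recon_error(train_feats).quantile(0.95)
    is_anomalous = recon_error(test_feats) > threshold  # one bool per frame
```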
- Provable RL with Exogenous Distractors via Multistep Inverse Dynamics [85.52408288789164]
Real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera.
Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations.
However, such approaches can fail in the presence of temporally correlated noise in the observations.
arXiv Detail & Related papers (2021-10-17T15:21:27Z)
- Video Anomaly Detection Using Pre-Trained Deep Convolutional Neural Nets and Context Mining [2.0646127669654835]
We show how to use pre-trained convolutional neural net models to perform feature extraction and context mining.
We derive contextual properties from the high-level features to further improve the performance of our video anomaly detection method.
arXiv Detail & Related papers (2020-10-06T00:26:14Z)
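Extracting features with a pre-trained CNN, as the entry above describes, is a standard step; a minimal sketch using torchvision's pre-trained ResNet-50 with its classification head removed follows. The backbone choice is an assumption, and the paper's context-mining step is not reproduced.

```python
# Feature extraction with an off-the-shelf pre-trained CNN (a generic
# sketch; the paper's specific backbone and context-mining step are not
# reproduced here).
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
backbone = resnet50(weights=weights)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep 2048-d features
backbone.eval()

preprocess = weights.transforms()       # resize/normalize as the weights expect
frames = torch.rand(16, 3, 256, 256)    # stand-in for decoded video frames

with torch.no_grad():
    feats = backbone(preprocess(frames))  # shape (16, 2048), one row per frame
```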
- TinyVIRAT: Low-resolution Video Action Recognition [70.37277191524755]
In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions.
We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities.
We propose a novel method that utilizes a progressive generative approach to recognize tiny actions in videos.
arXiv Detail & Related papers (2020-07-14T21:09:18Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatial-temporal discontinuity; (iii) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
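The memory mechanism in the entry above serves long-term correspondence matching, whose generic form is propagating per-pixel labels from memory frames to the query frame through a softmax affinity over features. Below is a minimal sketch of that propagation step, with dimensions and temperature chosen arbitrarily; the paper's exact memory design is not reproduced.

```python
# Minimal sketch of memory-based label propagation for dense tracking
# (the generic mechanism, not the paper's exact memory design).
import torch

def propagate(query_feat, mem_feats, mem_labels, temperature=0.07):
    """Copy per-pixel labels from memory frames to the query frame.

    query_feat: (C, H*W) features of the current frame
    mem_feats:  (C, N) features of all memory-frame pixels
    mem_labels: (K, N) one-hot object masks for those pixels
    """
    affinity = (mem_feats.T @ query_feat) / temperature  # (N, H*W)
    weights = affinity.softmax(dim=0)                    # over memory pixels
    return mem_labels @ weights                          # (K, H*W) soft masks

C, HW, N, K = 64, 32 * 32, 4 * 32 * 32, 3
query = torch.randn(C, HW)     # current-frame features
memory = torch.randn(C, N)     # features from four memory frames
labels = torch.rand(K, N)      # their (soft) object masks
masks = propagate(query, memory, labels)  # soft masks for the query frame
```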
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.