Representation Learning for Event-based Visuomotor Policies
- URL: http://arxiv.org/abs/2103.00806v1
- Date: Mon, 1 Mar 2021 07:04:00 GMT
- Title: Representation Learning for Event-based Visuomotor Policies
- Authors: Sai Vemprala, Sami Mian, Ashish Kapoor
- Abstract summary: We present an event variational autoencoder for unsupervised representation learning from asynchronous event data.
We show that it is feasible to learn compact representations from spatiotemporal event data to encode context.
We validate this framework of learning visuomotor policies by applying it to an obstacle avoidance scenario in simulation.
- Score: 18.4767874925189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based cameras are dynamic vision sensors that can provide asynchronous
measurements of changes in per-pixel brightness at a microsecond level. This
makes them significantly faster than conventional frame-based cameras, and an
appealing choice for high-speed navigation. While an interesting sensor
modality, this asynchronous data poses a challenge for common machine learning
techniques. In this paper, we present an event variational autoencoder for
unsupervised representation learning from asynchronous event camera data. We
show that it is feasible to learn compact representations from spatiotemporal
event data to encode the context. Furthermore, we show that such pretrained
representations can be beneficial for navigation, allowing for usage in
reinforcement learning instead of end-to-end reward driven perception. We
validate this framework of learning visuomotor policies by applying it to an
obstacle avoidance scenario in simulation. We show that representations learnt
from event data enable training fast control policies that can adapt to
different control capacities, and demonstrate a higher degree of robustness
than end-to-end learning from event images.
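The abstract describes a two-stage pipeline: an event variational autoencoder compresses asynchronous event data into a compact latent representation, and a reinforcement learning policy consumes that latent instead of learning perception end to end from reward. The paper's exact architecture is not reproduced here, so the following is a minimal PyTorch-style sketch under assumed choices: the voxel-grid binning of events, the layer sizes, the latent dimension, and the policy head are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' architecture): an event VAE that encodes a
# spatiotemporal event voxel grid into a compact latent, plus a small policy
# head that consumes the latent instead of raw events. Shapes, layer sizes,
# and the voxelization scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def events_to_voxel_grid(events, num_bins=5, height=128, width=128):
    """Accumulate (x, y, t, polarity) events into a (num_bins, H, W) tensor.

    `events` is assumed to be a float tensor of shape (N, 4) with t in [0, 1].
    """
    grid = torch.zeros(num_bins, height, width)
    x = events[:, 0].long().clamp(0, width - 1)
    y = events[:, 1].long().clamp(0, height - 1)
    b = (events[:, 2] * (num_bins - 1)).long().clamp(0, num_bins - 1)
    p = events[:, 3] * 2.0 - 1.0  # map {0, 1} polarity to {-1, +1}
    grid.index_put_((b, y, x), p, accumulate=True)
    return grid


class EventVAE(nn.Module):
    """Convolutional VAE over a (B, num_bins, 128, 128) event voxel grid."""

    def __init__(self, in_bins=5, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bins, 32, 4, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),       # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),      # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(128 * 16 * 16, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (128, 16, 16)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_bins, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar


def vae_loss(recon, target, mu, logvar, beta=1.0):
    """Reconstruction + KL divergence, the standard VAE objective."""
    recon_loss = F.mse_loss(recon, target, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl


class LatentPolicy(nn.Module):
    """Toy policy head: maps the (frozen) latent to action logits for avoidance."""

    def __init__(self, latent_dim=32, num_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_actions))

    def forward(self, z):
        return self.net(z)
```

In such a setup the encoder would typically be pretrained on reconstruction alone and then frozen, so the policy (trained with an off-the-shelf RL algorithm) only has to learn a mapping from a low-dimensional latent to actions; a voxel grid from `events_to_voxel_grid` would need a batch dimension (e.g. `grid.unsqueeze(0)`) before being passed to `EventVAE`.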
Related papers
- E-Motion: Future Motion Simulation via Event Sequence Diffusion [86.80533612211502]
Event-based sensors may offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable.
We propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework.
Our findings suggest a promising direction for future research in enhancing the interpretative power and predictive accuracy of computer vision systems.
arXiv Detail & Related papers (2024-10-11T09:19:23Z)
- Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation [23.871860648919593]
Event-based cameras provide accurate and high temporal resolution measurements for performing computer vision tasks.
Despite their advantages, utilizing deep learning for event-based vision encounters a significant obstacle due to the scarcity of annotated data.
We propose a new algorithm tailored for adapting a deep neural network trained on annotated frame-based data to generalize well on event-based unannotated data.
arXiv Detail & Related papers (2024-01-02T05:10:08Z)
- Event Camera Data Dense Pre-training [10.918407820258246]
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
arXiv Detail & Related papers (2023-11-20T04:36:19Z)
- EventTransAct: A video transformer-based framework for event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning [12.013345715187285]
Deep learning in event-based vision faces the challenge of annotated data scarcity due to the recency of event cameras.
We develop an unsupervised domain adaptation algorithm for training a deep network for event-based data image classification.
arXiv Detail & Related papers (2023-03-22T09:51:08Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
- Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)