Representation Learning for Event-based Visuomotor Policies
- URL: http://arxiv.org/abs/2103.00806v1
- Date: Mon, 1 Mar 2021 07:04:00 GMT
- Title: Representation Learning for Event-based Visuomotor Policies
- Authors: Sai Vemprala, Sami Mian, Ashish Kapoor
- Abstract summary: We present an event variational autoencoder for unsupervised representation learning from asynchronous event data.
We show that it is feasible to learn compact representations from spatiotemporal event data to encode context.
We validate this framework of learning visuomotor policies by applying it to an obstacle avoidance scenario in simulation.
- Score: 18.4767874925189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based cameras are dynamic vision sensors that can provide asynchronous
measurements of changes in per-pixel brightness at a microsecond level. This
makes them significantly faster than conventional frame-based cameras, and an
appealing choice for high-speed navigation. While an interesting sensor
modality, this asynchronous data poses a challenge for common machine learning
techniques. In this paper, we present an event variational autoencoder for
unsupervised representation learning from asynchronous event camera data. We
show that it is feasible to learn compact representations from spatiotemporal
event data to encode the context. Furthermore, we show that such pretrained
representations can be beneficial for navigation, allowing for usage in
reinforcement learning instead of end-to-end reward driven perception. We
validate this framework of learning visuomotor policies by applying it to an
obstacle avoidance scenario in simulation. We show that representations learnt
from event data enable training fast control policies that can adapt to
different control capacities, and demonstrate a higher degree of robustness
than end-to-end learning from event images.
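The abstract describes a two-stage pipeline: an event variational autoencoder compresses asynchronous event data into a compact latent representation, and a reinforcement learning policy consumes that latent instead of learning perception end to end from reward. The paper's exact architecture is not reproduced here, so the following is a minimal PyTorch-style sketch under assumed choices: the voxel-grid binning of events, the layer sizes, the latent dimension, and the policy head are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' architecture): an event VAE that encodes a
# spatiotemporal event voxel grid into a compact latent, plus a small policy
# head that consumes the latent instead of raw events. Shapes, layer sizes,
# and the voxelization scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def events_to_voxel_grid(events, num_bins=5, height=128, width=128):
    """Accumulate (x, y, t, polarity) events into a (num_bins, H, W) tensor.

    `events` is assumed to be a float tensor of shape (N, 4) with t in [0, 1].
    """
    grid = torch.zeros(num_bins, height, width)
    x = events[:, 0].long().clamp(0, width - 1)
    y = events[:, 1].long().clamp(0, height - 1)
    b = (events[:, 2] * (num_bins - 1)).long().clamp(0, num_bins - 1)
    p = events[:, 3] * 2.0 - 1.0  # map {0, 1} polarity to {-1, +1}
    grid.index_put_((b, y, x), p, accumulate=True)
    return grid


class EventVAE(nn.Module):
    """Convolutional VAE over a (B, num_bins, 128, 128) event voxel grid."""

    def __init__(self, in_bins=5, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bins, 32, 4, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),       # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),      # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(128 * 16 * 16, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (128, 16, 16)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_bins, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar


def vae_loss(recon, target, mu, logvar, beta=1.0):
    """Reconstruction + KL divergence, the standard VAE objective."""
    recon_loss = F.mse_loss(recon, target, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl


class LatentPolicy(nn.Module):
    """Toy policy head: maps the (frozen) latent to action logits for avoidance."""

    def __init__(self, latent_dim=32, num_actions=5):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_actions))

    def forward(self, z):
        return self.net(z)
```

In such a setup the encoder would typically be pretrained on reconstruction alone and then frozen, so the policy (trained with an off-the-shelf RL algorithm) only has to learn a mapping from a low-dimensional latent to actions; a voxel grid from `events_to_voxel_grid` would need a batch dimension (e.g. `grid.unsqueeze(0)`) before being passed to `EventVAE`.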
Related papers
- E-Motion: Future Motion Simulation via Event Sequence Diffusion [86.80533612211502]
Event-based sensors may offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable.
We propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework.
Our findings suggest a promising direction for future research in enhancing the interpretative power and predictive accuracy of computer vision systems.
arXiv Detail & Related papers (2024-10-11T09:19:23Z)
- Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation [23.871860648919593]
Event-based cameras provide accurate and high temporal resolution measurements for performing computer vision tasks.
Despite their advantages, utilizing deep learning for event-based vision encounters a significant obstacle due to the scarcity of annotated data.
We propose a new algorithm tailored for adapting a deep neural network trained on annotated frame-based data to generalize well on event-based unannotated data.
arXiv Detail & Related papers (2024-01-02T05:10:08Z)
- Event Camera Data Dense Pre-training [10.918407820258246]
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data.
For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns.
arXiv Detail & Related papers (2023-11-20T04:36:19Z)
- EventTransAct: A video transformer-based framework for event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning [12.013345715187285]
Deep learning in event-based vision faces the challenge of annotated data scarcity due to the recency of event cameras.
We develop an unsupervised domain adaptation algorithm for training a deep network for event-based data image classification.
arXiv Detail & Related papers (2023-03-22T09:51:08Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods by up to 30% in terms of mean absolute depth error.
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Unsupervised Feature Learning for Event Data: Direct vs Inverse Problem Formulation [53.850686395708905]
Event-based cameras record an asynchronous stream of per-pixel brightness changes.
In this paper, we focus on single-layer architectures for representation learning from event data.
We show improvements of up to 9% in recognition accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:40:03Z)
- Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)