Fast Motion Understanding with Spatiotemporal Neural Networks and
Dynamic Vision Sensors
- URL: http://arxiv.org/abs/2011.09427v1
- Date: Wed, 18 Nov 2020 17:55:07 GMT
- Title: Fast Motion Understanding with Spatiotemporal Neural Networks and
Dynamic Vision Sensors
- Authors: Anthony Bisulco, Fernando Cladera Ojeda, Volkan Isler, Daniel D. Lee
- Abstract summary: This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning about high speed motion.
We consider the case of a robot at rest reacting to a small, fast approaching object at speeds higher than 15m/s.
We highlight the results of our system on a toy dart moving at 23.4 m/s: a 24.73° error in $\theta$, an 18.4 mm average discretized-radius prediction error, and a 25.03% median time-to-collision prediction error.
- Score: 99.94079901071163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a Dynamic Vision Sensor (DVS) based system for reasoning
about high speed motion. As a representative scenario, we consider the case of
a robot at rest reacting to a small, fast approaching object at speeds higher
than 15m/s. Since conventional image sensors at typical frame rates observe
such an object for only a few frames, estimating the underlying motion presents
a considerable challenge for standard computer vision systems and algorithms.
In this paper we present a method motivated by how animals such as insects
solve this problem with their relatively simple vision systems.
Our solution takes the event stream from a DVS and first encodes the temporal
events with a set of causal exponential filters across multiple time scales. We
couple these filters with a Convolutional Neural Network (CNN) to efficiently
extract relevant spatiotemporal features. The combined network learns to output
both the expected time to collision of the object, as well as the predicted
collision point on a discretized polar grid. These critical estimates are
computed with minimal delay by the network in order to react appropriately to
the incoming object. We highlight the results of our system on a toy dart
moving at 23.4 m/s: a 24.73° error in $\theta$, an 18.4 mm average
discretized-radius prediction error, and a 25.03% median time-to-collision
prediction error.
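A minimal sketch of the pipeline the abstract describes, assuming PyTorch: binned DVS events are encoded with causal exponential (leaky-integrator) filters at several time scales, and a small CNN produces a time-to-collision estimate plus logits over a discretized polar collision grid. The layer sizes, time constants, and names (MultiScaleExpFilter, CollisionNet) are illustrative assumptions, not the authors' released architecture.

```python
import torch
import torch.nn as nn

class MultiScaleExpFilter(nn.Module):
    """Causal exponential filters at several time scales."""
    def __init__(self, taus=(5.0, 20.0, 80.0), dt=1.0):
        super().__init__()
        # alpha = exp(-dt / tau): larger tau -> slower decay / longer memory
        self.register_buffer("alphas", torch.exp(-dt / torch.tensor(taus)))

    def forward(self, event_frames):
        # event_frames: (T, H, W) event counts per time bin
        T, H, W = event_frames.shape
        state = torch.zeros(len(self.alphas), H, W, device=event_frames.device)
        for t in range(T):  # causal recursion over time bins
            state = self.alphas.view(-1, 1, 1) * state + event_frames[t]
        return state  # (num_scales, H, W) spatiotemporal encoding

class CollisionNet(nn.Module):
    def __init__(self, num_scales=3, n_theta=16, n_radius=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_scales, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.ttc_head = nn.Linear(32, 1)                     # time to collision
        self.polar_head = nn.Linear(32, n_theta * n_radius)  # discretized polar grid

    def forward(self, encoded):
        feats = self.backbone(encoded.unsqueeze(0))
        return self.ttc_head(feats), self.polar_head(feats)

# Example: 100 time bins of 64x64 binned events
events = torch.rand(100, 64, 64)
encoded = MultiScaleExpFilter()(events)
ttc, polar_logits = CollisionNet()(encoded)
```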
Related papers
- EV-Catcher: High-Speed Object Catching Using Low-latency Event-based
Neural Networks [107.62975594230687]
We demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects.
We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency.
We show that, even on compute-constrained embedded platforms, the system achieves an 81% success rate in catching balls targeted at different locations and moving at velocities of up to 13 m/s.
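One plausible reading of such a binary event-history encoding, sketched below with NumPy: a pixel is set if at least one event fell on it within the accumulation window. The exact BEHI construction in the paper (windowing, polarity handling) may differ.

```python
import numpy as np

def binary_event_history_image(xs, ys, height, width):
    """xs, ys: pixel coordinates of events inside one accumulation window."""
    behi = np.zeros((height, width), dtype=np.uint8)
    behi[ys, xs] = 1  # any event marks the pixel, regardless of count or polarity
    return behi

# Example: a handful of events on a 260x346 DVS-like sensor
xs = np.array([10, 10, 200, 345])
ys = np.array([5, 5, 100, 259])
img = binary_event_history_image(xs, ys, height=260, width=346)
```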
arXiv Detail & Related papers (2023-04-14T15:23:28Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out falsely detected objects.
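A minimal NumPy sketch of the dynamic-voxelization idea: every point is assigned to a voxel without a fixed per-voxel point cap, and per-voxel features are pooled. The grid size and mean-pooling choice are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def dynamic_voxelize(points, voxel_size=0.2, origin=(0.0, 0.0, 0.0)):
    """points: (N, 3) xyz. Returns unique voxel coordinates and mean point per voxel."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(np.int64)
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    counts = np.bincount(inverse, minlength=len(voxels))
    means = np.zeros((len(voxels), 3))
    for d in range(3):  # scatter-mean of point coordinates per voxel
        means[:, d] = np.bincount(inverse, weights=points[:, d],
                                  minlength=len(voxels)) / counts
    return voxels, means

pts = np.random.rand(1000, 3) * 50.0   # toy point cloud
voxel_coords, voxel_feats = dynamic_voxelize(pts)
```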
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - DroneAttention: Sparse Weighted Temporal Attention for Drone-Camera
Based Activity Recognition [2.705905918316948]
Human activity recognition (HAR) using drone-mounted cameras has attracted considerable interest from the computer vision research community in recent years.
We propose a novel Sparse Weighted Temporal Attention (SWTA) module to utilize sparsely sampled video frames for obtaining global weighted temporal attention.
The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets.
arXiv Detail & Related papers (2022-12-07T00:33:40Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy as a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset and improves AP by 4.9% over a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z) - VideoPose: Estimating 6D object pose from videos [14.210010379733017]
We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos.
Our proposed network takes a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make predictions at each frame.
Experimental evaluation on the YCB-Video dataset shows that our approach is on par with state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-20T20:57:45Z) - Object Tracking by Detection with Visual and Motion Cues [1.7818230914983044]
Self-driving cars need to detect and track objects in camera images.
We present a simple online tracking algorithm that is based on a constant velocity motion model with a Kalman filter.
We evaluate our approach on the challenging BDD100K dataset.
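A minimal sketch of the constant-velocity Kalman filter underlying such a tracker, with state [x, y, vx, vy] and position-only measurements; the noise magnitudes and 2D pixel-space state are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

class ConstantVelocityKalman:
    def __init__(self, dt=1.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is observed
        self.Q = np.eye(4) * 1e-2                         # process noise
        self.R = np.eye(2) * 1e-1                         # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = z - self.H @ self.x                           # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Example: track a detection moving right at roughly 1 px per frame
kf = ConstantVelocityKalman()
for t in range(5):
    kf.predict()
    kf.update(np.array([float(t), 0.0]))
```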
arXiv Detail & Related papers (2021-01-19T10:29:16Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for
Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our method is validated on complex quadruped robot dynamics, and the approach can be applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z) - Event-based Robotic Grasping Detection with Neuromorphic Vision Sensor
and Event-Stream Dataset [8.030163836902299]
Compared to traditional frame-based computer vision, neuromorphic vision is a small and young research community.
We construct a robotic grasping dataset named Event-Stream dataset with 91 objects.
As LEDs blink at high frequency, the Event-Stream dataset is annotated at a high frequency of 1 kHz.
We develop a deep neural network for grasping detection which considers the angle learning problem as classification instead of regression.
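A minimal sketch of casting angle prediction as classification: the continuous grasp angle is discretized into bins and trained with cross-entropy. The bin count and the stand-in logits below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

NUM_BINS = 18                      # e.g. 10-degree bins over [0, 180)

def angle_to_class(angle_deg):
    """Map a continuous angle in degrees to a discrete class index."""
    return int(angle_deg % 180.0) * NUM_BINS // 180

logits = torch.randn(1, NUM_BINS)                 # stand-in network output
target = torch.tensor([angle_to_class(47.0)])     # ground-truth angle -> class index 4
loss = nn.CrossEntropyLoss()(logits, target)
```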
arXiv Detail & Related papers (2020-04-28T16:55:19Z) - A Time-Delay Feedback Neural Network for Discriminating Small,
Fast-Moving Targets in Complex Dynamic Environments [8.645725394832969]
Discriminating small moving objects within complex visual environments is a significant challenge for autonomous micro robots.
We propose an STMD-based neural network with feedback connection (Feedback STMD), where the network output is temporally delayed, then fed back to the lower layers to mediate neural responses.
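A minimal sketch of the time-delay feedback idea: the output is buffered for a few time steps and fed back to modulate the lower-layer response. The subtractive form, gain, delay length, and rectification are assumptions for illustration, not the paper's model.

```python
import numpy as np
from collections import deque

def run_feedback_stmd(inputs, delay=3, gain=0.5):
    """inputs: 1D array of lower-layer responses over time."""
    buffer = deque([0.0] * delay, maxlen=delay)   # holds delayed outputs
    outputs = []
    for x in inputs:
        y = max(0.0, x - gain * buffer[0])        # delayed feedback mediates the response
        buffer.append(y)                          # output re-enters after `delay` steps
        outputs.append(y)
    return np.array(outputs)

resp = run_feedback_stmd(np.abs(np.random.randn(20)))
```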
arXiv Detail & Related papers (2019-12-29T03:10:36Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.