Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
- URL: http://arxiv.org/abs/2003.10959v3
- Date: Wed, 22 Jul 2020 11:35:18 GMT
- Title: Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
- Authors: Yuhuang Hu and Tobi Delbruck and Shih-Chii Liu
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel vision sensors such as thermal, hyperspectral, polarization, and event
cameras provide information that is not available from conventional intensity
cameras. An obstacle to using these sensors with current powerful deep neural
networks is the lack of large labeled training datasets. This paper proposes a
Network Grafting Algorithm (NGA), where a new front end network driven by
unconventional visual inputs replaces the front end network of a pretrained
deep network that processes intensity frames. The self-supervised training uses
only synchronously-recorded intensity frames and novel sensor data to maximize
feature similarity between the pretrained network and the grafted network. We
show that the enhanced grafted network achieves average precision (AP50)
scores competitive with those of the pretrained network on an object detection
task using thermal and event camera datasets, with no increase in inference
cost. In particular, the grafted network driven by thermal frames showed a
relative improvement of 49.11% over the use of intensity frames. The grafted
front end has only 5--8% of the total parameters and can be trained in a few
hours on a single GPU, about 5% of the time that would be needed to train the
entire object detector from labeled data. NGA allows new vision sensors to
capitalize on previously pretrained powerful deep models, saving training
cost and widening the range of applications for novel sensors.
Related papers
- A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation [3.355813093377501]
Event cameras operate differently from traditional digital cameras, continuously capturing data and generating binary spikes that encode time, location, and light intensity.
This necessitates the development of innovative, spike-aware algorithms tailored for event cameras.
We propose a purely spike-driven spike transformer network for depth estimation from spiking camera data.
arXiv Detail & Related papers (2024-04-26T11:32:53Z)
- RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network [7.112892720740359]
Event-based cameras are inspired by spiking and asynchronous spike representation of the biological visual system.
We propose a neural network architecture, based on simple convolution layers integrated with dynamic temporal encoding for local and global reservoirs.
RN-Net achieves 99.2% on DVS128 Gesture, the highest accuracy reported to date, and one of the highest accuracies, 67.5%, on the DVS Lip dataset, at a much smaller network size.
arXiv Detail & Related papers (2023-03-19T21:20:45Z)
- Optical flow estimation from event-based cameras and spiking neural networks [0.4899818550820575]
Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs).
We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations.
Thanks to separable convolutions, we have been able to develop a light model that can nonetheless yield reasonably accurate optical flow estimates.
arXiv Detail & Related papers (2023-02-13T16:17:54Z)
- Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
The proposed NeurMAP can be applied to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from converting conventional deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- Pixel Difference Networks for Efficient Edge Detection [71.03915957914532]
We propose a lightweight yet effective architecture named Pixel Difference Network (PiDiNet) for efficient edge detection.
Extensive experiments on BSDS500, NYUD, and Multicue datasets are provided to demonstrate its effectiveness.
A faster version of PiDiNet with fewer than 0.1M parameters still achieves performance comparable to the state of the art at 200 FPS.
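The pixel-difference idea behind PiDiNet can be sketched as follows. This is an illustrative single-channel implementation of the central variant (response to differences between each neighbor and the center pixel), not the paper's code; the helper name and toy kernel are assumptions.

```python
import numpy as np

# Central pixel-difference convolution (sketch): the kernel is applied to
# differences between each neighbor and the center, sum_i w_i*(x_i - x_c).
# This equals a plain convolution whose kernel has sum(w) subtracted at
# the center, so at inference it can be folded into a standard conv.

def cpdc2d(img, w):
    """Valid 2D central pixel-difference convolution, single channel."""
    kh, kw = w.shape
    w_eq = w.copy()
    w_eq[kh // 2, kw // 2] -= w.sum()   # reparameterized equivalent kernel
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w_eq * img[i:i + kh, j:j + kw])
    return out

# A constant image has no pixel differences, so the response is zero
# everywhere, which is what makes the operator sensitive to edges.
flat = np.ones((5, 5))
w = np.arange(9, dtype=float).reshape(3, 3)
resp = cpdc2d(flat, w)
```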
arXiv Detail & Related papers (2021-08-16T10:42:59Z)
- Dataset for eye-tracking tasks [0.0]
We present a dataset that is suitable for training custom models of convolutional neural networks for eye-tracking tasks.
This dataset contains 10,000 eye images at a resolution of 416 by 416 pixels.
This manuscript can be considered as a guide for the preparation of datasets for eye-tracking devices.
arXiv Detail & Related papers (2021-06-01T23:54:23Z)
- Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we obtain a ResNet50 model on ImageNet with a 65% sparsity rate at 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
- Fusion-FlowNet: Energy-Efficient Optical Flow Estimation using Sensor Fusion and Deep Fused Spiking-Analog Network Architectures [7.565038387344594]
We present a sensor fusion framework for energy-efficient optical flow estimation using both frame- and event-based sensors.
Our network is end-to-end trained using unsupervised learning to avoid expensive video annotations.
arXiv Detail & Related papers (2021-03-19T02:03:33Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods of up to 30% in mean absolute depth error.
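The asynchronous-update-and-query mechanism described above can be sketched in a few lines. This is not the RAM architecture itself: the exponential staleness decay, the class name, and the fusion-by-concatenation are illustrative assumptions standing in for the learned recurrent update.

```python
import numpy as np

# Sketch of the RAM idea: one hidden state per modality, updated only
# when that sensor delivers a measurement, and queryable at any time.

class AsyncState:
    """Hidden state updated at irregular times; decays toward zero with
    staleness (hypothetical update rule, in place of a learned RNN)."""
    def __init__(self, dim, tau=0.1):
        self.h = np.zeros(dim)
        self.t = float('-inf')   # time of last update
        self.tau = tau
    def update(self, x, t):
        decay = np.exp(-(t - self.t) / self.tau)   # staleness decay
        self.h = decay * self.h + (1.0 - decay) * x
        self.t = t

frame_state = AsyncState(dim=4)
event_state = AsyncState(dim=4)

# Frames arrive at a fixed 30 Hz; events arrive densely and irregularly.
for t in np.arange(0.0, 0.5, 1 / 30):
    frame_state.update(np.ones(4), t)
for t in sorted(np.random.default_rng(1).uniform(0.0, 0.5, 200)):
    event_state.update(np.full(4, 2.0), t)

def query(t):
    """Prediction at an arbitrary time t: fuse the decayed states."""
    decay_f = np.exp(-(t - frame_state.t) / frame_state.tau)
    decay_e = np.exp(-(t - event_state.t) / event_state.tau)
    return np.concatenate([decay_f * frame_state.h,
                           decay_e * event_state.h])

pred = query(0.5)   # query between sensor updates is perfectly legal
```

The point of the design is that neither sensor forces a common clock: each stream updates its own state at its own rate, and the decoder reads the fused state whenever a prediction is needed.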
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.