Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
- URL: http://arxiv.org/abs/2003.10959v3
- Date: Wed, 22 Jul 2020 11:35:18 GMT
- Title: Learning to Exploit Multiple Vision Modalities by Using Grafted Networks
- Authors: Yuhuang Hu and Tobi Delbruck and Shih-Chii Liu
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel vision sensors such as thermal, hyperspectral, polarization, and event
cameras provide information that is not available from conventional intensity
cameras. An obstacle to using these sensors with current powerful deep neural
networks is the lack of large labeled training datasets. This paper proposes a
Network Grafting Algorithm (NGA), where a new front end network driven by
unconventional visual inputs replaces the front end network of a pretrained
deep network that processes intensity frames. The self-supervised training uses
only synchronously-recorded intensity frames and novel sensor data to maximize
feature similarity between the pretrained network and the grafted network. We
show that the enhanced grafted network achieves average precision (AP50)
scores competitive with those of the pretrained network on an object detection
task using thermal and event camera datasets, with no increase in inference
cost. In particular, the grafted network driven by thermal frames showed a
relative improvement of 49.11% over the use of intensity frames. The grafted
front end has only 5--8% of the total parameters and can be trained in a few
hours on a single GPU, about 5% of the time that would be needed to train the
entire object detector from labeled data. NGA allows new vision sensors to
capitalize on previously pretrained powerful deep models, saving training
cost and widening the range of applications for novel sensors.
Related papers
- A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation [3.355813093377501]
Event cameras operate differently from traditional digital cameras, continuously capturing data and generating binary spikes that encode time, location, and light intensity.
This necessitates the development of innovative, spike-aware algorithms tailored for event cameras.
We propose a purely spike-driven spike transformer network for depth estimation from spiking camera data.
arXiv Detail & Related papers (2024-04-26T11:32:53Z)
- RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network [7.112892720740359]
Event-based cameras are inspired by spiking and asynchronous spike representation of the biological visual system.
We propose a neural network architecture, based on simple convolution layers integrated with dynamic temporal encoding for local and global reservoirs.
RN-Net achieves 99.2% on DVS128 Gesture, the highest accuracy reported to date, and one of the highest accuracies, 67.5%, on the DVS Lip dataset, at a much smaller network size.
arXiv Detail & Related papers (2023-03-19T21:20:45Z)
- Optical flow estimation from event-based cameras and spiking neural networks [0.4899818550820575]
Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs).
We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations.
Thanks to separable convolutions, we have been able to develop a light model that can nonetheless yield reasonably accurate optical flow estimates.
arXiv Detail & Related papers (2023-02-13T16:17:54Z)
- Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
The proposed NeurMAP can be applied to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from converting conventional deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- Pixel Difference Networks for Efficient Edge Detection [71.03915957914532]
We propose a lightweight yet effective architecture named Pixel Difference Network (PiDiNet) for efficient edge detection.
Extensive experiments on BSDS500, NYUD, and Multicue datasets are provided to demonstrate its effectiveness.
A faster version of PiDiNet with fewer than 0.1M parameters still achieves performance comparable to the state of the art at 200 FPS.
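The pixel-difference idea behind PiDiNet can be sketched as follows. This is an illustrative single-channel implementation of the central variant (response to differences between each neighbor and the center pixel), not the paper's code; the helper name and toy kernel are assumptions.

```python
import numpy as np

# Central pixel-difference convolution (sketch): the kernel is applied to
# differences between each neighbor and the center, sum_i w_i*(x_i - x_c).
# This equals a plain convolution whose kernel has sum(w) subtracted at
# the center, so at inference it can be folded into a standard conv.

def cpdc2d(img, w):
    """Valid 2D central pixel-difference convolution, single channel."""
    kh, kw = w.shape
    w_eq = w.copy()
    w_eq[kh // 2, kw // 2] -= w.sum()   # reparameterized equivalent kernel
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w_eq * img[i:i + kh, j:j + kw])
    return out

# A constant image has no pixel differences, so the response is zero
# everywhere, which is what makes the operator sensitive to edges.
flat = np.ones((5, 5))
w = np.arange(9, dtype=float).reshape(3, 3)
resp = cpdc2d(flat, w)
```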
arXiv Detail & Related papers (2021-08-16T10:42:59Z)
- Dataset for eye-tracking tasks [0.0]
We present a dataset that is suitable for training custom models of convolutional neural networks for eye-tracking tasks.
This dataset contains 10,000 eye images at a resolution of 416 by 416 pixels.
This manuscript can be considered as a guide for the preparation of datasets for eye-tracking devices.
arXiv Detail & Related papers (2021-06-01T23:54:23Z)
- Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we obtain a ResNet50 model on ImageNet with a 65% sparsity rate at 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z)
- Fusion-FlowNet: Energy-Efficient Optical Flow Estimation using Sensor Fusion and Deep Fused Spiking-Analog Network Architectures [7.565038387344594]
We present a sensor fusion framework for energy-efficient optical flow estimation using both frame- and event-based sensors.
Our network is end-to-end trained using unsupervised learning to avoid expensive video annotations.
arXiv Detail & Related papers (2021-03-19T02:03:33Z)
- Combining Events and Frames using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction [51.072733683919246]
We introduce Recurrent Asynchronous Multimodal (RAM) networks to handle asynchronous and irregular data from multiple sensors.
Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction.
We show an improvement over state-of-the-art methods of up to 30% in mean absolute depth error.
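The asynchronous-update-and-query mechanism described above can be sketched in a few lines. This is not the RAM architecture itself: the exponential staleness decay, the class name, and the fusion-by-concatenation are illustrative assumptions standing in for the learned recurrent update.

```python
import numpy as np

# Sketch of the RAM idea: one hidden state per modality, updated only
# when that sensor delivers a measurement, and queryable at any time.

class AsyncState:
    """Hidden state updated at irregular times; decays toward zero with
    staleness (hypothetical update rule, in place of a learned RNN)."""
    def __init__(self, dim, tau=0.1):
        self.h = np.zeros(dim)
        self.t = float('-inf')   # time of last update
        self.tau = tau
    def update(self, x, t):
        decay = np.exp(-(t - self.t) / self.tau)   # staleness decay
        self.h = decay * self.h + (1.0 - decay) * x
        self.t = t

frame_state = AsyncState(dim=4)
event_state = AsyncState(dim=4)

# Frames arrive at a fixed 30 Hz; events arrive densely and irregularly.
for t in np.arange(0.0, 0.5, 1 / 30):
    frame_state.update(np.ones(4), t)
for t in sorted(np.random.default_rng(1).uniform(0.0, 0.5, 200)):
    event_state.update(np.full(4, 2.0), t)

def query(t):
    """Prediction at an arbitrary time t: fuse the decayed states."""
    decay_f = np.exp(-(t - frame_state.t) / frame_state.tau)
    decay_e = np.exp(-(t - event_state.t) / event_state.tau)
    return np.concatenate([decay_f * frame_state.h,
                           decay_e * event_state.h])

pred = query(0.5)   # query between sensor updates is perfectly legal
```

The point of the design is that neither sensor forces a common clock: each stream updates its own state at its own rate, and the decoder reads the fused state whenever a prediction is needed.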
arXiv Detail & Related papers (2021-02-18T13:24:35Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.