Related papers: Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays

URL: http://arxiv.org/abs/2004.12525v1
Date: Mon, 27 Apr 2020 01:00:35 GMT
Title: Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays
Authors: Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek, Walterio Mayol-Cuevas
Abstract summary: We present a novel method of CNN inference for pixel processor array (PPA) vision sensors. Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely upon the PPA sensor. This is the first work demonstrating CNN inference conducted entirely upon the processor array of a PPA vision sensor device, requiring no external processing.
Score: 16.531637803429277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities. PPA sensors consist of an array of processing elements (PEs), with each PE capable of light capture, data storage and computation, allowing various computer vision processing to be executed directly upon the sensor device. The key idea behind our approach is storing network weights "in-pixel" within the PEs of the PPA sensor itself to allow various computations, such as multiple different image convolutions, to be carried out in parallel. Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely upon the PPA sensor, while leaving no untapped computational resources. This is in contrast to previous works that only use sensor-level processing to sequentially compute image convolutions, and must transfer data to an external digital processor to complete the computation. We demonstrate our approach on the SCAMP-5 vision system, performing inference of an MNIST digit classification network at over 3000 frames per second and over 93% classification accuracy. This is the first work demonstrating CNN inference conducted entirely upon the processor array of a PPA vision sensor device, requiring no external processing.
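Illustration: the "in-pixel" weight idea from the abstract can be sketched in plain Python as follows. This is a minimal emulation under stated assumptions, not the authors' SCAMP-5 implementation: the tiled layout (one image copy per kernel, side by side), the function names `direct_conv` and `in_pixel_conv`, and the integer weights are all hypothetical choices made for illustration. The point it shows is that when each processing element holds its own weight, a single sequence of array-wide shift, multiply and accumulate steps yields several different convolutions at once.

```python
# Sketch only: emulates a SIMD processor array with nested lists.
# Layout, names and weights are assumptions, not taken from the paper.

def direct_conv(img, kernel):
    """Reference: zero-padded 'same' cross-correlation, one kernel at a time."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy + r][dx + r] * img[yy][xx]
    return out


def in_pixel_conv(img, kernels):
    """Apply all kernels with one shared sequence of array-wide steps."""
    h, w = len(img), len(img[0])
    k = len(kernels)
    r = len(kernels[0]) // 2
    # Assumed array layout: K copies of the image side by side, one tile per kernel.
    acc = [[0] * (k * w) for _ in range(h)]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Weight plane: each PE stores the weight of its own tile's kernel
            # for the current offset (the "in-pixel" weight).
            weights = [[kernels[x // w][dy + r][dx + r] for x in range(k * w)]
                       for _ in range(h)]
            # One array-wide step: every PE reads its shifted neighbour,
            # multiplies by its locally stored weight, and accumulates.
            for y in range(h):
                for x in range(k * w):
                    yy, xx = y + dy, (x % w) + dx  # the shift stays inside the tile
                    if 0 <= yy < h and 0 <= xx < w:
                        acc[y][x] += weights[y][x] * img[yy][xx]
    # Split the array back into one feature map per kernel.
    return [[row[t * w:(t + 1) * w] for row in acc] for t in range(k)]


image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 0, 1, 0],
         [1, 1, 0, 2]]
kernels = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],     # identity
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],  # horizontal difference
]
maps = in_pixel_conv(image, kernels)
```

Here `maps` holds one feature map per kernel, and each map matches what `direct_conv` computes for that kernel alone. The number of array-wide steps depends only on the kernel size, not on how many kernels are tiled across the array, which is the parallelism the abstract refers to.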

Related papers

PixelWorld: Towards Perceiving Everything as Pixels [50.13953243722129]
We propose to unify all modalities (text, tables, code, diagrams, images, etc.) as pixel inputs, i.e. "Perceive Everything as Pixels" (PEAP). We introduce PixelWorld, a novel evaluation suite that unifies all the mentioned modalities into pixel space to gauge the existing models' performance.
arXiv Detail & Related papers (2025-01-31T17:39:21Z)
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding [49.218195440600354]
Current image pyramids use the same large-scale model to process multiple resolutions, leading to significant computational cost. We propose a novel network architecture, called Parameter-Inverted Image Pyramid Networks (PIIP). PIIP uses pretrained models (ViTs or CNNs) as branches to process multi-scale images, where images of higher resolutions are processed by smaller network branches to balance computational cost and performance.
arXiv Detail & Related papers (2025-01-14T01:57:41Z)
Data-Driven Pixel Control: Challenges and Prospects [13.158333009169418]
We study a data-driven system that combines dynamic sensing at the pixel level with computer vision analytics at the video level. Our system achieves a 10X reduction in bandwidth and a 15-30X improvement in Energy-Delay Product (EDP) when activating only 30% of pixels.
arXiv Detail & Related papers (2024-08-08T21:49:19Z)
Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid. PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
Mapping Image Transformations Onto Pixel Processor Arrays [4.857223862405921]
Pixel Processor Arrays (PPA) present a new vision sensor/processor architecture consisting of a SIMD array of processor elements. We demonstrate how various image transformations, including shearing, rotation and scaling, can be performed directly upon a PPA.
arXiv Detail & Related papers (2024-03-25T17:56:41Z)
Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks. We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z)
P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications [4.102356304183255]
High-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. We propose a novel Processing-in-Pixel-in-Memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution and ReLU. Our results indicate that P2M reduces data transfer bandwidth from sensors and analog-to-digital conversions by 21x, and the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML
arXiv Detail & Related papers (2022-03-07T04:15:29Z)
On-Sensor Binarized Fully Convolutional Neural Network with A Pixel Processor Array [17.4097919720973]
This work presents a method to implement fully convolutional neural networks (FCNs) on Pixel Processor Array (PPA) sensors. We design and train a binarized FCN with both binary weights and activations using batchnorm, group convolution, and a learnable threshold for binarization. We demonstrate the first implementation of an FCN on a PPA device, performing three convolution layers entirely in the pixel-level processors.
arXiv Detail & Related papers (2022-02-02T01:18:40Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues. We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF). We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
Visual Transformers: Token-based Image Representation and Processing for Computer Vision [67.55770209540306]
Visual Transformer (VT) operates in a semantic token space, judiciously attending to different image parts based on context. Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts. For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.
arXiv Detail & Related papers (2020-06-05T20:49:49Z)
AnalogNet: Convolutional Neural Network Inference on Analog Focal Plane Sensor Processors [0.0]
We present a high-speed, energy-efficient Convolutional Neural Network (CNN) architecture utilising the capabilities of a unique class of devices known as analog Focal Plane Sensor Processors (FPSP). Unlike traditional vision systems, where the sensor array sends collected data to a separate processor for processing, FPSPs allow data to be processed on the imaging device itself. Our proposed architecture, coined AnalogNet, reaches a testing accuracy of 96.9% on the MNIST handwritten digits recognition task, at a speed of 2260 FPS, for a cost of 0.7 mJ per frame.
arXiv Detail & Related papers (2020-06-02T16:44:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.