Related papers: P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications

URL: http://arxiv.org/abs/2203.04737v1
Date: Mon, 7 Mar 2022 04:15:29 GMT
Title: P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications
Authors: Gourav Datta, Souvik Kundu, Zihan Yin, Ravi Teja Lakkireddy, Peter A. Beerel, Ajey Jacob, Akhilesh R. Jaiswal
Abstract summary: High-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. We propose a novel Processing-in-Pixel-in-memory (P2M) paradigm, that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution and ReLU. Our results indicate that P2M reduces data transfer bandwidth from sensors and analog to digital conversions by 21x, and the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML
Score: 4.102356304183255
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The demand to process vast amounts of data generated from state-of-the-art high resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured in the form of analog voltages by a sensor pixel array, and then converted to the digital domain for subsequent AI processing using analog-to-digital converters (ADC). Recent research has tried to take advantage of massively parallel low-power analog/digital computing in the form of near- and in-sensor processing, in which the AI computation is performed partly in the periphery of the pixel array and partly in a separate on-board CPU/accelerator. Unfortunately, high-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. To mitigate this problem, we propose a novel Processing-in-Pixel-in-memory (P2M) paradigm, that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution and ReLU (Rectified Linear Units). Our solution includes a holistic algorithm-circuit co-design approach and the resulting P2M paradigm can be used as a drop-in replacement for embedding memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms. Our experimental results indicate that P2M reduces data transfer bandwidth from sensors and analog to digital conversions by ~21x, and the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case for visual wake words dataset (VWW) by up to ~11x compared to standard near-processing or in-sensor implementations, without any significant drop in test accuracy.
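
As a rough illustration of why computing the first convolution and ReLU at the sensor shrinks the data that has to leave it, the sketch below applies a strided convolution followed by ReLU to a random frame and compares the number of values before and after. The frame size, kernel size, stride, and channel count are illustrative assumptions, not the configuration reported in the paper, and the function name `strided_conv_relu` is hypothetical. The arithmetic is ordinary digital NumPy, not a model of the analog in-pixel circuit, and it does not attempt to reproduce the ~21x figure.

```python
import numpy as np


def strided_conv_relu(frame, weights, stride):
    """Strided 2D convolution (no padding) followed by ReLU.

    frame:   array of shape (H, W, C_in)
    weights: array of shape (k, k, C_in, C_out)
    Returns an array of shape (H_out, W_out, C_out).
    """
    k = weights.shape[0]
    h_out = (frame.shape[0] - k) // stride + 1
    w_out = (frame.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, weights.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            r, c = i * stride, j * stride
            patch = frame[r:r + k, c:c + k, :]
            # Contract the patch with the kernel over (row, col, input channel).
            out[i, j] = np.tensordot(patch, weights, axes=([0, 1, 2], [0, 1, 2]))
    # ReLU: negative activations are clipped to zero.
    return np.maximum(out, 0.0)


# Assumed toy configuration: 40x40 RGB frame, 5x5 kernel, stride 5, 8 output channels.
rng = np.random.default_rng(0)
frame = rng.random((40, 40, 3))
weights = rng.standard_normal((5, 5, 3, 8))

features = strided_conv_relu(frame, weights, stride=5)

# Ratio of values captured by the sensor to values that would be read out.
reduction = frame.size / features.size
print(features.shape, reduction)
```

In this toy setting the reduction comes only from the stride and channel count; any additional savings from lower output bit precision or the sensor's color filter layout are not modeled here.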

Related papers

bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction [57.199618102578576]
We propose bit2bit, a new method for reconstructing high-quality image stacks at original resolution from sparse binary quanta image data. Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data. We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
arXiv Detail & Related papers (2024-10-30T17:30:35Z)
Digital-analog hybrid matrix multiplication processor for optical neural networks [11.171425574890765]
We propose a digital-analog hybrid optical computing architecture for optical neural networks (ONNs). By introducing the logic levels and decisions based on thresholding, the calculation precision can be significantly enhanced. We have demonstrated an unprecedented 16-bit calculation precision for high-definition image processing, with a pixel error rate (PER) as low as $1.8\times10^{-3}$ at a signal-to-noise ratio (SNR) of 18.2 dB.
arXiv Detail & Related papers (2024-01-26T18:42:57Z)
Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement [83.60486465697318]
We propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames. We extend an M2M++ framework by introducing a flexible Spatial Selective Refinement component, which allows for trading computational efficiency for quality and vice versa.
arXiv Detail & Related papers (2023-10-29T09:09:32Z)
Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction. To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures. Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
arXiv Detail & Related papers (2023-08-01T03:44:56Z)
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers [3.8831062015253055]
We introduce a Single-Image Super-Resolution (SISR) approach to enhance the detection of structural and textural features in surveillance images. Our approach incorporates sub-pixel convolution layers and a loss function that uses an Optical Character Recognition (OCR) model for feature extraction. Our results show that our approach for reconstructing these low-resolution synthesized images outperforms existing ones in both quantitative and qualitative measures.
arXiv Detail & Related papers (2023-05-27T00:17:19Z)
Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block. The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
Toward Efficient Hyperspectral Image Processing inside Camera Pixels [1.6449390849183356]
Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands. To mitigate this problem, we propose a form of processing-in-pixel (PIP). Our PIP-optimized custom CNN layers effectively compress the input data, significantly reducing the bandwidth required to transmit the data downstream to the HSI processing unit.
arXiv Detail & Related papers (2022-03-11T01:06:02Z)
Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations. The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal. We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues. We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
Generating Superpixels for High-resolution Images with Decoupled Patch Calibration [82.21559299694555]
Patch Calibration Networks (PCNet) is designed to efficiently and accurately implement high-resolution superpixel segmentation. In particular, DPC takes a local patch from the high-resolution images and dynamically generates a binary mask that forces the network to focus on region boundaries.
arXiv Detail & Related papers (2021-08-19T10:33:05Z)
Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays [16.531637803429277]
We present a novel method of CNN inference for pixel processor array (PPA) vision sensors. Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely upon the PPA sensor. This is the first work demonstrating CNN inference conducted entirely upon the processor array of a PPA vision sensor device, requiring no external processing.
arXiv Detail & Related papers (2020-04-27T01:00:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.