Mapping Image Transformations Onto Pixel Processor Arrays
- URL: http://arxiv.org/abs/2403.16994v1
- Date: Mon, 25 Mar 2024 17:56:41 GMT
- Title: Mapping Image Transformations Onto Pixel Processor Arrays
- Authors: Laurie Bose, Piotr Dudek
- Abstract summary: Pixel Processor Arrays (PPA) present a new vision sensor/processor architecture consisting of a SIMD array of processor elements.
We demonstrate how various image transformations, including shearing, rotation and scaling, can be performed directly upon a PPA.
- Score: 4.857223862405921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pixel Processor Arrays (PPA) present a new vision sensor/processor architecture consisting of a SIMD array of processor elements, each capable of light capture, storage, processing and local communication. Such a device allows visual data to be efficiently stored and manipulated directly upon the focal plane, but also demands the invention of new approaches and algorithms suitable for massively parallel, fine-grain processor arrays. In this paper we demonstrate how various image transformations, including shearing, rotation and scaling, can be performed directly upon a PPA. The implementation details are presented using the SCAMP-5 vision chip, which contains a 256x256 pixel-parallel array. Our approaches for performing the image transformations efficiently exploit the parallel computation in a cellular processor array, minimizing the number of SIMD instructions required. These fundamental image transformations are vital building blocks for many visual tasks. This paper aims to serve as a reference for future PPA research while demonstrating the flexibility of PPA architectures.
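The paper gives no source listing here, but the shear-based approach it describes has a well-known form: a rotation can be decomposed into three axis-aligned shears, each of which only moves whole rows or columns and therefore maps naturally onto SIMD shift instructions. Below is a minimal NumPy sketch of that decomposition; it illustrates the general technique, not the SCAMP-5 kernel code, and uses wrap-around `np.roll` shifts where a real PPA would shift in zeros.

```python
import numpy as np

def shear_x(img, s):
    """Horizontal shear: row y is shifted by round(s * (y - h/2)) pixels.
    On a PPA this is a sequence of whole-row SIMD shift operations.
    np.roll wraps at the borders; a real array would shift in zeros."""
    h, _ = img.shape
    out = np.empty_like(img)
    for y in range(h):
        out[y] = np.roll(img[y], int(round(s * (y - h / 2))))
    return out

def shear_y(img, s):
    """Vertical shear: the transpose of shear_x (columns move instead)."""
    return shear_x(img.T, s).T

def rotate_three_shears(img, theta):
    """Rotate by theta radians via the classic three-shear decomposition
    R(theta) = ShearX(-tan(theta/2)) . ShearY(sin theta) . ShearX(-tan(theta/2)).
    Every pass moves only whole rows or columns, which is what makes the
    method attractive for SIMD pixel-parallel arrays."""
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_x(shear_y(shear_x(img, a), b), a)

img = np.zeros((256, 256), dtype=np.float32)
img[96:160, 96:160] = 1.0                        # a bright square
rot = rotate_three_shears(img, np.deg2rad(15.0)) # rotated roughly 15 degrees
```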
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
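A hedged sketch of the parameter-inverted pairing summarized above: the cheap model handles the high-resolution pyramid level and the expensive model the low-resolution level. The backbones below are scalar stand-ins, not the paper's models.

```python
import numpy as np

# Scalar stand-ins for backbones of decreasing size; in PIIP the *largest*
# model sees the *lowest-resolution* level, inverting the usual pairing.
def large_model(x):  return float(x.mean()) * 3.0
def medium_model(x): return float(x.mean()) * 2.0
def small_model(x):  return float(x.mean()) * 1.0

def build_pyramid(img, levels=3):
    """Image pyramid by repeated 2x average pooling (full resolution first)."""
    out = [img]
    for _ in range(levels - 1):
        h, w = out[-1].shape
        crop = out[-1][:h // 2 * 2, :w // 2 * 2]
        out.append(crop.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return out

def parameter_inverted_forward(img):
    """Cheap model on the high-resolution level, expensive model on the
    low-resolution level; the mean here stands in for feature fusion."""
    hi, mid, lo = build_pyramid(img, levels=3)
    return (small_model(hi) + medium_model(mid) + large_model(lo)) / 3.0

print(parameter_inverted_forward(np.random.rand(256, 256).astype(np.float32)))
```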
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications [4.102356304183255]
High-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks.
We propose a novel Processing-in-Pixel-in-Memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution and ReLU.
Our results indicate that P2M reduces data transfer bandwidth from sensors and analog-to-digital conversions by 21x, and also reduces the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case.
arXiv Detail & Related papers (2022-03-07T04:15:29Z)
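A loose digital emulation of the in-pixel compute summarized above; the real P2M work performs the multiply, accumulate, and ReLU in analog circuitry inside the pixel array, so everything below (shapes, stride, weights) is illustrative only.

```python
import numpy as np

def p2m_style_layer(sensor_img, weights, stride=2):
    """Digitally emulate in-pixel convolution + ReLU.
    sensor_img: (H, W) raw pixel intensities.
    weights: (K, K, C_out) kernel weights, standing in for the analog
    per-pixel multiplications. Only the C_out activation maps leave the
    'sensor' instead of the full frame, the bandwidth saving P2M targets."""
    H, W = sensor_img.shape
    K, _, C_out = weights.shape
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    out = np.zeros((out_h, out_w, C_out), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = sensor_img[i * stride:i * stride + K,
                               j * stride:j * stride + K]
            for c in range(C_out):
                out[i, j, c] = max(0.0, float((patch * weights[:, :, c]).sum()))
    return out

frame = np.random.rand(64, 64).astype(np.float32)      # stand-in capture
w = np.random.randn(3, 3, 8).astype(np.float32) * 0.1  # 8 output channels
acts = p2m_style_layer(frame, w)
print(acts.shape)  # (31, 31, 8): activations, not raw pixels, are read out
```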
- On-Sensor Binarized Fully Convolutional Neural Network with A Pixel Processor Array [17.4097919720973]
This work presents a method to implement fully convolutional neural networks (FCNs) on Pixel Processor Array (PPA) sensors.
We design and train a binarized FCN with both binary weights and activations, using batch normalization, group convolution, and a learnable threshold for binarization.
We demonstrate the first implementation of an FCN on a PPA device, performing three convolution layers entirely in the pixel-level processors.
arXiv Detail & Related papers (2022-02-02T01:18:40Z)
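A minimal sketch of the binarization scheme named in the summary above (sign-binarized weights and activations with a learnable threshold); the layer sizes and threshold value are illustrative, not taken from the paper.

```python
import numpy as np

def binarize_weights(w):
    """Binarize real-valued weights to {-1, +1} by sign."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

def binary_activation(x, threshold):
    """Binarize activations to {-1, +1} against a threshold that is
    learned during training (e.g. a folded batch-norm shift)."""
    return np.where(x >= threshold, 1.0, -1.0).astype(np.float32)

def binary_conv2d(x, w_bin):
    """Valid convolution with binary weights; on a PPA this reduces to
    shifted additions and subtractions of neighbouring pixel registers."""
    H, W = x.shape
    K = w_bin.shape[0]
    out = np.zeros((H - K + 1, W - K + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + K, j:j + K] * w_bin).sum()
    return out

x = np.sign(np.random.randn(32, 32)).astype(np.float32)  # binary input map
w = binarize_weights(np.random.randn(3, 3))
y = binary_activation(binary_conv2d(x, w), threshold=0.0)
```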
- Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations.
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
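The APR is a tree-structured, multi-resolution data structure; as a deliberately simplified stand-in (not the paper's data structures), the sketch below evaluates a discrete stencil convolution only at an irregular set of particle locations, gathering whichever neighbours exist. It illustrates the "compute only where samples live" idea behind content-adaptive convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse 'particles': samples of an image keyed by integer (y, x) location.
# A real APR also stores per-particle resolution levels, omitted here.
particles = {(y, x): float(np.hypot(y - 8, x - 8))
             for y in range(16) for x in range(16)
             if rng.random() < 0.7}

# 5-point smoothing stencil given as offset -> weight.
stencil = {(0, 0): 0.2, (-1, 0): 0.2, (1, 0): 0.2, (0, -1): 0.2, (0, 1): 0.2}

def particle_convolve(parts, kern):
    """Evaluate the stencil only at particle locations, gathering whichever
    neighbours exist and renormalizing by the weights actually found.
    Each output particle is independent, so the loop parallelizes trivially."""
    out = {}
    for (y, x), value in parts.items():
        acc, wsum = 0.0, 0.0
        for (dy, dx), w in kern.items():
            nb = parts.get((y + dy, x + dx))
            if nb is not None:
                acc += w * nb
                wsum += w
        out[(y, x)] = acc / wsum if wsum > 0.0 else value
    return out

smoothed = particle_convolve(particles, stencil)
```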
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The approach also applies to the recent transformer-based image recognition model ViT, showing a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision [67.55770209540306]
The Visual Transformer (VT) operates in a semantic token space, judiciously attending to different image parts based on context.
Using an advanced training recipe, our VTs significantly outperform their convolutional counterparts.
For semantic segmentation on LIP and COCO-stuff, VT-based feature pyramid networks (FPN) achieve 0.35 points higher mIoU while reducing the FPN module's FLOPs by 6.5x.
arXiv Detail & Related papers (2020-06-05T20:49:49Z)
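A minimal sketch of the token-space idea summarized above: a learned projection produces one spatial attention map per token, and each token is the attention-weighted average of the feature-map pixels. The dimensions and the single-matrix tokenizer are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tokenize(feature_map, w_attn):
    """Pool an (H*W, C) feature map into L visual tokens.
    w_attn: (C, L) projection giving one spatial attention map per token;
    each token is the attention-weighted average over all pixels."""
    attn = softmax(feature_map @ w_attn, axis=0)  # (H*W, L), each column sums to 1
    return attn.T @ feature_map                   # (L, C) semantic tokens

H, W, C, L = 16, 16, 32, 8
feats = np.random.randn(H * W, C).astype(np.float32)
w = np.random.randn(C, L).astype(np.float32) * 0.1
tokens = tokenize(feats, w)  # downstream transformer layers run on 8 tokens
print(tokens.shape)          # (8, 32): far fewer items than 256 pixels
```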
- Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays [16.531637803429277]
We present a novel method of CNN inference for pixel processor array (PPA) vision sensors.
Our approach can perform convolutional layers, max pooling, ReLU, and a final fully connected layer entirely upon the PPA sensor.
This is the first work demonstrating CNN inference conducted entirely upon the processor array of a PPA vision sensor device, requiring no external processing.
arXiv Detail & Related papers (2020-04-27T01:00:35Z)
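As a toy illustration of keeping such layers resident on the array: max pooling can be phrased as whole-array shifts plus pointwise maxima, the natural formulation on a SIMD pixel array. This NumPy stand-in (with wrap-around `np.roll` at the borders) is not the SCAMP-5 implementation.

```python
import numpy as np

def relu(plane):
    """Pointwise ReLU: every processor element clamps its own register."""
    return np.maximum(plane, 0.0)

def maxpool2x2_by_shifts(plane):
    """2x2 max pooling via whole-array shifts and pointwise max: take the
    max over each pixel, its right, lower, and lower-right neighbours,
    then subsample. np.roll wraps at the edges; a real PPA shifts in zeros."""
    right = np.roll(plane, -1, axis=1)
    down = np.roll(plane, -1, axis=0)
    diag = np.roll(right, -1, axis=0)
    pooled = np.maximum(np.maximum(plane, right), np.maximum(down, diag))
    return pooled[::2, ::2]  # keep one element per 2x2 block

plane = np.random.randn(256, 256).astype(np.float32)  # one feature plane
out = maxpool2x2_by_shifts(relu(plane))               # (128, 128)
```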
This list is automatically generated from the titles and abstracts of the papers on this site.