Enabling ISP-less Low-Power Computer Vision
- URL: http://arxiv.org/abs/2210.05451v1
- Date: Tue, 11 Oct 2022 13:47:30 GMT
- Title: Enabling ISP-less Low-Power Computer Vision
- Authors: Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal,
Peter A. Beerel
- Abstract summary: We release the raw version of the COCO dataset, a large-scale benchmark for generic high-level vision tasks.
For ISP-less CV systems, training on raw images results in a 7.1% increase in test accuracy.
We propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations.
- Score: 4.102254385058941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to deploy current computer vision (CV) models on
resource-constrained low-power devices, recent works have proposed in-sensor
and in-pixel computing approaches that try to partly/fully bypass the image
signal processor (ISP) and yield significant bandwidth reduction between the
image sensor and the CV processing unit by downsampling the activation maps in
the initial convolutional neural network (CNN) layers. However, direct
inference on the raw images degrades the test accuracy due to the difference in
covariance of the raw images captured by the image sensors compared to the
ISP-processed images used for training. Moreover, it is difficult to train deep
CV models on raw images, because most (if not all) large-scale open-source
datasets consist of RGB images. To mitigate this concern, we propose to invert
the ISP pipeline, which can convert the RGB images of any dataset to its raw
counterparts, and enable model training on raw images. We release the raw
version of the COCO dataset, a large-scale benchmark for generic high-level
vision tasks. For ISP-less CV systems, training on these raw images results in a
7.1% increase in test accuracy on the visual wake words (VWW) dataset compared
to training with traditional ISP-processed RGB datasets. To further
improve the accuracy of ISP-less CV models and to increase the energy and
bandwidth benefits obtained by in-sensor/in-pixel computing, we propose an
energy-efficient form of analog in-pixel demosaicing that may be coupled with
in-pixel CNN computations. When evaluated on raw images captured by real
sensors from the PASCALRAW dataset, our approach results in an 8.1% increase in
mAP. Lastly, we demonstrate a further 20.5% increase in mAP through a novel
application of few-shot learning, with thirty shots per class, on the PASCALRAW
dataset, which comprises 3 classes.
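The ISP-inversion idea described in the abstract can be illustrated with a minimal sketch: a simplified inverse pipeline that undoes the sRGB gamma curve and white balance, then mosaics the result into an RGGB Bayer pattern to simulate a raw sensor readout. The function name, the white-balance gains, and the three-stage structure here are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def invert_isp(rgb, wb_gains=(2.0, 1.0, 1.8)):
    """Convert an sRGB image (H, W, 3) in [0, 1] to a simulated raw
    Bayer mosaic (H, W) by inverting a simplified ISP:
    inverse gamma -> inverse white balance -> RGGB mosaicing."""
    # 1. Inverse sRGB gamma: recover approximately linear intensities.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    # 2. Inverse white balance: divide each channel by its assumed gain.
    linear = linear / np.asarray(wb_gains)
    # 3. Mosaic to an RGGB Bayer pattern: keep one channel per pixel.
    h, w, _ = linear.shape
    raw = np.empty((h, w), dtype=linear.dtype)
    raw[0::2, 0::2] = linear[0::2, 0::2, 0]  # R at even rows, even cols
    raw[0::2, 1::2] = linear[0::2, 1::2, 1]  # G at even rows, odd cols
    raw[1::2, 0::2] = linear[1::2, 0::2, 1]  # G at odd rows, even cols
    raw[1::2, 1::2] = linear[1::2, 1::2, 2]  # B at odd rows, odd cols
    return np.clip(raw, 0.0, 1.0)
```

Applying such an inverse pipeline to an existing RGB dataset (e.g., COCO) yields raw-like training images, which is the mechanism the abstract relies on to close the covariance gap between sensor raw captures and ISP-processed training data.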
Related papers
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR)
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z) - RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images [51.68432586065828]
We introduce RAW-Adapter, a novel approach aimed at adapting sRGB pre-trained models to camera RAW data.
RAW-Adapter comprises input-level adapters that employ learnable ISP stages to adjust RAW inputs, as well as model-level adapters to build connections between ISP stages and subsequent high-level networks.
arXiv Detail & Related papers (2024-08-27T06:14:54Z) - Dual-Scale Transformer for Large-Scale Single-Pixel Imaging [11.064806978728457]
We propose a deep unfolding network with hybrid-attention Transformer on Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras.
The gradient descent module avoids the high computational overhead of previous gradient descent modules based on vectorized SPI.
The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration.
arXiv Detail & Related papers (2024-04-07T15:53:21Z) - Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and
Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z) - Reversed Image Signal Processing and RAW Reconstruction. AIM 2022
Challenge Report [109.2135194765743]
This paper introduces the AIM 2022 Challenge on Reversed Image Signal Processing and RAW Reconstruction.
We aim to recover raw sensor images from the corresponding RGBs without metadata and, by doing this, "reverse" the ISP transformation.
arXiv Detail & Related papers (2022-10-20T10:43:53Z) - LW-ISP: A Lightweight Model with ISP and Deep Learning [17.972611191715888]
We show the possibility of learning-based method to achieve real-time high-performance processing in the ISP pipeline.
We propose LW-ISP, a novel architecture designed to implicitly learn the image mapping from RAW data to RGB image.
Experiments demonstrate that LW-ISP has achieved a 0.38 dB improvement in PSNR compared to the previous best method.
arXiv Detail & Related papers (2022-10-08T04:00:03Z) - GenISP: Neural ISP for Low-Light Machine Cognition [19.444297600977546]
In low-light conditions, object detectors using raw image data are more robust than detectors using image data processed by an ISP pipeline.
We propose a minimal neural ISP pipeline for machine cognition, named GenISP, that explicitly incorporates Color Space Transformation into a device-independent color space.
arXiv Detail & Related papers (2022-05-07T17:17:24Z) - An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performances in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z) - Toward Efficient Hyperspectral Image Processing inside Camera Pixels [1.6449390849183356]
Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands.
To mitigate this problem, we propose a form of processing-in-pixel (PIP)
Our PIP-optimized custom CNN layers effectively compress the input data, significantly reducing the bandwidth required to transmit the data downstream to the HSI processing unit.
arXiv Detail & Related papers (2022-03-11T01:06:02Z) - Model-Based Image Signal Processors via Learnable Dictionaries [6.766416093990318]
Digital cameras transform sensor RAW readings into RGB images by means of their Image Signal Processor (ISP)
Recent approaches have attempted to bridge this gap by estimating the RGB to RAW mapping.
We present a novel hybrid model-based and data-driven ISP that is both learnable and interpretable.
arXiv Detail & Related papers (2022-01-10T08:36:10Z) - CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.