Related papers: HiRISE: High-Resolution Image Scaling for Edge ML via In-Sensor Compression and Selective ROI

URL: http://arxiv.org/abs/2408.03956v1
Date: Tue, 23 Jul 2024 16:26:05 GMT
Title: HiRISE: High-Resolution Image Scaling for Edge ML via In-Sensor Compression and Selective ROI
Authors: Brendan Reidy, Sepehr Tabrizchi, Mohamadreza Mohammadi, Shaahin Angizi, Arman Roohi, Ramtin Zand
Abstract summary: We propose a high-resolution image scaling system for edge machine learning (ML) called HiRISE. Our methodology achieves up to 17.7x reduction in data transfer and energy consumption.
Score: 1.3757956340051605
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rise of tiny IoT devices powered by machine learning (ML), many researchers have directed their focus toward compressing models to fit on tiny edge devices. Recent works have achieved remarkable success in compressing ML models for object detection and image classification on microcontrollers with small memory, e.g., 512kB SRAM. However, there remain many challenges prohibiting the deployment of ML systems that require high-resolution images. Due to fundamental limits in memory capacity for tiny IoT devices, it may be physically impossible to store large images without external hardware. To this end, we propose a high-resolution image scaling system for edge ML, called HiRISE, which is equipped with selective region-of-interest (ROI) capability leveraging analog in-sensor image scaling. Our methodology not only significantly reduces the peak memory requirements, but also achieves up to 17.7x reduction in data transfer and energy consumption.

Related papers

Ultra-Low Complexity On-Orbit Compression for Remote Sensing Imagery via Block Modulated Imaging [17.334800411037836]
This paper advances the study of compressed sensing in remote sensing image compression. By requiring only a single exposure, Block Modulated Imaging (BMI) significantly enhances imaging acquisition speeds. We propose a novel decoding network specifically designed to reconstruct images compressed under the BMI framework.
arXiv Detail & Related papers (2024-12-24T13:18:00Z)
ZoomLDM: Latent Diffusion Model for multi-scale image generation [57.639937071834986]
We present ZoomLDM, a diffusion model tailored for generating images across multiple scales. Central to our approach is a novel magnification-aware conditioning mechanism that utilizes self-supervised learning (SSL) embeddings. ZoomLDM achieves state-of-the-art image generation quality across all scales, excelling in the data-scarce setting of generating thumbnails of entire large images.
arXiv Detail & Related papers (2024-11-25T22:39:22Z)
LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention. For longer and higher-resolution videos, it matches STM-based methods with 53% less GPU memory and supports 4096p inference on a 32G consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference [7.624476059109304]
Mondrian is an edge system that enables high-performance object detection on high-resolution video streams. We devise a novel Compressive Packed Inference to minimize per-pixel processing costs.
arXiv Detail & Related papers (2024-03-12T12:35:12Z)
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory. Experimental results demonstrate that MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV Imagery [35.96063342025938]
This paper explores the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery. We propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches. We present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information.
arXiv Detail & Related papers (2023-10-07T07:44:59Z)
MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning [114.66037224769005]
We present a novel MicroISP model designed specifically for edge devices. The proposed solution is capable of processing up to 32MP photos on recent smartphones using the standard mobile ML libraries. The architecture of the model is flexible, allowing its complexity to be adjusted to devices of different computational power.
arXiv Detail & Related papers (2022-11-08T17:40:50Z)
Iterative Patch Selection for High-Resolution Image Recognition [10.847032625429717]
We propose a simple method, Iterative Patch Selection (IPS), which decouples the memory usage from the input size. IPS achieves this by selecting only the most salient patches, which are then aggregated into a global representation for image recognition. Our method demonstrates strong performance and has wide applicability across different domains, training regimes and image sizes while using minimal accelerator memory.
arXiv Detail & Related papers (2022-10-24T07:55:57Z)
Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications [5.529817156718514]
Low-power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse. In this work, we investigate two representative XR workloads for hardware design-space exploration: (i) hand detection and (ii) eye segmentation. For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks. The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated.
arXiv Detail & Related papers (2022-06-08T11:18:02Z)
A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle. We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor. Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
FOVEA: Foveated Image Magnification for Autonomous Navigation [53.69803081925454]
We propose an attentional approach that elastically magnifies certain regions while maintaining a small input canvas. On the autonomous driving datasets Argoverse-HD and BDD100K, we show our proposed method boosts the detection AP over standard Faster R-CNN, with and without finetuning.
arXiv Detail & Related papers (2021-08-27T03:07:55Z)
Asymmetric CNN for image super-resolution [102.96131810686231]
Deep convolutional neural networks (CNNs) have been widely applied for low-level vision over the past five years. We propose an asymmetric CNN (ACNet) comprising an asymmetric block (AB), a memory enhancement block (MEB) and a high-frequency feature enhancement block (HFFEB) for image super-resolution. Our ACNet can effectively address single image super-resolution (SISR), blind SISR and blind SISR of blind noise problems.
arXiv Detail & Related papers (2021-03-25T07:10:46Z)
A Machine Learning Imaging Core using Separable FIR-IIR Filters [2.099922236065961]
We use a fully trainable, fixed-topology neural network to build a model that can perform a wide variety of image processing tasks. Our proposed Machine Learning Imaging Core, dubbed MagIC, uses a silicon area of 3 mm². Each MagIC core consumes 56 mW (215 mW max power) at 500 MHz and achieves an energy-efficient throughput of 23 TOPS/W/mm².
arXiv Detail & Related papers (2020-01-02T21:24:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.