ImageSig: A signature transform for ultra-lightweight image recognition
- URL: http://arxiv.org/abs/2205.06929v1
- Date: Fri, 13 May 2022 23:48:32 GMT
- Title: ImageSig: A signature transform for ultra-lightweight image recognition
- Authors: Mohamed R. Ibrahim and Terry Lyons
- Abstract summary: ImageSig is based on computing signatures and does not require a convolutional structure or an attention-based encoder.
ImageSig shows unprecedented performance on hardware such as Raspberry Pi and Jetson-nano.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces a new lightweight method for image recognition.
ImageSig is based on computing signatures and does not require a convolutional
structure or an attention-based encoder. It is striking to the authors that it
achieves: a) an accuracy for 64 × 64 RGB images that exceeds many of the
state-of-the-art methods and simultaneously b) requires orders of magnitude
less FLOPS, power and memory footprint. The pretrained model can be as small as
44.2 KB in size. ImageSig shows unprecedented performance on hardware such as
Raspberry Pi and Jetson-nano. ImageSig treats images as streams with multiple
channels. These streams are parameterized by spatial directions. We extend the
functionality of signature and rough path theory to stream-like data and
vision tasks on static images, beyond temporal streams. With very few parameters
and small size models, the key advantage is that one could have many of these
"detectors" assembled on the same chip; moreover, the feature acquisition can
be performed once and shared between different models of different tasks -
further accelerating the process. This contributes to energy efficiency and the
advancements of embedded AI at the edge.
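The paper's exact featurization is not reproduced above, but the core idea — treating each image row as a multi-channel stream parameterized by the horizontal direction and computing a truncated path signature of it — can be sketched in plain NumPy. The function names (`level2_signature`, `imagesig_features`) and the choice of a depth-2 truncation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def level2_signature(path):
    """Truncated (depth-2) path signature of a stream.

    path: (T, d) array -- a stream of T points in d channels.
    Returns the d level-1 terms (total increments) followed by the
    d*d level-2 terms (discretized iterated integrals), flattened.
    """
    increments = np.diff(path, axis=0)       # (T-1, d) stepwise increments
    level1 = increments.sum(axis=0)          # total displacement per channel
    # S^{ij} ~ sum_t (X_t^i - X_0^i) * (X_{t+1}^j - X_t^j)
    centered = path[:-1] - path[0]           # (T-1, d)
    level2 = centered.T @ increments         # (d, d)
    return np.concatenate([level1, level2.ravel()])

def imagesig_features(image):
    """Treat each row of an (H, W, C) image as a C-channel stream over the
    horizontal direction; concatenate the per-row signatures."""
    return np.concatenate([level2_signature(row) for row in image])

# A 64x64 RGB image: 64 rows, each a (64, 3) stream, each yielding
# 3 + 9 = 12 signature terms -> a 768-dimensional feature vector.
img = np.random.rand(64, 64, 3)
feats = imagesig_features(img)
```

Because the signature is a fixed (non-learned) transform, a feature vector like this can be computed once and shared across several small downstream classifiers — the "many detectors on one chip" advantage the abstract describes.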
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder [29.924160271522354]
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications.
Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts.
The most relevant prior work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results.
We propose a novel pipeline that can super-resolve an input image or generate from a random noise a novel image at arbitrary scales.
arXiv Detail & Related papers (2024-03-15T12:45:40Z)
- MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression.
It consists of an LMM encoder that extracts the semantic information of the image, a map encoder that locates the regions corresponding to the semantics, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information.
It can achieve optimal consistency and perception results while saving 50% of the bitrate, giving it strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
- Efficient Image Captioning for Edge Devices [8.724184244203892]
We propose LightCap, a lightweight image captioner for resource-limited devices.
The core design is built on the recent CLIP model for efficient image captioning.
With the carefully designed architecture, our model contains merely 40M parameters, reducing the model size by more than 75% and the FLOPs by more than 98%.
arXiv Detail & Related papers (2022-12-18T01:56:33Z)
- Iterative Patch Selection for High-Resolution Image Recognition [10.847032625429717]
We propose a simple method, Iterative Patch Selection (IPS), which decouples the memory usage from the input size.
IPS achieves this by selecting only the most salient patches, which are then aggregated into a global representation for image recognition.
Our method demonstrates strong performance and has wide applicability across different domains, training regimes and image sizes while using minimal accelerator memory.
arXiv Detail & Related papers (2022-10-24T07:55:57Z)
- TINYCD: A (Not So) Deep Learning Model For Change Detection [68.8204255655161]
The aim of change detection (CD) is to detect changes that have occurred in the same area by comparing two images of that place taken at different times.
Recent developments in the field of deep learning enabled researchers to achieve outstanding performance in this area.
We propose a novel model, called TinyCD, which proves to be both lightweight and effective.
arXiv Detail & Related papers (2022-07-26T19:28:48Z)
- HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch size and gradually merging them up to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z)
- Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations.
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)
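The factorize-and-swap idea behind the Swapping Autoencoder entry above can be illustrated structurally. The real model uses learned convolutional encoders and a GAN decoder trained so that every swapped combination is realistic; the stand-ins below (fixed random linear maps, hypothetical names `encode`/`decode`) are only a sketch of the two-component encoding and the swap itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "encoder": two fixed random projections play the role of the
# learned structure and texture branches of the real model.
W_structure = rng.standard_normal((16, 64))
W_texture = rng.standard_normal((8, 64))

def encode(x):
    """Split a flattened image x into two independent codes."""
    return W_structure @ x, W_texture @ x

def decode(structure, texture):
    """Stand-in "decoder": any function of the two codes. A trained GAN
    generator would map (structure, texture) back to a realistic image."""
    return W_structure.T @ structure + W_texture.T @ texture

a = rng.standard_normal(64)
b = rng.standard_normal(64)
sa, ta = encode(a)
sb, tb = encode(b)
# The swapping constraint: every mixed combination must decode to a
# realistic image. Here we simply form one hybrid:
hybrid = decode(sa, tb)   # structure of `a`, texture of `b`
```

The design point the paper makes is that enforcing realism on all swapped combinations forces the two codes to disentangle, so manipulation reduces to recombining codes rather than optimizing in pixel space.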
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.