Related papers: Task-Aware Image Signal Processor for Advanced Visual Perception

URL: http://arxiv.org/abs/2509.13762v1
Date: Wed, 17 Sep 2025 07:16:51 GMT
Title: Task-Aware Image Signal Processor for Advanced Visual Perception
Authors: Kai Chen, Jin Xiao, Leheng Zhang, Kexuan Shi, Shuhang Gu
Abstract summary: Task-Aware Image Signal Processing (TA-ISP) is a compact RAW-to-RGB framework that produces task-oriented representations for pretrained vision models. TA-ISP consistently improves downstream accuracy while markedly reducing parameter count and inference time. It is well suited for deployment on resource-constrained devices.
Score: 32.29324101518987
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, there has been a growing trend in computer vision towards exploiting RAW sensor data, which preserves richer information compared to conventional low-bit RGB images. Early studies mainly focused on enhancing visual quality, while more recent efforts aim to leverage the abundant information in RAW data to improve the performance of visual perception tasks such as object detection and segmentation. However, existing approaches still face two key limitations: large-scale ISP networks impose heavy computational overhead, while methods based on tuning traditional ISP pipelines are restricted by limited representational capacity. To address these issues, we propose Task-Aware Image Signal Processing (TA-ISP), a compact RAW-to-RGB framework that produces task-oriented representations for pretrained vision models. Instead of heavy dense convolutional pipelines, TA-ISP predicts a small set of lightweight, multi-scale modulation operators that act at global, regional, and pixel scales to reshape image statistics across different spatial extents. This factorized control significantly expands the range of spatially varying transforms that can be represented while keeping memory usage, computation, and latency tightly constrained. Evaluated on several RAW-domain detection and segmentation benchmarks under both daytime and nighttime conditions, TA-ISP consistently improves downstream accuracy while markedly reducing parameter count and inference time, making it well suited for deployment on resource-constrained devices.
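The factorized global/regional/pixel modulation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes every operator is a multiplicative gain: a per-channel global gain, a coarse grid of regional gains upsampled to full resolution with nearest-neighbor interpolation, and a per-pixel gain map, followed by a per-channel bias. These operator forms, the composition order, and the helper names `apply_multiscale_modulation` and `upsample_nearest` are hypothetical; the abstract does not specify them, and in TA-ISP the operator parameters are predicted rather than supplied by hand as they are here.

```python
import numpy as np


def upsample_nearest(grid: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbor upsample a (gh, gw, C) grid to (height, width, C)."""
    gh, gw = grid.shape[:2]
    # Map each output row/column index to the grid cell that covers it.
    rows = np.arange(height) * gh // height
    cols = np.arange(width) * gw // width
    return grid[rows][:, cols]


def apply_multiscale_modulation(
    image: np.ndarray,          # (H, W, C) image in [0, 1]
    global_gain: np.ndarray,    # (C,): one gain per channel (hypothetical form)
    regional_gain: np.ndarray,  # (gh, gw, C): coarse spatial grid (hypothetical form)
    pixel_gain: np.ndarray,     # (H, W, C) or (H, W, 1): per-pixel map (hypothetical form)
    bias: np.ndarray,           # (C,): per-channel offset (hypothetical form)
) -> np.ndarray:
    """Compose global, regional, and pixel-scale gains multiplicatively, then add a bias."""
    h, w, _ = image.shape
    regional = upsample_nearest(regional_gain, h, w)
    # Broadcasting combines the (C,), (H, W, C) and (H, W, 1) terms elementwise.
    out = image * global_gain * regional * pixel_gain + bias
    # Keep the result in the valid [0, 1] intensity range.
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((4, 6, 3))
    out = apply_multiscale_modulation(
        img,
        global_gain=np.array([1.2, 1.0, 0.9]),
        regional_gain=np.ones((2, 3, 3)),
        pixel_gain=np.ones((4, 6, 1)),
        bias=np.zeros(3),
    )
    print(out.shape)  # (4, 6, 3)
```

In this sketch the global and regional terms carry only C and gh·gw·C values, so the coarse scales add very little to the parameter and compute budget compared with dense convolution over the full image; only the pixel-scale term is full resolution.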

Related papers

Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection [22.292648672901066]
Low-light object detection is crucial for many real-world applications but remains challenging due to degraded image quality. We propose a lightweight and self-adaptive Image Signal Processing (ISP) plugin, Dark-ISP, which directly processes Bayer RAW images in dark environments. Our method outperforms state-of-the-art RGB- and RAW-based detection approaches, achieving superior results with minimal parameters in challenging low-light environments.
arXiv Detail & Related papers (2025-09-11T06:44:43Z)
Beyond RGB: Adaptive Parallel Processing for RAW Object Detection [5.36869872375791]
Raw Adaptation Module (RAM) is a module designed to replace the traditional Image Signal Processing (ISP) pipeline. Our approach outperforms RGB-based methods and achieves state-of-the-art results across diverse RAW image datasets.
arXiv Detail & Related papers (2025-03-17T13:36:49Z)
Keypoint Detection and Description for Raw Bayer Images [10.443350617606972]
Keypoint detection and local feature description are fundamental tasks in robotic perception, critical for applications such as SLAM, robot localization, feature matching, pose estimation, and 3D mapping. While existing methods predominantly operate on RGB images, we propose a novel network that directly processes raw images, bypassing the need for the Image Signal Processor (ISP). This work represents the first attempt to develop a keypoint detection and feature description network specifically for raw images, offering a more efficient solution for resource-constrained environments.
arXiv Detail & Related papers (2025-03-11T17:54:12Z)
LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks [20.924609707499915]
This article introduces LWGANet, a specialized lightweight backbone network tailored for remote sensing (RS) visual tasks. The LWGA module, tailored for RS imagery, adeptly harnesses redundant features to extract a wide range of spatial information. The results confirm LWGANet's widespread applicability and its ability to maintain an optimal balance between high performance and low complexity.
arXiv Detail & Related papers (2025-01-17T08:56:17Z)
Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels. They are not widely adopted by general users due to their substantial storage requirements. We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
Enabling ISP-less Low-Power Computer Vision [4.102254385058941]
We release the raw version of a large-scale benchmark for generic high-level vision tasks. For ISP-less CV systems, training on raw images results in a 7.1% increase in test accuracy. We propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations.
arXiv Detail & Related papers (2022-10-11T13:47:30Z)
An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images. RSP can help deliver distinctive performances in scene recognition tasks. RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z)
Model-Based Image Signal Processors via Learnable Dictionaries [6.766416093990318]
Digital cameras transform sensor RAW readings into RGB images by means of their Image Signal Processor (ISP). Recent approaches have attempted to bridge this gap by estimating the RGB to RAW mapping. We present a novel hybrid model-based and data-driven ISP that is both learnable and interpretable.
arXiv Detail & Related papers (2022-01-10T08:36:10Z)
Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length. It brings a great benefit by scaling dimensions of depth/width/resolution/patch size without introducing extra computational complexity. Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain. In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned cheap operations to relieve the computation burden. Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
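Several entries above (Dark-ISP, RAM, the raw Bayer keypoint network, and the ISP-less vision work) operate directly on Bayer RAW data rather than on ISP-rendered RGB. One widely used way to prepare such input, shown here only as an illustration and not as the method of any listed paper, is to pack the mosaic into four half-resolution channels. The sketch assumes an RGGB layout, even image dimensions, and illustrative black and white levels for a 10-bit sensor; `pack_rggb` is a hypothetical helper.

```python
import numpy as np


def pack_rggb(raw: np.ndarray, black_level: float = 0.0, white_level: float = 1023.0) -> np.ndarray:
    """Pack an (H, W) RGGB Bayer mosaic into an (H/2, W/2, 4) float array.

    Assumes the top-left 2x2 tile is laid out as
        R  G
        G  B
    and that H and W are even. The black/white levels are illustrative defaults.
    """
    if raw.ndim != 2 or raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("expected a 2-D mosaic with even height and width")
    # Subtract the black level and scale sensor values to [0, 1].
    norm = (raw.astype(np.float32) - black_level) / (white_level - black_level)
    norm = np.clip(norm, 0.0, 1.0)
    # Strided slices pick one color site per 2x2 tile; no interpolation is done.
    return np.stack(
        [
            norm[0::2, 0::2],  # R
            norm[0::2, 1::2],  # G on red rows
            norm[1::2, 0::2],  # G on blue rows
            norm[1::2, 1::2],  # B
        ],
        axis=-1,
    )


if __name__ == "__main__":
    raw = np.arange(16, dtype=np.uint16).reshape(4, 4)
    packed = pack_rggb(raw, black_level=0, white_level=15)
    print(packed.shape)  # (2, 2, 4)
```

Packing keeps each color site unmixed and avoids demosaicing interpolation, at the cost of halving spatial resolution; a different CFA pattern (e.g. BGGR) only changes which slice offsets map to which channel.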

This list is automatically generated from the titles and abstracts of the papers on this site.