Raw or Cooked? Object Detection on RAW Images
- URL: http://arxiv.org/abs/2301.08965v1
- Date: Sat, 21 Jan 2023 15:42:53 GMT
- Title: Raw or Cooked? Object Detection on RAW Images
- Authors: William Ljungbergh, Joakim Johnander, Christoffer Petersson, and
Michael Felsberg
- Abstract summary: We investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks.
We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training.
We propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images.
- Score: 11.991240159496833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images fed to a deep neural network have in general undergone several
handcrafted image signal processing (ISP) operations, all of which have been
optimized to produce visually pleasing images. In this work, we investigate the
hypothesis that the intermediate representation of visually pleasing images is
sub-optimal for downstream computer vision tasks compared to the RAW image
representation. We suggest that the operations of the ISP instead should be
optimized towards the end task, by learning the parameters of the operations
jointly during training. We extend previous works on this topic and propose a
new learnable operation that enables an object detector to achieve superior
performance when compared to both previous works and traditional RGB images. In
experiments on the open PASCALRAW dataset, we empirically confirm our
hypothesis.
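The core idea, learning the parameters of an ISP operation by gradient descent on the downstream task loss rather than fixing them for visual quality, can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's implementation: the "ISP" here is a single global gain-plus-gamma curve, the "task loss" is a stand-in mean squared error, and finite differences stand in for autograd.

```python
# Minimal sketch (illustrative assumption, not the paper's method): one
# "ISP" operation -- global gain followed by gamma compression -- whose
# two parameters are optimized against a downstream loss instead of
# being hand-tuned for visually pleasing output.
import math

def isp(raw, gain, gamma):
    """Toy ISP: scale each RAW value, then gamma-compress it."""
    return [max(x * gain, 1e-6) ** gamma for x in raw]

def task_loss(out, target):
    """Stand-in for a detection loss: mean squared error against the
    representation the downstream task actually prefers."""
    return sum((o - t) ** 2 for o, t in zip(out, target)) / len(out)

def train(raw, target, steps=2000, lr=0.2):
    """Learn (gain, gamma) by plain gradient descent on the task loss."""
    gain, gamma = 1.0, 1.0           # start from an identity-like ISP
    eps = 1e-4                       # finite differences replace autograd
    for _ in range(steps):
        base = task_loss(isp(raw, gain, gamma), target)
        d_gain = (task_loss(isp(raw, gain + eps, gamma), target) - base) / eps
        d_gamma = (task_loss(isp(raw, gain, gamma + eps), target) - base) / eps
        gain -= lr * d_gain
        gamma -= lr * d_gamma
    return gain, gamma

# Toy data: "RAW" intensities, and a sqrt-like tone curve standing in for
# whatever representation the detector would prefer.
raw = [0.1, 0.2, 0.4, 0.8]
target = [math.sqrt(x) for x in raw]
gain, gamma = train(raw, target)
```

In a real system the finite-difference loop would be replaced by backpropagation through the detector, so the ISP parameters and the detector weights are updated jointly from the same loss.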
Related papers
- In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation [44.26537443476901]
We propose In-Context Translation (ICT) to unify visual recognition (e.g., semantic segmentation), low-level image processing (e.g., denoising), and conditional image generation (e.g., edge-to-image synthesis).
(edge-to-image synthesis).
ICT standardizes the training of different tasks into a general in-context learning, where "in-context" means the input comprises an example input-output pair of the target task and a query image.
In experiments, ICT unifies ten vision tasks and showcases impressive performance on their respective benchmarks.
arXiv Detail & Related papers (2024-04-15T10:05:36Z)
- Source Identification: A Self-Supervision Task for Dense Prediction [8.744460886823322]
We propose a new self-supervision task called source identification (SI).
Synthetic images are generated by fusing multiple source images and the network's task is to reconstruct the original images, given the fused images.
We validate our method on two medical image segmentation tasks: brain tumor segmentation and white matter hyperintensities segmentation.
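The fusion step of the pretext task described above can be sketched in a few lines. This is an illustrative assumption, not the paper's exact formulation: two source "images" are blended pixel-wise with a random soft mask, and a network would then be trained to recover the originals from the fused result.

```python
# Hypothetical sketch of the source-identification pretext task: fuse two
# equally sized source images with a random mask; the reconstruction
# targets are the original sources. The blending rule is an assumption
# for illustration only.
import random

def fuse(src_a, src_b, seed=0):
    """Blend two flat images pixel-wise with a random soft mask in [0, 1]."""
    rng = random.Random(seed)
    mask = [rng.random() for _ in src_a]
    fused = [m * a + (1 - m) * b for m, a, b in zip(mask, src_a, src_b)]
    return fused, mask

src_a = [0.9, 0.1, 0.5, 0.3]
src_b = [0.2, 0.8, 0.4, 0.6]
fused, mask = fuse(src_a, src_b)
# A network would be trained to map `fused` back to (src_a, src_b);
# each fused pixel is a convex combination of the two source pixels.
```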
arXiv Detail & Related papers (2023-07-05T12:27:58Z)
- Visual Radial Basis Q-Network [0.2148535041822524]
We propose a generic method to extract sparse features from raw images with few trainable parameters.
We show that the proposed approach provides similar or, in some cases, even better performance with fewer trainable parameters while being conceptually simpler.
arXiv Detail & Related papers (2022-06-14T09:34:34Z)
- An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performance in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z)
- On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has set numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features [10.163477961551592]
Cross-modal retrieval is an important functionality in modern search engines.
In this paper, we focus on the image-sentence retrieval task.
We use the recently introduced TERN architecture as an image-sentence features extractor.
arXiv Detail & Related papers (2021-06-01T10:11:46Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.