Raw or Cooked? Object Detection on RAW Images
- URL: http://arxiv.org/abs/2301.08965v1
- Date: Sat, 21 Jan 2023 15:42:53 GMT
- Title: Raw or Cooked? Object Detection on RAW Images
- Authors: William Ljungbergh, Joakim Johnander, Christoffer Petersson, and
Michael Felsberg
- Abstract summary: We investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks.
We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training.
We propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images.
- Score: 11.991240159496833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images fed to a deep neural network have in general undergone several
handcrafted image signal processing (ISP) operations, all of which have been
optimized to produce visually pleasing images. In this work, we investigate the
hypothesis that the intermediate representation of visually pleasing images is
sub-optimal for downstream computer vision tasks compared to the RAW image
representation. We suggest that the operations of the ISP instead should be
optimized towards the end task, by learning the parameters of the operations
jointly during training. We extend previous works on this topic and propose a
new learnable operation that enables an object detector to achieve superior
performance when compared to both previous works and traditional RGB images. In
experiments on the open PASCALRAW dataset, we empirically confirm our
hypothesis.
Related papers
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR)
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z) - Source Identification: A Self-Supervision Task for Dense Prediction [8.744460886823322]
We propose a new self-supervision task called source identification (SI)
Synthetic images are generated by fusing multiple source images and the network's task is to reconstruct the original images, given the fused images.
We validate our method on two medical image segmentation tasks: brain tumor segmentation and white matter hyperintensities segmentation.
arXiv Detail & Related papers (2023-07-05T12:27:58Z) - Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB
Images [7.428474910083337]
We present a novel learned keypoint detection method designed to maximize the number of correct matches for the task of non-rigid image correspondence.
Our training framework uses true correspondences, obtained by matching annotated image pairs with a predefined descriptor extractor, as a ground-truth to train a convolutional neural network (CNN)
Experiments show that our method outperforms the state-of-the-art keypoint detector on real images of non-rigid objects by 20 p.p. on Mean Matching Accuracy.
arXiv Detail & Related papers (2022-12-13T11:59:09Z) - Visual Radial Basis Q-Network [0.2148535041822524]
We propose a generic method to extract sparse features from raw images with few trainable parameters.
We show that the proposed approach provides similar or, in some cases, even better performances with fewer trainable parameters while being conceptually simpler.
arXiv Detail & Related papers (2022-06-14T09:34:34Z) - An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performances in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z) - Task2Sim : Towards Effective Pre-training and Transfer from Synthetic
Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z) - Semantic-Aware Generation for Self-Supervised Visual Representation
Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z) - Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z) - Towards Efficient Cross-Modal Visual Textual Retrieval using
Transformer-Encoder Deep Features [10.163477961551592]
Cross-modal retrieval is an important functionality in modern search engines.
In this paper, we focus on the image-sentence retrieval task.
We use the recently introduced TERN architecture as an image-sentence features extractor.
arXiv Detail & Related papers (2021-06-01T10:11:46Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.