Raw or Cooked? Object Detection on RAW Images
- URL: http://arxiv.org/abs/2301.08965v1
- Date: Sat, 21 Jan 2023 15:42:53 GMT
- Title: Raw or Cooked? Object Detection on RAW Images
- Authors: William Ljungbergh, Joakim Johnander, Christoffer Petersson, and
Michael Felsberg
- Abstract summary: We investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks.
We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training.
We propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images.
- Score: 11.991240159496833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images fed to a deep neural network have in general undergone several
handcrafted image signal processing (ISP) operations, all of which have been
optimized to produce visually pleasing images. In this work, we investigate the
hypothesis that the intermediate representation of visually pleasing images is
sub-optimal for downstream computer vision tasks compared to the RAW image
representation. We suggest that the operations of the ISP instead should be
optimized towards the end task, by learning the parameters of the operations
jointly during training. We extend previous works on this topic and propose a
new learnable operation that enables an object detector to achieve superior
performance when compared to both previous works and traditional RGB images. In
experiments on the open PASCALRAW dataset, we empirically confirm our
hypothesis.
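The core idea, learning the parameters of an ISP operation by gradient descent on the downstream task loss rather than fixing them for visual quality, can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's implementation: the "ISP" here is a single global gain-plus-gamma curve, the "task loss" is a stand-in mean squared error, and finite differences stand in for autograd.

```python
# Minimal sketch (illustrative assumption, not the paper's method): one
# "ISP" operation -- global gain followed by gamma compression -- whose
# two parameters are optimized against a downstream loss instead of
# being hand-tuned for visually pleasing output.
import math

def isp(raw, gain, gamma):
    """Toy ISP: scale each RAW value, then gamma-compress it."""
    return [max(x * gain, 1e-6) ** gamma for x in raw]

def task_loss(out, target):
    """Stand-in for a detection loss: mean squared error against the
    representation the downstream task actually prefers."""
    return sum((o - t) ** 2 for o, t in zip(out, target)) / len(out)

def train(raw, target, steps=2000, lr=0.2):
    """Learn (gain, gamma) by plain gradient descent on the task loss."""
    gain, gamma = 1.0, 1.0           # start from an identity-like ISP
    eps = 1e-4                       # finite differences replace autograd
    for _ in range(steps):
        base = task_loss(isp(raw, gain, gamma), target)
        d_gain = (task_loss(isp(raw, gain + eps, gamma), target) - base) / eps
        d_gamma = (task_loss(isp(raw, gain, gamma + eps), target) - base) / eps
        gain -= lr * d_gain
        gamma -= lr * d_gamma
    return gain, gamma

# Toy data: "RAW" intensities, and a sqrt-like tone curve standing in for
# whatever representation the detector would prefer.
raw = [0.1, 0.2, 0.4, 0.8]
target = [math.sqrt(x) for x in raw]
gain, gamma = train(raw, target)
```

In a real system the finite-difference loop would be replaced by backpropagation through the detector, so the ISP parameters and the detector weights are updated jointly from the same loss.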
Related papers
- In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation [44.26537443476901]
We propose In-Context Translation (ICT) to unify visual recognition (e.g., semantic segmentation), low-level image processing (e.g., denoising), and conditional image generation (e.g., edge-to-image synthesis).
(edge-to-image synthesis).
ICT standardizes the training of different tasks into a general in-context learning, where "in-context" means the input comprises an example input-output pair of the target task and a query image.
In experiments, ICT unifies ten vision tasks and showcases impressive performance on their respective benchmarks.
arXiv Detail & Related papers (2024-04-15T10:05:36Z)
- Source Identification: A Self-Supervision Task for Dense Prediction [8.744460886823322]
We propose a new self-supervision task called source identification (SI).
Synthetic images are generated by fusing multiple source images and the network's task is to reconstruct the original images, given the fused images.
We validate our method on two medical image segmentation tasks: brain tumor segmentation and white matter hyperintensities segmentation.
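The fusion step of the pretext task described above can be sketched in a few lines. This is an illustrative assumption, not the paper's exact formulation: two source "images" are blended pixel-wise with a random soft mask, and a network would then be trained to recover the originals from the fused result.

```python
# Hypothetical sketch of the source-identification pretext task: fuse two
# equally sized source images with a random mask; the reconstruction
# targets are the original sources. The blending rule is an assumption
# for illustration only.
import random

def fuse(src_a, src_b, seed=0):
    """Blend two flat images pixel-wise with a random soft mask in [0, 1]."""
    rng = random.Random(seed)
    mask = [rng.random() for _ in src_a]
    fused = [m * a + (1 - m) * b for m, a, b in zip(mask, src_a, src_b)]
    return fused, mask

src_a = [0.9, 0.1, 0.5, 0.3]
src_b = [0.2, 0.8, 0.4, 0.6]
fused, mask = fuse(src_a, src_b)
# A network would be trained to map `fused` back to (src_a, src_b);
# each fused pixel is a convex combination of the two source pixels.
```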
arXiv Detail & Related papers (2023-07-05T12:27:58Z)
- Visual Radial Basis Q-Network [0.2148535041822524]
We propose a generic method to extract sparse features from raw images with few trainable parameters.
We show that the proposed approach provides similar or, in some cases, even better performance with fewer trainable parameters while being conceptually simpler.
arXiv Detail & Related papers (2022-06-14T09:34:34Z)
- An Empirical Study of Remote Sensing Pretraining [117.90699699469639]
We conduct an empirical study of remote sensing pretraining (RSP) on aerial images.
RSP can help deliver distinctive performance in scene recognition tasks.
RSP mitigates the data discrepancies of traditional ImageNet pretraining on RS images, but it may still suffer from task discrepancies.
arXiv Detail & Related papers (2022-04-06T13:38:11Z)
- On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has set numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features [10.163477961551592]
Cross-modal retrieval is an important functionality in modern search engines.
In this paper, we focus on the image-sentence retrieval task.
We use the recently introduced TERN architecture as an image-sentence features extractor.
arXiv Detail & Related papers (2021-06-01T10:11:46Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.