Object Detection in the DCT Domain: is Luminance the Solution?
- URL: http://arxiv.org/abs/2006.05732v3
- Date: Wed, 14 Jul 2021 08:09:24 GMT
- Title: Object Detection in the DCT Domain: is Luminance the Solution?
- Authors: Benjamin Deguerre, Clement Chatelain, Gilles Gasso
- Abstract summary: This paper proposes to take advantage of the compressed representation of images to carry out object detection usable in constrained resources conditions.
This leads to a $\times 1.7$ speed up in comparison with a standard RGB-based architecture, while only reducing the detection performance by 5.5%.
- Score: 4.361526134899725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection in images has reached unprecedented performances. The
state-of-the-art methods rely on deep architectures that extract salient
features and predict bounding boxes enclosing the objects of interest. These
methods essentially run on RGB images. However, the RGB images are often
compressed by the acquisition devices for storage purpose and transfer
efficiency. Hence, their decompression is required for object detectors. To
gain in efficiency, this paper proposes to take advantage of the compressed
representation of images to carry out object detection usable in constrained
resources conditions.
Specifically, we focus on JPEG images and propose a thorough analysis of
detection architectures newly designed in regard of the peculiarities of the
JPEG norm. This leads to a $\times 1.7$ speed up in comparison with a standard
RGB-based architecture, while only reducing the detection performance by 5.5%.
Additionally, our empirical findings demonstrate that only part of the
compressed JPEG information, namely the luminance component, may be required to
match detection accuracy of the full input methods.
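To make the "luminance component" of the compressed representation concrete, the following minimal sketch computes the per-block DCT coefficients of the luminance channel from an RGB array. It is an illustration under stated assumptions, not the authors' pipeline: the function names `dct_matrix` and `luminance_dct_blocks` are hypothetical, the luminance weights are the usual JFIF ones, and quantization and entropy coding are left out. A compressed-domain detector would presumably read the stored coefficients from the JPEG file rather than recompute them from pixels as done here.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return m


def luminance_dct_blocks(rgb: np.ndarray, block: int = 8) -> np.ndarray:
    """Return luminance DCT coefficients, shape (H/block, W/block, block, block).

    Assumes `rgb` is an (H, W, 3) array with values in [0, 255] and that
    H and W are multiples of `block` (no padding is performed here).
    """
    h, w, _ = rgb.shape
    if h % block or w % block:
        raise ValueError("image height and width must be multiples of the block size")
    rgb = rgb.astype(np.float64)
    # JFIF luminance (Y) from RGB, then the level shift used by JPEG.
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    y -= 128.0
    # Split the luminance plane into non-overlapping block x block tiles.
    tiles = y.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    d = dct_matrix(block)
    # Separable 2D DCT applied to every tile at once: D @ tile @ D^T.
    return d @ tiles @ d.T


if __name__ == "__main__":
    image = np.random.default_rng(0).integers(0, 256, size=(16, 24, 3))
    coeffs = luminance_dct_blocks(image)
    print(coeffs.shape)  # one 8x8 coefficient block per 8x8 pixel tile
```

The resulting tensor has one 8x8 grid of frequency coefficients per image tile, which is the kind of input a DCT-domain network would consume in place of RGB pixels.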
Related papers
- Modular Anti-noise Deep Learning Network for Robotic Grasp Detection Based on RGB Images [2.759223695383734]
This paper introduces an interesting approach to detect grasping pose from a single RGB image.
We propose a modular learning network augmented with grasp detection and semantic segmentation.
We demonstrate the feasibility and accuracy of our proposed approach through practical experiments and evaluations.
arXiv Detail & Related papers (2023-10-30T02:01:49Z)
- HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information [12.376615603048279]
HalluciDet is an IR-RGB image translation model for object detection.
We empirically compare our approach against state-of-the-art methods for image translation and for fine-tuning on IR.
arXiv Detail & Related papers (2023-10-07T03:00:33Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Raw Image Reconstruction with Learned Compact Metadata [61.62454853089346]
We propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner.
We show how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective.
arXiv Detail & Related papers (2023-02-25T05:29:45Z)
- GenISP: Neural ISP for Low-Light Machine Cognition [19.444297600977546]
In low-light conditions, object detectors using raw image data are more robust than detectors using image data processed by an ISP pipeline.
We propose a minimal neural ISP pipeline for machine cognition, named GenISP, that explicitly incorporates Color Space Transformation to a device-independent color space.
arXiv Detail & Related papers (2022-05-07T17:17:24Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from background using a very lightweight first stage.
The resulting image proposals are then processed in the second stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
- Deep Learning Based Image Retrieval in the JPEG Compressed Domain [0.0]
We propose a unified model for image retrieval which takes DCT coefficients as input and efficiently extracts global and local features directly in the JPEG compressed domain for accurate image retrieval.
In terms of mean average precision, our proposed model performs similarly to the current DELG model, which takes RGB features as input.
arXiv Detail & Related papers (2021-07-08T07:30:03Z)
- Cascade Graph Neural Networks for RGB-D Salient Object Detection [41.57218490671026]
We study the problem of salient object detection (SOD) for RGB-D images using both color and depth information.
We introduce Cascade Graph Neural Networks (Cas-Gnn), a unified framework capable of comprehensively distilling and reasoning about the mutual benefits between these two data sources.
Cas-Gnn achieves significantly better performance than all existing RGB-D SOD approaches on several widely used benchmarks.
arXiv Detail & Related papers (2020-08-07T10:59:04Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework with only RGB information as input for inference.
It not only surpasses the state-of-the-art performance on five public RGB SOD benchmarks, but also surpasses RGBD-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.