Keypoint Detection and Description for Raw Bayer Images
- URL: http://arxiv.org/abs/2503.08673v2
- Date: Sat, 12 Apr 2025 03:11:58 GMT
- Title: Keypoint Detection and Description for Raw Bayer Images
- Authors: Jiakai Lin, Jinchang Zhang, Guoyu Lu
- Abstract summary: Keypoint detection and local feature description are fundamental tasks in robotic perception, critical for applications such as SLAM, robot localization, feature matching, pose estimation, and 3D mapping. While existing methods predominantly operate on RGB images, we propose a novel network that directly processes raw images, bypassing the need for the Image Signal Processor (ISP). This work represents the first attempt to develop a keypoint detection and feature description network specifically for raw images, offering a more efficient solution for resource-constrained environments.
- Score: 10.443350617606972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keypoint detection and local feature description are fundamental tasks in robotic perception, critical for applications such as SLAM, robot localization, feature matching, pose estimation, and 3D mapping. While existing methods predominantly operate on RGB images, we propose a novel network that directly processes raw images, bypassing the need for the Image Signal Processor (ISP). This approach significantly reduces hardware requirements and memory consumption, which is crucial for robotic vision systems. Our method introduces two custom-designed convolutional kernels capable of performing convolutions directly on raw images, preserving inter-channel information without converting to RGB. Experimental results show that our network outperforms existing algorithms on raw images, achieving higher accuracy and stability under large rotations and scale variations. This work represents the first attempt to develop a keypoint detection and feature description network specifically for raw images, offering a more efficient solution for resource-constrained environments.
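The abstract does not specify the two custom kernels, so the following is only a minimal sketch of what "raw Bayer input" means, not the authors' method. It assumes an RGGB colour filter array (real sensors may use BGGR, GRBG or GBRG), simulates a raw mosaic from an RGB image by keeping one colour sample per pixel, and then applies the common "packing" step that regroups each 2x2 tile into four half-resolution planes (R, Gr, Gb, B). The function names `bayer_channel`, `mosaic` and `pack_rggb` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def bayer_channel(row: int, col: int) -> int:
    """Return the RGB index (0=R, 1=G, 2=B) sampled at (row, col).

    Assumes an RGGB layout, i.e. the 2x2 tile
        R  G
        G  B
    repeated over the whole sensor.
    """
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def mosaic(rgb: List[List[RGB]]) -> List[List[float]]:
    """Simulate a raw Bayer frame: keep only one colour sample per pixel."""
    return [[px[bayer_channel(r, c)] for c, px in enumerate(row)]
            for r, row in enumerate(rgb)]


def pack_rggb(raw: List[List[float]]) -> List[List[List[float]]]:
    """Pack an HxW RGGB mosaic into 4 planes of size (H/2)x(W/2): R, Gr, Gb, B."""
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # positions of R, Gr, Gb, B in a tile
    return [[[raw[r + dr][c + dc] for c in range(0, w, 2)]
             for r in range(0, h, 2)]
            for dr, dc in offsets]


if __name__ == "__main__":
    # Tiny 2x4 RGB "scene" with distinct values so each sample is traceable.
    scene = [[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)],
             [(13, 14, 15), (16, 17, 18), (19, 20, 21), (22, 23, 24)]]
    raw = mosaic(scene)
    print(raw)             # [[1, 5, 7, 11], [14, 18, 20, 24]]
    print(pack_rggb(raw))  # [[[1, 7]], [[5, 11]], [[14, 20]], [[18, 24]]]
```

Packing places the four CFA samples of each 2x2 tile at the same spatial location, so an ordinary convolution over the 4-plane tensor can mix colour channels without demosaicing to RGB. This is a widely used baseline for raw-input networks; the paper instead describes two custom kernels that convolve directly on raw images, which this sketch does not reproduce.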
Related papers
- Beyond RGB: Adaptive Parallel Processing for RAW Object Detection [5.36869872375791]
Raw Adaptation Module (RAM) is a module designed to replace the traditional Image Signal Processing (ISP).
Our approach outperforms RGB-based methods and achieves state-of-the-art results across diverse RAW image datasets.
(2025-03-17T13:36:49Z)
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
(2024-09-23T15:23:01Z)
- Modular Anti-noise Deep Learning Network for Robotic Grasp Detection Based on RGB Images [2.759223695383734]
This paper introduces an interesting approach to detect grasping pose from a single RGB image.
We propose a modular learning network augmented with grasp detection and semantic segmentation.
We demonstrate the feasibility and accuracy of our proposed approach through practical experiments and evaluations.
(2023-10-30T02:01:49Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
(2023-09-30T02:54:51Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
(2023-06-21T06:59:07Z)
- Raw Image Reconstruction with Learned Compact Metadata [61.62454853089346]
We propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner.
We show how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective.
(2023-02-25T05:29:45Z)
- Enabling ISP-less Low-Power Computer Vision [4.102254385058941]
We release the raw version of a large-scale benchmark for generic high-level vision tasks.
For ISP-less CV systems, training on raw images results in a 7.1% increase in test accuracy.
We propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations.
(2022-10-11T13:47:30Z)
- Model-Based Image Signal Processors via Learnable Dictionaries [6.766416093990318]
Digital cameras transform sensor RAW readings into RGB images by means of their Image Signal Processor (ISP).
Recent approaches have attempted to bridge this gap by estimating the RGB to RAW mapping.
We present a novel hybrid model-based and data-driven ISP that is both learnable and interpretable.
(2022-01-10T08:36:10Z)
- TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization [49.521622399483846]
We propose a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization.
The proposed TBNet can significantly outperform state-of-the-art generic image manipulation localization methods in terms of both MCC and F1.
(2021-08-10T08:22:05Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
(2021-01-29T09:16:06Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
(2020-07-17T18:35:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.