SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
- URL: http://arxiv.org/abs/2405.10148v2
- Date: Thu, 02 Jan 2025 03:20:40 GMT
- Title: SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
- Authors: Zhaoxu Li, Wei An, Gaowei Guo, Longguang Wang, Yingqian Wang, Zaiping Lin,
- Abstract summary: Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery.
Existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for instance-level objects.
We propose the first specialized network for hyperspectral multi-class point object detection, SpecDETR.
- Score: 32.7318504162588
- Abstract: Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery and can detect extremely small objects, some of which occupy less than a one-pixel area. However, existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for instance-level objects. In this paper, we rethink hyperspectral target detection from the point object detection perspective and propose the first specialized network for hyperspectral multi-class point object detection, SpecDETR. Without relying on the visual foundation models used as backbones in current object detection frameworks, SpecDETR treats each pixel in the input image as a token and uses a multi-layer Transformer encoder with self-excited subpixel-scale attention modules to directly extract joint spatial-spectral features from images. During feature extraction, we introduce a self-excited mechanism that amplifies object features, thereby accelerating network convergence. Additionally, SpecDETR regards point object detection as a one-to-many set prediction problem, yielding a concise and efficient decoder that surpasses state-of-the-art (SOTA) DETR decoders. We develop a simulated hyperSpectral Point Object Detection benchmark, termed SPOD, and for the first time evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral point object detection. Extensive experiments demonstrate that our proposed SpecDETR outperforms SOTA object detection networks and HTD methods. Our code and dataset are available at https://github.com/ZhaoxuLi123/SpecDETR.
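As a rough illustration of the pixel-as-token idea described in the abstract, the sketch below embeds each pixel's full spectrum as one token and runs a standard Transformer encoder over all pixel tokens. The module name, embedding size, and the plain multi-head attention used here are assumptions for illustration; they are not SpecDETR's self-excited subpixel-scale attention modules or its released implementation.
```python
# Minimal sketch of the "each pixel is a token" encoding step described in the
# abstract. The embedding size, layer count, and the standard multi-head
# attention used here are illustrative placeholders, not SpecDETR's
# self-excited subpixel-scale attention modules. Positional encoding is omitted
# for brevity.
import torch
import torch.nn as nn


class PixelTokenEncoder(nn.Module):
    def __init__(self, num_bands: int, d_model: int = 256, num_layers: int = 6):
        super().__init__()
        # Each pixel's full spectrum is linearly embedded into one token.
        self.embed = nn.Linear(num_bands, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, hsi: torch.Tensor) -> torch.Tensor:
        # hsi: (B, num_bands, H, W) hyperspectral cube.
        b, c, h, w = hsi.shape
        tokens = hsi.flatten(2).transpose(1, 2)  # (B, H*W, num_bands)
        tokens = self.embed(tokens)              # (B, H*W, d_model)
        return self.encoder(tokens)              # joint spatial-spectral features


# Example: a 30-band 64x64 patch becomes 4096 pixel tokens.
features = PixelTokenEncoder(num_bands=30)(torch.randn(1, 30, 64, 64))
print(features.shape)  # torch.Size([1, 4096, 256])
```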
Related papers
- Learning Remote Sensing Object Detection with Single Point Supervision [17.12725535531483]
Pointly Supervised Object Detection (PSOD) has attracted considerable interest due to its lower labeling cost compared to box-level supervised object detection.
We make the first attempt to achieve RS object detection with single-point supervision and propose a PSOD method tailored for RS images.
Our method achieves significantly better performance than state-of-the-art image-level and point-level supervised detection methods, and narrows the performance gap between PSOD and box-level supervised object detection.
arXiv Detail & Related papers (2023-05-23T15:06:04Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present an Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
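As a loose sketch of the adaptive kernel rotation idea summarized above, the snippet below rotates a convolution kernel by a given angle with grid sampling before applying it. The per-image angle prediction and multi-kernel combination described for ARC are omitted, and all function names and parameters are illustrative assumptions rather than the authors' code.
```python
# Loose sketch: rotate a conv kernel by a given angle via grid sampling, then
# convolve with the rotated kernel. The angle prediction and kernel
# combination used by ARC are not reproduced here; names are illustrative.
import math

import torch
import torch.nn.functional as F


def rotate_kernel(weight: torch.Tensor, theta: float) -> torch.Tensor:
    # weight: (out_c, in_c, k, k); theta: rotation angle in radians.
    out_c, in_c, k, _ = weight.shape
    cos, sin = math.cos(theta), math.sin(theta)
    affine = torch.tensor([[cos, -sin, 0.0], [sin, cos, 0.0]]).unsqueeze(0)
    grid = F.affine_grid(affine, size=(1, 1, k, k), align_corners=False)
    flat = weight.reshape(out_c * in_c, 1, k, k)
    rotated = F.grid_sample(flat, grid.expand(out_c * in_c, -1, -1, -1),
                            align_corners=False)
    return rotated.reshape(out_c, in_c, k, k)


weight = torch.randn(16, 3, 3, 3)
x = torch.randn(1, 3, 32, 32)
# In ARC the angle would be predicted from the input; here it is fixed.
y = F.conv2d(x, rotate_kernel(weight, theta=0.5), padding=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```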
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Fast Fourier Convolution Based Remote Sensor Image Object Detection for Earth Observation [0.0]
We propose a Frequency-aware Feature Pyramid Framework (FFPF) for remote sensing object detection.
F-ResNet is proposed to perceive spectral context information by plugging frequency-domain convolution into each stage of the backbone.
The BSFPN is designed with a bilateral sampling strategy and skip connections to better model the association of object features at different scales.
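As a minimal sketch of what a frequency-domain convolution plugged into a backbone stage might look like, the block below applies a 2D FFT, a 1x1 convolution over the real and imaginary parts, and an inverse FFT. It is an assumption-laden illustration, not the F-ResNet/FFPF implementation; all names and layer choices are placeholders.
```python
# Minimal, generic sketch of a frequency-domain convolution block: transform
# the feature map with a 2D FFT, apply a 1x1 convolution to the real and
# imaginary parts, and transform back. Not the F-ResNet/FFPF code.
import torch
import torch.nn as nn


class FrequencyConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # (B, C, H, W//2+1), complex
        feat = torch.cat([spec.real, spec.imag], dim=1)  # (B, 2C, H, W//2+1)
        feat = self.conv(feat)
        real, imag = feat.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


out = FrequencyConv(64)(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```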
arXiv Detail & Related papers (2022-09-01T15:50:58Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto-encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- Recent Trends in 2D Object Detection and Applications in Video Event Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
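As a toy illustration of semantics-guided down-sampling, the sketch below keeps the highest-scoring (likely foreground) points and fills the rest of the sampling budget at random. SASA itself combines foreground scores with farthest point sampling; the function and parameters here are hypothetical.
```python
# Toy sketch of semantics-guided down-sampling: keep the points with the
# highest predicted foreground scores and fill the remaining budget at random.
# This only illustrates the idea in the summary, not SASA's actual algorithm.
import torch


def semantics_guided_sample(points: torch.Tensor, scores: torch.Tensor,
                            num_keep: int, fg_ratio: float = 0.5) -> torch.Tensor:
    # points: (N, 3) coordinates; scores: (N,) foreground probabilities.
    n_fg = int(num_keep * fg_ratio)
    fg_idx = scores.topk(n_fg).indices                 # high-score (likely foreground) points
    remaining = torch.ones_like(scores, dtype=torch.bool)
    remaining[fg_idx] = False
    rest_pool = remaining.nonzero(as_tuple=True)[0]
    rest_idx = rest_pool[torch.randperm(rest_pool.numel())[: num_keep - n_fg]]
    return points[torch.cat([fg_idx, rest_idx])]


pts = torch.randn(1024, 3)
scores = torch.rand(1024)
kept = semantics_guided_sample(pts, scores, num_keep=256)
print(kept.shape)  # torch.Size([256, 3])
```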
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- CFC-Net: A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images [0.9462808515258465]
In this paper, we discuss the role of discriminative features in object detection.
We then propose a Critical Feature Capturing Network (CFC-Net) to improve detection accuracy.
We show that our method achieves superior detection performance compared with many state-of-the-art approaches.
arXiv Detail & Related papers (2021-01-18T02:31:09Z)
- LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on single-frame detection, ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z)
- Plug & Play Convolutional Regression Tracker for Video Object Detection [37.47222104272429]
Video object detection aims to simultaneously localize the bounding boxes of objects and identify their classes in a given video.
One challenge for video object detection is to consistently detect all objects across the whole video.
We propose a Plug & Play scale-adaptive convolutional regression tracker for the video object detection task.
arXiv Detail & Related papers (2020-03-02T15:57:55Z)