3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification
- URL: http://arxiv.org/abs/2408.13728v1
- Date: Sun, 25 Aug 2024 05:41:47 GMT
- Title: 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification
- Authors: Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li
- Abstract summary: We propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT.
The proposed 3D-RCNet maintains the high computational efficiency of ConvNet while enjoying the flexibility of ViT.
Empirical evaluations on three representative benchmark HSI datasets show that the proposed model outperforms previous ConvNet-based and ViT-based HSI approaches.
- Score: 8.124761584272132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in the hyperspectral image (HSI) classification field, ViT-based methods show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data: its self-attention mechanism has quadratic complexity, which escalates computational costs, and its substantial demand for training samples conflicts with the practical constraint that labeling HSI data is expensive. To overcome these challenges, we propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT, resulting in high performance in HSI classification. We embed the self-attention mechanism of the Transformer into the convolutional operation of the ConvNet to design a 3D relational convolutional operation and use it to build the final 3D-RCNet. The proposed 3D-RCNet maintains the high computational efficiency of ConvNet while enjoying the flexibility of ViT. Additionally, the proposed 3D relational convolutional operation is plug-and-play and can be inserted seamlessly into previous ConvNet-based HSI classification methods. Empirical evaluations on three representative benchmark HSI datasets show that the proposed model outperforms previous ConvNet-based and ViT-based HSI approaches.
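The core operation described above, replacing a convolution's fixed kernel with attention weights computed inside each local 3D window, can be sketched in a few lines. The following is a minimal PyTorch illustration assuming single-head attention and 1x1x1 query/key/value projections; the class name, shapes, and details are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalConv3d(nn.Module):
    """Hypothetical sketch of a 3D relational convolution: each output
    voxel is an attention-weighted sum over its local k*k*k neighborhood,
    with weights derived from query/key projections instead of a fixed kernel."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.ks = kernel_size
        self.pad = kernel_size // 2
        # 1x1x1 projections play the role of Q, K, V in self-attention.
        self.q = nn.Conv3d(channels, channels, 1)
        self.k = nn.Conv3d(channels, channels, 1)
        self.v = nn.Conv3d(channels, channels, 1)

    def _neighbors(self, t):
        # Gather the k^3 neighbors of every voxel: (B, C, D, H, W, k^3).
        B, C, D, H, W = t.shape
        t = F.pad(t, [self.pad] * 6)
        t = t.unfold(2, self.ks, 1).unfold(3, self.ks, 1).unfold(4, self.ks, 1)
        return t.reshape(B, C, D, H, W, -1)

    def forward(self, x):                       # x: (B, C, D, H, W)
        q = self.q(x)
        k_n, v_n = self._neighbors(self.k(x)), self._neighbors(self.v(x))
        # Each center query attends over its own neighborhood.
        attn = (q.unsqueeze(-1) * k_n).sum(1, keepdim=True) / x.shape[1] ** 0.5
        attn = attn.softmax(-1)                 # (B, 1, D, H, W, k^3)
        return (attn * v_n).sum(-1)             # (B, C, D, H, W)
```

Because the sketch consumes and produces a standard (B, C, bands, H, W) feature map, it could in principle be dropped in wherever an nn.Conv3d block sits in an existing ConvNet-based HSI pipeline, which is what the plug-and-play claim above suggests.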
Related papers
- MeshConv3D: Efficient convolution and pooling operators for triangular 3D meshes [0.0]
MeshConv3D is a methodology dedicated to 3D meshes that integrates specialized convolution and face-collapse-based pooling operators.
Experimental results on three distinct benchmark datasets show that the proposed approach achieves equivalent or superior classification results.
arXiv Detail & Related papers (2025-01-07T14:41:26Z) - Fast Occupancy Network [15.759329665907229]
An Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle.
We present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV features to 3D voxel features.
We also present an efficient voxel feature pyramid network (FPN) module that improves performance at little computational cost.
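As a rough sketch of the lifting step, one deformable 2D convolution over the BEV plane can emit all vertical slices of the voxel grid at once. This assumes torchvision's DeformConv2d with self-predicted offsets; the layer names and shapes are illustrative guesses, not the paper's code.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class BEVToVoxelLift(nn.Module):
    """Hypothetical sketch: lift a BEV feature map (B, C, H, W) to a
    voxel feature (B, C', Z, H, W) with one deformable 2D convolution."""

    def __init__(self, in_ch, out_ch, num_z, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Sampling offsets are predicted from the BEV feature itself.
        self.offset = nn.Conv2d(in_ch, 2 * kernel_size ** 2, kernel_size, padding=pad)
        # One deformable conv emits all Z slices in its output channels.
        self.lift = DeformConv2d(in_ch, out_ch * num_z, kernel_size, padding=pad)
        self.out_ch, self.num_z = out_ch, num_z

    def forward(self, bev):                     # bev: (B, C, H, W)
        vox = self.lift(bev, self.offset(bev))  # (B, C' * Z, H, W)
        B, _, H, W = vox.shape
        return vox.view(B, self.out_ch, self.num_z, H, W)
```

Keeping the lift as a single 2D layer, rather than stacking 3D convolutions over the full voxel grid, is one way to get the speed the summary emphasizes.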
arXiv Detail & Related papers (2024-12-10T03:46:03Z) - Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery [0.0]
Vision Transformers (ViT) have brought a new wave of research in the field of computer vision.
This paper compares three key factors in using (or not using) ViT for semantic segmentation of aerial images.
We show that the novel combined weighted loss function significantly boosts the CNN model's performance compared to transfer learning with ViT.
arXiv Detail & Related papers (2024-11-14T00:18:04Z) - 3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification [12.729885732069926]
Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs).
ViTs excel at sequential data, but they cannot extract spectral-spatial information the way CNNs can.
We propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification.
arXiv Detail & Related papers (2024-04-20T03:39:54Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer architecture.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
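A minimal sketch of the CWT step, assuming the PyWavelets library and a Morlet wavelet (the scale range and wavelet choice are assumptions, not necessarily the paper's):

```python
import numpy as np
import pywt

def cwt_tensor(signal, num_scales=64, wavelet="morl"):
    """Turn a 1D behavioral signal into a 2D time-frequency tensor via
    the Continuous Wavelet Transform, ready for a 2D convolutional stream."""
    scales = np.arange(1, num_scales + 1)
    coefs, _ = pywt.cwt(signal, scales, wavelet)  # (num_scales, T)
    return np.abs(coefs)                          # magnitude scalogram

# Example: a 2-second toy signal sampled at 100 Hz becomes a 64 x 200 tensor.
t = np.linspace(0, 2, 200)
print(cwt_tensor(np.sin(2 * np.pi * 5 * t)).shape)  # (64, 200)
```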
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Large Generative Model Assisted 3D Semantic Communication [51.17527319441436]
We propose a Generative AI Model assisted 3D SC (GAM-3DSC) system.
First, we introduce a 3D Semantic Extractor (3DSE) to extract key semantics from a 3D scenario based on user requirements.
We then present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images.
Finally, we design a conditional Generative Adversarial Network and Diffusion model-aided Channel Estimation (GDCE) scheme to estimate and refine the Channel State Information (CSI) of physical channels.
arXiv Detail & Related papers (2024-03-09T03:33:07Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge of combining SO(3) equivariance with binarization by designing a general framework for constructing 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a strong trade-off between efficiency, rotation robustness, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - VidConv: A modernized 2D ConvNet for Efficient Video Recognition [0.8070014188337304]
Vision Transformers (ViTs) have been steadily breaking records on many vision tasks.
However, ViTs are generally computationally expensive, memory-hungry, and unfriendly to embedded devices.
In this paper, we adopt the modernized structure of ConvNet to design a new backbone for action recognition.
arXiv Detail & Related papers (2022-07-08T09:33:46Z) - Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves an AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z) - A New Backbone for Hyperspectral Image Reconstruction [90.48427561874402]
3D hyperspectral image (HSI) reconstruction refers to the inverse process of snapshot compressive imaging.
The paper proposes a Spatial/Spectral Invariant Residual U-Net, namely SSI-ResU-Net.
We show that SSI-ResU-Net achieves competitive performance with over a 77.3% reduction in floating-point operations.
arXiv Detail & Related papers (2021-08-17T16:20:51Z) - Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning [67.40866334083941]
We propose an end-to-end lightweight 3-D convolutional neural network (CNN) for HSI classification with limited samples.
Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and a lower computation cost.
Our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.
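One common way to make a 3D CNN deeper while keeping parameters and FLOPs down is depthwise-separable 3D convolution; the block below is a hypothetical sketch in that spirit, not the actual 3-D-LWNet design.

```python
import torch.nn as nn

class Lightweight3DBlock(nn.Module):
    """Hypothetical depthwise-separable 3D conv block: a per-channel 3x3x3
    convolution followed by a 1x1x1 pointwise mix, trading a small accuracy
    cost for far fewer parameters than a full 3D convolution."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, 1)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: (B, C, bands, H, W)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```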
arXiv Detail & Related papers (2020-12-07T03:44:35Z)