Related papers: 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

URL: http://arxiv.org/abs/2408.13728v1
Date: Sun, 25 Aug 2024 05:41:47 GMT
Title: 3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification
Authors: Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li
Abstract summary: We propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT. The proposed 3D-RCNet maintains the high computational efficiency of ConvNet while enjoying the flexibility of ViT. Empirical evaluations on three representative benchmark HSI datasets show that the proposed model outperforms previous ConvNet-based and ViT-based HSI approaches.
Score: 8.124761584272132
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in the hyperspectral image (HSI) classification field, ViT-based methods show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data. Its self-attention mechanism, which exhibits quadratic complexity, escalates computational costs. Additionally, ViT's substantial demand for training samples does not align with the practical constraints posed by the expensive labeling of HSI data. To overcome these challenges, we propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT, resulting in high performance in HSI classification. We embed the self-attention mechanism of Transformer into the convolutional operation of ConvNet to design a 3D relational convolutional operation and use it to build the final 3D-RCNet. The proposed 3D-RCNet maintains the high computational efficiency of ConvNet while enjoying the flexibility of ViT. Additionally, the proposed 3D relational convolutional operation is a plug-and-play operation, which can be inserted into previous ConvNet-based HSI classification methods seamlessly. Empirical evaluations on three representative benchmark HSI datasets show that the proposed model outperforms previous ConvNet-based and ViT-based HSI approaches.
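Illustration: the abstract's point about quadratic self-attention cost can be made concrete with a rough count of pairwise interactions. The sketch below is not the paper's 3D relational convolution; it only compares global attention over every voxel of an HSI patch with a scheme where each voxel looks at a small local neighbourhood, as a convolution does. The 9 x 9 spatial patch, the band counts, and the 3 x 3 x 3 window are assumed values chosen for illustration, not figures from the paper.

```python
def attention_pairs(height, width, bands):
    """Token pairs scored by global self-attention over every voxel in a patch."""
    tokens = height * width * bands
    return tokens * tokens  # grows quadratically with the number of tokens


def local_window_pairs(height, width, bands, kernel=3):
    """Upper bound on pairs when each voxel attends only to a kernel^3 window.

    Border voxels have fewer neighbours, so the true count is slightly lower.
    """
    tokens = height * width * bands
    return tokens * kernel ** 3  # grows linearly with the number of tokens


if __name__ == "__main__":
    # Assumed patch sizes: 9 x 9 pixels with a varying number of spectral bands.
    for bands in (25, 50, 100):
        global_cost = attention_pairs(9, 9, bands)
        local_cost = local_window_pairs(9, 9, bands)
        print(bands, global_cost, local_cost, global_cost // local_cost)
```

Under these assumptions, doubling the number of bands quadruples the global count but only doubles the local one, which is the gap the abstract attributes to ViT versus ConvNet.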

Related papers

BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNN). We propose BHViT, a binarization-friendly hybrid ViT architecture and its full binarization model with the guidance of three important observations. Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z)
MeshConv3D: Efficient convolution and pooling operators for triangular 3D meshes [0.0]
MeshConv3D is a 3D mesh-dedicated methodology integrating specialized convolution and face collapse-based pooling operators. The experimental results obtained on three distinct benchmark datasets show that the proposed approach makes it possible to achieve equivalent or superior classification results.
arXiv Detail & Related papers (2025-01-07T14:41:26Z)
Fast Occupancy Network [15.759329665907229]
Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle. We present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV features to 3D voxel features. We also present an efficient voxel feature pyramid network (FPN) module to improve performance at little computational cost.
arXiv Detail & Related papers (2024-12-10T03:46:03Z)
Heuristical Comparison of Vision Transformers Against Convolutional Neural Networks for Semantic Segmentation on Remote Sensing Imagery [0.0]
Vision Transformers (ViT) have recently brought a new wave of research in the field of computer vision. This paper focuses on the comparison of three key factors of using (or not using) ViT for semantic segmentation of remote sensing aerial images on the iSAID dataset.
arXiv Detail & Related papers (2024-11-14T00:18:04Z)
3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification [12.729885732069926]
Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs). ViTs excel with sequential data, but they cannot extract spectral-spatial information like CNNs. We propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification.
arXiv Detail & Related papers (2024-04-20T03:39:54Z)
TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
Large Generative Model Assisted 3D Semantic Communication [51.17527319441436]
We propose a Generative AI Model assisted 3D SC (GAM-3DSC) system. First, we introduce a 3D Semantic Extractor (3DSE) to extract key semantics from a 3D scenario based on user requirements. We then present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images. Finally, we design a conditional Generative adversarial network and Diffusion model aided-Channel Estimation (GDCE) to estimate and refine the Channel State Information (CSI) of physical channels.
arXiv Detail & Related papers (2024-03-09T03:33:07Z)
Spatial-Spectral Hyperspectral Classification based on Learnable 3D Group Convolution [18.644268589334217]
This paper proposes a learnable group convolution network (LGCNet) based on an improved 3D-DenseNet model and a lightweight model design. The LGCNet module addresses the shortcomings of group convolution by introducing a dynamic learning method for the input channels and convolution kernel grouping. LGCNet has achieved progress in inference speed and accuracy, and outperforms mainstream hyperspectral image classification methods on the Indian Pines, Pavia University, and KSC datasets.
arXiv Detail & Related papers (2023-07-15T05:47:12Z)
MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes [10.667492516216887]
We propose a transformer-based method for semantic segmentation of 3D meshes. We perform positional encoding by means of the Laplacian eigenvectors of the adjacency matrix. We show how the proposed approach yields state-of-the-art performance on semantic segmentation of 3D meshes.
arXiv Detail & Related papers (2023-07-03T15:45:14Z)
SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures. The proposed approach can be applied to general backbones like PointNet and DGCNN. Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z)
VidConv: A modernized 2D ConvNet for Efficient Video Recognition [0.8070014188337304]
Vision Transformers (ViT) have been steadily breaking the record for many vision tasks. ViTs are generally computationally expensive, memory-consuming, and unfriendly for embedded devices. In this paper, we adopt the modernized structure of ConvNet to design a new backbone for action recognition.
arXiv Detail & Related papers (2022-07-08T09:33:46Z)
Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466]
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design. CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation. It achieves an AP of 81.77% in the moderate car category on the KITTI test 3D detection benchmark.
arXiv Detail & Related papers (2021-08-23T02:03:40Z)
A New Backbone for Hyperspectral Image Reconstruction [90.48427561874402]
3D hyperspectral image (HSI) reconstruction refers to the inverse process of snapshot compressive imaging. We propose a Spatial/Spectral Invariant Residual U-Net, namely SSI-ResU-Net. We show that SSI-ResU-Net achieves competitive performance with an over 77.3% reduction in floating-point operations.
arXiv Detail & Related papers (2021-08-17T16:20:51Z)
Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning [67.40866334083941]
We propose an end-to-end 3-D lightweight convolutional neural network (CNN) for limited samples-based HSI classification. Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and lower computation cost. Our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.
arXiv Detail & Related papers (2020-12-07T03:44:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.