Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models
- URL: http://arxiv.org/abs/2505.12217v1
- Date: Sun, 18 May 2025 03:32:24 GMT
- Title: Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models
- Authors: Aryan Das, Tanishq Rachamalla, Pravendra Singh, Koushik Biswas, Vinay Kumar Verma, Swalpa Kumar Roy
- Abstract summary: We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations. This dataset enhances model performance in tasks like classification and feature extraction, providing a valuable resource for advanced remote sensing applications.
- Score: 15.87261767109048
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery. This dataset enhances model performance in tasks like classification and feature extraction, providing a valuable resource for advanced remote sensing applications. HyperCap is constructed from four benchmark datasets and annotated through a hybrid approach combining automated and manual methods to ensure accuracy and consistency. Empirical evaluations using state-of-the-art encoders and diverse fusion techniques demonstrate significant improvements in classification performance. These results underscore the potential of vision-language learning in HSI and position HyperCap as a foundational dataset for future research in the field.
Related papers
- Structural-Spectral Graph Convolution with Evidential Edge Learning for Hyperspectral Image Clustering [59.24638672786966]
Hyperspectral image (HSI) clustering assigns similar pixels to the same class without any annotations. Existing graph neural networks (GNNs) cannot fully exploit the spectral information of the input HSI. We propose a structural-spectral graph convolutional operator (SSGCO) tailored for graph-structured HSI superpixels.
arXiv Detail & Related papers (2025-06-11T16:41:34Z)
- SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations [27.391859339238906]
In this paper, we explore the potential of generative diffusion models in synthesizing hyperspectral images with pixel-level annotations. To the best of our knowledge, it is the first work to generate high-dimensional HSIs with annotations. We select two of the most widely used dense prediction tasks, semantic segmentation and change detection, and generate datasets suitable for these tasks.
arXiv Detail & Related papers (2025-02-24T11:13:37Z)
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- AMBER -- Advanced SegFormer for Multi-Band Image Segmentation: an application to Hyperspectral Imaging [0.0]
This paper introduces AMBER, an advanced SegFormer specifically designed for multi-band image segmentation. AMBER enhances the original SegFormer by incorporating three-dimensional convolutions, custom kernel sizes, and a Funnelizer layer. Our experiments, conducted on three benchmark datasets and on a dataset from the PRISMA satellite, show that AMBER outperforms traditional CNN-based methods in terms of Overall Accuracy, Kappa coefficient, and Average Accuracy.
arXiv Detail & Related papers (2024-09-14T09:34:05Z)
- Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation [74.65906322148997]
We introduce a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features.
Hyper-YOLO significantly outperforms the advanced YOLOv8-N and YOLOv9-T with 12% $\text{AP}^{val}$ and 9% $\text{AP}^{val}$ improvements.
arXiv Detail & Related papers (2024-08-09T01:21:15Z)
- HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
HyperSIGMA is a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images.
arXiv Detail & Related papers (2024-06-17T13:22:58Z)
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- Hypergraph Transformer for Semi-Supervised Classification [50.92027313775934]
We propose a novel hypergraph learning framework, HyperGraph Transformer (HyperGT).
HyperGT uses a Transformer-based neural network architecture to effectively consider global correlations among all nodes and hyperedges.
It achieves comprehensive hypergraph representation learning by effectively incorporating global interactions while preserving local connectivity patterns.
arXiv Detail & Related papers (2023-12-18T17:50:52Z)
- HyperDID: Hyperspectral Intrinsic Image Decomposition with Deep Feature Embedding [9.32185717565188]
This study rethinks hyperspectral intrinsic image decomposition for classification tasks by introducing deep feature embedding.
The proposed framework, HyperDID, incorporates the Environmental Feature Module (EFM) and Categorical Feature Module (CFM) to extract intrinsic features.
Experimental results across three commonly used datasets validate the effectiveness of HyperDID in improving hyperspectral image classification performance.
arXiv Detail & Related papers (2023-11-25T02:05:10Z)
- A Survey of Graph and Attention Based Hyperspectral Image Classification Methods for Remote Sensing Data [5.1901440366375855]
The use of Deep Learning techniques for classification in Hyperspectral Imaging (HSI) is rapidly growing.
Recent methods have also explored the usage of Graph Convolution Networks and their unique ability to use node features in prediction.
arXiv Detail & Related papers (2023-10-16T00:42:25Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)