Revisiting Aerial Scene Classification on the AID Benchmark
- URL: http://arxiv.org/abs/2601.18263v1
- Date: Mon, 26 Jan 2026 08:39:02 GMT
- Title: Revisiting Aerial Scene Classification on the AID Benchmark
- Authors: Subhajeet Das, Susmita Ghosh, Abhiroop Chatterjee
- Abstract summary: We conduct a literature review of various machine learning methods for aerial image classification. Our survey covers a range of approaches from handcrafted features to traditional CNNs. We have also designed Aerial-Y-Net, a spatial attention-enhanced CNN with a multi-scale feature fusion mechanism.
- Score: 1.529342790344802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aerial images play a vital role in urban planning and environmental preservation, as they consist of various structures, representing different types of buildings, forests, mountains, and unoccupied lands. Due to their heterogeneous nature, developing robust models for scene classification remains a challenge. In this study, we conduct a literature review of various machine learning methods for aerial image classification. Our survey covers a range of approaches from handcrafted features (e.g., SIFT, LBP) to traditional CNNs (e.g., VGG, GoogLeNet), and advanced deep hybrid networks. In this connection, we have also designed Aerial-Y-Net, a spatial attention-enhanced CNN with a multi-scale feature fusion mechanism, which acts as an attention-based model and helps us to better understand the complexities of aerial images. Evaluated on the AID dataset, our model achieves 91.72% accuracy, outperforming several baseline architectures.
Related papers
- SASP: Strip-Aware Spatial Perception for Fine-Grained Bird Image Classification [5.420786129061269]
This paper proposes a fine-grained bird image classification framework based on strip-aware spatial perception. The proposed method incorporates two novel modules: extensional perception aggregator (EPA) and channel semantic weaving (CSW). Built upon a ResNet-50 backbone, the model enables jump-wise connection of extended structural features across the spatial domain.
arXiv Detail & Related papers (2025-05-30T09:10:12Z) - Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation [49.13393683126712]
Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. We propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images.
arXiv Detail & Related papers (2025-05-21T03:57:10Z) - AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations [51.44608822712786]
Visual grounding aims to localize target objects in an image based on natural language descriptions. AerialVG poses new challenges, e.g., appearance-based grounding is insufficient to distinguish among multiple visually similar objects. We introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects.
arXiv Detail & Related papers (2025-04-10T15:13:00Z) - Hierarchical Information Flow for Generalized Efficient Image Restoration [108.83750852785582]
We propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR. Hi-IR constructs a hierarchical information tree representing the degraded image across three levels. In seven common image restoration tasks, Hi-IR demonstrates its effectiveness and generalizability.
arXiv Detail & Related papers (2024-11-27T18:30:08Z) - EcoCropsAID: Economic Crops Aerial Image Dataset for Land Use Classification [0.0]
The EcoCropsAID dataset is a comprehensive collection of 5,400 aerial images captured between 2014 and 2018 using the Google Earth application.
This dataset focuses on five key economic crops in Thailand: rice, sugarcane, cassava, rubber, and longan.
arXiv Detail & Related papers (2024-11-05T03:14:36Z) - UW-SDF: Exploiting Hybrid Geometric Priors for Neural SDF Reconstruction from Underwater Multi-view Monocular Images [63.32490897641344]
We propose a framework for reconstructing target objects from multi-view underwater images based on neural SDF.
We introduce hybrid geometric priors to optimize the reconstruction process, markedly enhancing the quality and efficiency of neural SDF reconstruction.
arXiv Detail & Related papers (2024-10-10T16:33:56Z) - Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [78.21609845377644]
We propose a novel class of state-of-the-art (SOTA) generative model, which exhibits the capability to model intricate relationships. We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images. Two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information.
arXiv Detail & Related papers (2024-02-22T09:39:46Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Towards Geospatial Foundation Models via Continual Pretraining [22.825065739563296]
We propose a novel paradigm for building highly effective foundation models with minimal resource cost and carbon impact.
We first construct a compact yet diverse dataset from multiple sources to promote feature diversity, which we term GeoPile.
Then, we investigate the potential of continual pretraining from large-scale ImageNet-22k models and propose a multi-objective continual pretraining paradigm.
arXiv Detail & Related papers (2023-02-09T07:39:02Z) - Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
arXiv Detail & Related papers (2021-11-12T16:45:20Z) - Flood Extent Mapping based on High Resolution Aerial Imagery and DEM: A Hidden Markov Tree Approach [10.72081512622396]
This paper evaluates the proposed geographical hidden Markov tree model through case studies on high-resolution aerial imagery.
Three scenes are selected in heavily vegetated floodplains near the cities of Grimesland and Kinston in North Carolina during Hurricane Matthew floods in 2016.
Results show that the proposed hidden Markov tree model outperforms several state-of-the-art machine learning algorithms.
arXiv Detail & Related papers (2020-08-25T18:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.