Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training
- URL: http://arxiv.org/abs/2510.15527v1
- Date: Fri, 17 Oct 2025 10:59:24 GMT
- Title: Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training
- Authors: Aditya Vir
- Abstract summary: This work presents a systematic investigation of custom convolutional neural network architectures for satellite land use classification. We achieve 97.23% test accuracy on the EuroSAT dataset without reliance on pre-trained models. Our approach achieves performance within 1.34% of fine-tuned ResNet-50 (98.57%) while requiring no external data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents a systematic investigation of custom convolutional neural network architectures for satellite land use classification, achieving 97.23% test accuracy on the EuroSAT dataset without reliance on pre-trained models. Through three progressive architectural iterations (baseline: 94.30%, CBAM-enhanced: 95.98%, and balanced multi-task attention: 97.23%) we identify and address specific failure modes in satellite imagery classification. Our principal contribution is a novel balanced multi-task attention mechanism that combines Coordinate Attention for spatial feature extraction with Squeeze-Excitation blocks for spectral feature extraction, unified through a learnable fusion parameter. Experimental results demonstrate that this learnable parameter autonomously converges to alpha approximately 0.57, indicating near-equal importance of spatial and spectral modalities for satellite imagery. We employ progressive DropBlock regularization (5-20% by network depth) and class-balanced loss weighting to address overfitting and confusion pattern imbalance. The final 12-layer architecture achieves Cohen's Kappa of 0.9692 with all classes exceeding 94.46% accuracy, demonstrating confidence calibration with a 24.25% gap between correct and incorrect predictions. Our approach achieves performance within 1.34% of fine-tuned ResNet-50 (98.57%) while requiring no external data, validating the efficacy of systematic architectural design for domain-specific applications. Complete code, trained models, and evaluation scripts are publicly available.
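The abstract describes the fusion only at a high level: a learnable parameter that balances a spatial branch (Coordinate Attention) against a spectral branch (Squeeze-Excitation) and converges to alpha of about 0.57. A minimal sketch of one way such a fusion could work is shown here. It assumes, without confirmation from the source, that alpha is kept in (0, 1) by a sigmoid over an unconstrained scalar and that the two branch outputs are mixed by a convex combination. The names `fuse`, `raw_alpha`, and the toy feature vectors are hypothetical, and the attention branches themselves are not modelled.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unconstrained scalar to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid: the raw value that yields mixing weight p."""
    return math.log(p / (1.0 - p))


def fuse(spatial, spectral, raw_alpha):
    """Mix two same-length feature vectors with one scalar weight.

    ASSUMPTION: alpha = sigmoid(raw_alpha) weights the spatial branch and
    (1 - alpha) weights the spectral branch. The paper's exact
    parameterisation is not stated in the abstract.
    """
    if len(spatial) != len(spectral):
        raise ValueError("branch outputs must have the same length")
    alpha = sigmoid(raw_alpha)
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(spatial, spectral)]


# Hypothetical toy outputs of the two attention branches.
spatial_features = [1.0, 0.0, 2.0]
spectral_features = [0.0, 1.0, 2.0]

# Raw parameter value that corresponds to the reported alpha of about 0.57.
raw_alpha = logit(0.57)

fused = fuse(spatial_features, spectral_features, raw_alpha)
print([round(v, 2) for v in fused])
```

Under this parameterisation, a raw value of 0 gives alpha = 0.5 (equal weighting), so a converged alpha near 0.57 corresponds to a raw value only slightly above zero, which is consistent with the abstract's reading of near-equal spatial and spectral importance.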
Related papers
- Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, model capability, and task properties.
We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains.
We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
- A Multimodal Approach to Heritage Preservation in the Context of Climate Change [0.0]
We propose a lightweight multimodal architecture that fuses sensor data (temperature, humidity) with visual imagery to predict severity at heritage sites.
On data from Strasbourg Cathedral, our model achieves 76.9% accuracy, a 43% improvement over standard multimodal architectures.
arXiv Detail & Related papers (2025-10-15T22:07:57Z)
- CLAIRE: A Dual Encoder Network with RIFT Loss and Phi-3 Small Language Model Based Interpretability for Cross-Modality Synthetic Aperture Radar and Optical Land Cover Segmentation [1.1237223647481338]
We propose a dual encoder architecture that independently extracts modality-specific features from optical and Synthetic Aperture Radar (SAR) imagery.
This fusion mechanism highlights complementary spatial and textural features, enabling the network to better capture detailed and diverse land cover patterns.
We also introduce a metric-driven reasoning module generated by a Small Language Model (Phi-3), which generates expert-level, sample-specific justifications for model predictions.
arXiv Detail & Related papers (2025-09-15T14:10:52Z)
- Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms [3.9167717582896793]
Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing and tree segmentation.
This study addresses these gaps by conducting a benchmark of machine learning and deep learning methods for tree species classification.
arXiv Detail & Related papers (2025-04-19T16:03:49Z)
- Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning [1.024113475677323]
This study can produce a set of applications such as urban planning and development, environmental monitoring, disaster management, etc.
This article developed a deep learning-based approach to automate the process of classifying geographical land structures.
arXiv Detail & Related papers (2024-11-19T11:01:30Z)
- Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning [50.868594148443215]
In computer vision, traditional ensemble learning methods exhibit either low training efficiency or limited performance.
We propose a lightweight, loss-function-free, and architecture-agnostic ensemble learning method, Decorrelating Structure via Adapters (DSA), for various visual tasks.
arXiv Detail & Related papers (2024-08-08T01:31:38Z)
- Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
- Touch Analysis: An Empirical Evaluation of Machine Learning Classification Algorithms on Touch Data [7.018254711671888]
We propose a novel Deep Neural Net (DNN) architecture to classify the individuals correctly.
When we combine the new features with the existing ones, SVM and kNN achieve classification accuracies of 94.7% and 94.6%, respectively.
arXiv Detail & Related papers (2023-11-23T20:31:48Z)
- Whole-body Detection, Recognition and Identification at Altitude and Range [57.445372305202405]
We propose an end-to-end system evaluated on diverse datasets.
Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images.
We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios.
arXiv Detail & Related papers (2023-11-09T20:20:23Z)
- Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation.
Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
- Tactile Grasp Refinement using Deep Reinforcement Learning and Analytic Grasp Stability Metrics [70.65363356763598]
We show that analytic grasp stability metrics constitute powerful optimization objectives for reinforcement learning algorithms.
We show that a combination of geometric and force-agnostic grasp stability metrics yields the highest average success rates of 95.4% for cuboids.
In a second experiment, we show that grasp refinement algorithms trained with contact feedback perform up to 6.6% better than a baseline that receives no tactile information.
arXiv Detail & Related papers (2021-09-23T09:20:19Z)
- Semi-Supervised Neural Architecture Search [185.0651567642238]
SemiNAS is a semi-supervised neural architecture search (NAS) approach that leverages numerous unlabeled architectures (without evaluation and thus at nearly no cost).
It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures.
It achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, improvements of 9% and 7% over the baseline, respectively.
arXiv Detail & Related papers (2020-02-24T17:23:00Z)