Related papers: Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training

URL: http://arxiv.org/abs/2510.15527v1
Date: Fri, 17 Oct 2025 10:59:24 GMT
Title: Balanced Multi-Task Attention for Satellite Image Classification: A Systematic Approach to Achieving 97.23% Accuracy on EuroSAT Without Pre-Training
Authors: Aditya Vir
Abstract summary: This work presents a systematic investigation of custom convolutional neural network architectures for satellite land use classification. We achieve 97.23% test accuracy on the EuroSAT dataset without reliance on pre-trained models. Our approach achieves performance within 1.34% of fine-tuned ResNet-50 (98.57%) while requiring no external data.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work presents a systematic investigation of custom convolutional neural network architectures for satellite land use classification, achieving 97.23% test accuracy on the EuroSAT dataset without reliance on pre-trained models. Through three progressive architectural iterations (baseline: 94.30%, CBAM-enhanced: 95.98%, and balanced multi-task attention: 97.23%) we identify and address specific failure modes in satellite imagery classification. Our principal contribution is a novel balanced multi-task attention mechanism that combines Coordinate Attention for spatial feature extraction with Squeeze-Excitation blocks for spectral feature extraction, unified through a learnable fusion parameter. Experimental results demonstrate that this learnable parameter autonomously converges to alpha approximately 0.57, indicating near-equal importance of spatial and spectral modalities for satellite imagery. We employ progressive DropBlock regularization (5-20% by network depth) and class-balanced loss weighting to address overfitting and confusion pattern imbalance. The final 12-layer architecture achieves Cohen's Kappa of 0.9692 with all classes exceeding 94.46% accuracy, demonstrating confidence calibration with a 24.25% gap between correct and incorrect predictions. Our approach achieves performance within 1.34% of fine-tuned ResNet-50 (98.57%) while requiring no external data, validating the efficacy of systematic architectural design for domain-specific applications. Complete code, trained models, and evaluation scripts are publicly available.
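
As a rough illustration of the learnable fusion described in the abstract, the sketch below blends a spatial-branch output with a spectral-branch output through a single scalar gate. The abstract does not state the exact formula, so the convex combination alpha * spatial + (1 - alpha) * spectral, the sigmoid parameterization of alpha, the names `fuse` and `raw_alpha`, and the toy feature lists are all assumptions made for illustration, not the author's implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map an unconstrained value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of sigmoid: the raw value whose sigmoid equals p."""
    return math.log(p / (1.0 - p))


def fuse(spatial, spectral, raw_alpha):
    """Blend two same-length feature lists with one scalar gate.

    ASSUMPTION: the fusion is a convex combination. In a real model
    raw_alpha would be a trainable parameter and the inputs would be the
    Coordinate Attention and Squeeze-Excitation branch outputs.
    """
    if len(spatial) != len(spectral):
        raise ValueError("both branches must produce the same number of features")
    alpha = sigmoid(raw_alpha)
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(spatial, spectral)]


# Hypothetical toy inputs standing in for the two attention branches.
spatial_features = [1.0, 0.0, 2.0]
spectral_features = [0.0, 1.0, 2.0]

# Raw parameter value that corresponds to the reported alpha of about 0.57.
raw_alpha = logit(0.57)

fused = fuse(spatial_features, spectral_features, raw_alpha)
```

Under this assumed form, an alpha near 0.57 weights the spatial branch only slightly more than the spectral branch, which matches the abstract's reading of near-equal importance. Where both branches agree (the third feature), the gate has no effect on the output.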

Related papers

Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
A Multimodal Approach to Heritage Preservation in the Context of Climate Change [0.0]
We propose a lightweight multimodal architecture that fuses sensor data (temperature, humidity) with visual imagery to predict severity at heritage sites. On data from Strasbourg Cathedral, our model achieves 76.9% accuracy, a 43% improvement over standard multimodal architectures.
arXiv Detail & Related papers (2025-10-15T22:07:57Z)
CLAIRE: A Dual Encoder Network with RIFT Loss and Phi-3 Small Language Model Based Interpretability for Cross-Modality Synthetic Aperture Radar and Optical Land Cover Segmentation [1.1237223647481338]
We propose a dual encoder architecture that independently extracts modality-specific features from optical and Synthetic Aperture Radar (SAR) imagery. This fusion mechanism highlights complementary spatial and textural features, enabling the network to better capture detailed and diverse land cover patterns. We also introduce a metric-driven reasoning module generated by a Small Language Model (Phi-3), which generates expert-level, sample-specific justifications for model predictions.
arXiv Detail & Related papers (2025-09-15T14:10:52Z)
Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms [3.9167717582896793]
Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing and tree segmentation. This study addresses these gaps by conducting a benchmark of machine learning and deep learning methods for tree species classification.
arXiv Detail & Related papers (2025-04-19T16:03:49Z)
Classification of Geographical Land Structure Using Convolution Neural Network and Transfer Learning [1.024113475677323]
This study can produce a set of applications such as urban planning and development, environmental monitoring, disaster management, etc. This article developed a deep learning-based approach to automate the process of classifying geographical land structures.
arXiv Detail & Related papers (2024-11-19T11:01:30Z)
Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning [50.868594148443215]
In computer vision, traditional ensemble learning methods exhibit either low training efficiency or limited performance. We propose a lightweight, loss-function-free, and architecture-agnostic ensemble learning method, Decorrelating Structure via Adapters (DSA), for various visual tasks.
arXiv Detail & Related papers (2024-08-08T01:31:38Z)
Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods. We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
Touch Analysis: An Empirical Evaluation of Machine Learning Classification Algorithms on Touch Data [7.018254711671888]
We propose a novel Deep Neural Net (DNN) architecture to classify the individuals correctly. When we combine the new features with the existing ones, SVM and kNN achieved the classification accuracy of 94.7% and 94.6%, respectively.
arXiv Detail & Related papers (2023-11-23T20:31:48Z)
Whole-body Detection, Recognition and Identification at Altitude and Range [57.445372305202405]
We propose an end-to-end system evaluated on diverse datasets. Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images. We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios.
arXiv Detail & Related papers (2023-11-09T20:20:23Z)
Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representation. Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
Tactile Grasp Refinement using Deep Reinforcement Learning and Analytic Grasp Stability Metrics [70.65363356763598]
We show that analytic grasp stability metrics constitute powerful optimization objectives for reinforcement learning algorithms. We show that a combination of geometric and force-agnostic grasp stability metrics yields the highest average success rates of 95.4% for cuboids. In a second experiment, we show that grasp refinement algorithms trained with contact feedback perform up to 6.6% better than a baseline that receives no tactile information.
arXiv Detail & Related papers (2021-09-23T09:20:19Z)
Semi-Supervised Neural Architecture Search [185.0651567642238]
SemiNAS is a semi-supervised neural architecture search (NAS) approach that leverages numerous unlabeled architectures (without evaluation and thus nearly no cost). It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures. It achieves 97% intelligibility rate in the low-resource setting and 15% test error rate in the robustness setting, with 9%, 7% improvements over the baseline respectively.
arXiv Detail & Related papers (2020-02-24T17:23:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.