Related papers: Attention-Enhanced Prototypical Learning for Few-Shot Infrastructure Defect Segmentation

Attention-Enhanced Prototypical Learning for Few-Shot Infrastructure Defect Segmentation

URL: http://arxiv.org/abs/2510.05266v1
Date: Mon, 06 Oct 2025 18:33:31 GMT
Title: Attention-Enhanced Prototypical Learning for Few-Shot Infrastructure Defect Segmentation
Authors: Christina Thrainer, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Christian Guetl, Steven Sloan, Kendall N. Niles, Ken Pathak,
Abstract summary: Few-shot semantic segmentation is vital for deep learning-based infrastructure inspection applications.<n>We present our Enhanced Feature Pyramid Network (E-FPN) framework for few-shot semantic segmentation of culvert and sewer defect categories.<n>Our framework addresses the critical need to rapidly respond to new defect types in infrastructure inspection systems with limited new training data.
Score: 1.440641607825089
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Few-shot semantic segmentation is vital for deep learning-based infrastructure inspection applications, where labeled training examples are scarce and expensive. Although existing deep learning frameworks perform well, the need for extensive labeled datasets and the inability to learn new defect categories with little data are problematic. We present our Enhanced Feature Pyramid Network (E-FPN) framework for few-shot semantic segmentation of culvert and sewer defect categories using a prototypical learning framework. Our approach has three main contributions: (1) adaptive E-FPN encoder using InceptionSepConv blocks and depth-wise separable convolutions for efficient multi-scale feature extraction; (2) prototypical learning with masked average pooling for powerful prototype generation from small support examples; and (3) attention-based feature representation through global self-attention, local self-attention and cross-attention. Comprehensive experimentation on challenging infrastructure inspection datasets illustrates that the method achieves excellent few-shot performance, with the best configuration being 8-way 5-shot training configuration at 82.55% F1-score and 72.26% mIoU in 2-way classification testing. The self-attention method had the most significant performance improvements, providing 2.57% F1-score and 2.9% mIoU gain over baselines. Our framework addresses the critical need to rapidly respond to new defect types in infrastructure inspection systems with limited new training data that lead to more efficient and economical maintenance plans for critical infrastructure systems.

Related papers

AI-Based Culvert-Sewer Inspection [0.0]
Culverts and sewer pipes are critical components of drainage systems, and their failure can lead to serious risks to public safety and the environment.<n>This thesis proposes three methods to significantly enhance defect segmentation and handle data scarcity.<n>ForTRESS is a novel architecture that combines depthwise separable convolutions, adaptive Kolmogorov-Arnold Networks (KAN), and multi-scale attention mechanisms.
arXiv Detail & Related papers (2026-01-21T16:33:33Z)
Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model [21.024556007374684]
3D models are particularly well-suited to this task given the morphological characteristics of emboli.<n>CNN-based models generally yield superior performance compared to their ViT-based counterparts in PE segmentation.<n>Central and large emboli can be segmented with satisfactory accuracy, while distal emboli remain challenging due to both task complexity and the scarcity of high-quality datasets.
arXiv Detail & Related papers (2025-09-22T18:34:30Z)
Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposing a systematic error analysis method, and emphasizing the importance of cross-dataset evaluation.<n>An open-source toolkit has be released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach [2.3020018305241337]
This paper investigates the effectiveness of self supervised learning with EfficientNet architectures for building roof type classification.<n>We propose a novel framework that incorporates a Convolutional Block Attention Module (CBAM) to enhance the feature extraction capabilities of EfficientNet.
arXiv Detail & Related papers (2025-03-28T09:04:11Z)
Crack Detection in Infrastructure Using Transfer Learning, Spatial Attention, and Genetic Algorithm Optimization [3.1687473999848836]
Crack detection plays a pivotal role in the maintenance and safety of infrastructure, including roads, bridges, and buildings. Traditionally, manual inspection has been the norm, but it is labor-intensive, subjective, and hazardous. This paper introduces an advanced approach for crack detection in infrastructure using deep learning, leveraging transfer learning, spatial attention mechanisms, and genetic algorithm(GA) optimization.
arXiv Detail & Related papers (2024-11-26T06:12:56Z)
Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features. This allows the model to improve its generalization capabilities in distinguish various type of shapes, surfaces and sizes of cracks. The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD) FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
arXiv Detail & Related papers (2023-02-01T18:59:36Z)
Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection [8.492340530784697]
We show that finetune-free iFSD can be highly effective when a large number of base categories with abundant data are available for meta-training. We benchmark our model on both COCO and LVIS, reporting as high as $17%$ AP on the long-tail rare classes on LVIS.
arXiv Detail & Related papers (2022-03-25T20:39:00Z)
Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn. We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification. Our strategy enables important aspects of the base learner objective to be learned during meta-training. We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results. Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module. We also propose novel training strategies that effectively improve detection performance. Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.