Related papers: YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection

URL: http://arxiv.org/abs/2506.21135v1
Date: Thu, 26 Jun 2025 10:32:37 GMT
Title: YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection
Authors: Jiawei Hu,
Abstract summary: YOLO-FDA is a novel YOLO-based detection framework that integrates fine-grained detail enhancement and attention-guided feature fusion. We show that YOLO-FDA consistently outperforms existing state-of-the-art methods in terms of both accuracy and robustness across diverse types of defects and scales.
Score: 0.32634122554914
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Surface defect detection in industrial scenarios is both crucial and technically demanding due to the wide variability in defect types, irregular shapes and sizes, fine-grained requirements, and complex material textures. Although recent advances in AI-based detectors have improved performance, existing methods often suffer from redundant features, limited detail sensitivity, and weak robustness under multiscale conditions. To address these challenges, we propose YOLO-FDA, a novel YOLO-based detection framework that integrates fine-grained detail enhancement and attention-guided feature fusion. Specifically, we adopt a BiFPN-style architecture to strengthen bidirectional multilevel feature aggregation within the YOLOv5 backbone. To better capture fine structural changes, we introduce a Detail-directional Fusion Module (DDFM) that applies a directional asymmetric convolution in the second-lowest layer to enrich spatial details and fuses that layer with low-level features to enhance semantic consistency. Furthermore, we propose two novel attention-based fusion strategies, Attention-weighted Concatenation (AC) and Cross-layer Attention Fusion (CAF), to improve contextual representation and reduce feature noise. Extensive experiments on benchmark datasets demonstrate that YOLO-FDA consistently outperforms existing state-of-the-art methods in terms of both accuracy and robustness across diverse types of defects and scales.
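
The abstract does not say how the BiFPN-style aggregation weights its inputs. As an illustration only, the sketch below shows the "fast normalized fusion" rule from the original BiFPN design (EfficientDet), in which each input feature gets a learnable non-negative weight and the weights are normalized by their sum. Whether YOLO-FDA uses exactly this rule is an assumption; the function name `fast_normalized_fusion`, the flat-list feature representation, and the `eps` default are hypothetical choices made for this example.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-length feature vectors as sum_i(w_i * f_i) / (eps + sum_j w_j).

    Illustrative stand-in for BiFPN-style weighted fusion; not taken from
    the YOLO-FDA paper. Features are flat lists here for simplicity.
    """
    if not features:
        raise ValueError("at least one feature vector is required")
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all feature vectors must have the same length")

    # ReLU keeps every (learnable) weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps guards against division by zero when all weights are clipped to 0.
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Weighted element-wise sum across the input feature vectors.
    return [
        sum(n * f[i] for n, f in zip(normalized, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    top_down = [1.0, 2.0]
    lateral = [3.0, 4.0]
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))
```

With equal weights the result is close to the element-wise mean of the inputs, and an input whose weight is negative is dropped entirely, since the ReLU clips that weight to zero.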

Related papers

FGAA-FPN: Foreground-Guided Angle-Aware Feature Pyramid Network for Oriented Object Detection [1.0152838128195467]
We propose a Foreground-Guided Angle-Aware Feature Pyramid Network for oriented object detection. FGAA-FPN is built on a hierarchical functional decomposition that accounts for the distinct spatial resolution and semantic abstraction across pyramid levels. Experiments on DOTA v1.0 and DOTA v1.5 demonstrate that FGAA-FPN achieves state-of-the-art results, reaching 75.5% and 68.3% mAP, respectively.
arXiv Detail & Related papers (2026-02-11T10:15:06Z)
YOLO-DS: Fine-Grained Feature Decoupling via Dual-Statistic Synergy Operator for Object Detection [55.58092342624062]
We propose YOLO-DS, a framework built around a novel Dual-Statistic Synergy Operator (DSO). YOLO-DS decouples object features by jointly modeling the channel-wise mean and the peak-to-mean difference. On the MS-COCO benchmark, YOLO-DS consistently outperforms YOLOv8 across five model scales.
arXiv Detail & Related papers (2026-01-26T05:50:32Z)
ShortcutBreaker: Low-Rank Noisy Bottleneck with Global Perturbation Attention for Multi-Class Unsupervised Anomaly Detection [59.89803740308262]
ShortcutBreaker is a novel unified feature-reconstruction framework for MUAD tasks. It features two key innovations to address the issue of shortcuts. The proposed method achieves a remarkable image-level AUROC of 99.8%, 98.9%, 90.6%, and 87.8% on four datasets.
arXiv Detail & Related papers (2025-10-21T06:51:30Z)
Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection [12.743278093269325]
We propose a dynamic uncertainty propagation and multimodal collaborative reasoning network (DUP-MCRNet). DUGC is designed to propagate uncertainty between layers through a sparse graph constructed based on spatial semantic distance. MCF uses learnable modality gating weights to compute a weighted fusion of the attention maps of RGB, depth, and edge features.
arXiv Detail & Related papers (2025-08-28T04:31:48Z)
YOLO-RS: Remote Sensing Enhanced Crop Detection Methods [0.32985979395737786]
Existing target detection methods show poor performance when dealing with small targets in remote sensing images. YOLO-RS is based on the latest YOLOv11, which significantly enhances the detection of small targets. Experiments validate the effectiveness and application potential of YOLO-RS in the task of detecting small targets in remote sensing images.
arXiv Detail & Related papers (2025-04-15T13:13:22Z)
X2-DFD: A framework for eXplainable and eXtendable Deepfake Detection [55.77552681618732]
X2-DFD is an eXplainable and eXtendable framework based on multimodal large-language models (MLLMs) for deepfake detection. The first stage, Model Feature Assessment, systematically evaluates the detectability of forgery-related features for the MLLM. The second stage, Explainable dataset Construction, consists of two key modules: Strong Feature Strengthening and Weak Feature Supplementing. The third stage, Fine-tuning and Inference, involves fine-tuning the MLLM on the constructed dataset and deploying it for final detection and explanation.
arXiv Detail & Related papers (2024-10-08T15:28:33Z)
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection [17.406051477690134]
Event cameras output sparse and asynchronous events, providing a potential solution to solve these problems. We propose a novel hierarchical feature refinement network for event-frame fusion. Our method exhibits significantly better robustness when introducing 15 different corruption types to the frame images.
arXiv Detail & Related papers (2024-07-17T14:09:46Z)
SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection [29.348921424716057]
This paper presents a novel heterogeneous feature fusion block, comprising a holistic attention module, a heterogeneous feature contrast descriptor, and an affinity-weighted feature recalibrator. It incorporates both inter-scale and intra-scale skip connections into the decoder architecture while eliminating redundant ones, leading to both improved accuracy and computational efficiency. It introduces two fallibility-aware loss functions that separately focus on semantic-transition and depth-inconsistent regions, collectively contributing to greater supervision during model training.
arXiv Detail & Related papers (2024-02-29T07:20:02Z)
SaRPFF: A Self-Attention with Register-based Pyramid Feature Fusion module for enhanced RLD detection [0.3262230127283452]
SaRPFF (Self-Attention with Register-based Pyramid Feature Fusion) is a novel module designed to enhance multi-scale object detection. It integrates 2D-Multi-Head Self-Attention (MHSA) with Register tokens, improving feature interpretability. Our approach demonstrates a +2.61% improvement in Average Precision (AP) on the MRLD dataset compared to the baseline FPN method in YOLOv7.
arXiv Detail & Related papers (2024-02-26T04:18:42Z)
Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps. We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores. Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
Progressive Multi-scale Consistent Network for Multi-class Fundus Lesion Segmentation [28.58972084293778]
We propose a progressive multi-scale consistent network (PMCNet) that integrates the proposed progressive feature fusion (PFF) block and dynamic attention block (DAB). The PFF block progressively integrates multi-scale features from adjacent encoding layers, facilitating feature learning of each layer by aggregating fine-grained details and high-level semantics. DAB is designed to dynamically learn the attentive cues from the fused features at different scales, thus aiming to smooth the essential conflicts existing in multi-scale features.
arXiv Detail & Related papers (2022-05-31T12:10:01Z)
ASFD: Automatic and Scalable Face Detector [129.82350993748258]
We propose a novel Automatic and Scalable Face Detector (ASFD). ASFD is based on a combination of neural architecture search techniques as well as a new loss design. Our ASFD-D6 outperforms the prior strong competitors, and our lightweight ASFD-D0 runs at more than 120 FPS with MobileNet for VGA-resolution images.
arXiv Detail & Related papers (2020-03-25T06:00:47Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
