Related papers: DyCAF-Net: Dynamic Class-Aware Fusion Network

URL: http://arxiv.org/abs/2508.03598v1
Date: Tue, 05 Aug 2025 16:06:26 GMT
Title: DyCAF-Net: Dynamic Class-Aware Fusion Network
Authors: Md Abrar Jahin, Shahriar Soudeep, M. F. Mridha, Nafiz Fahad, Md. Jakir Hossen,
Abstract summary: We introduce Dynamic Class-Aware Fusion Network (DyCAF-Net). DyCAF-Net achieves significant improvements in precision, mAP@50, and mAP@50-95 across 13 diverse benchmarks. Its adaptability to scale variance, semantic overlaps, and class imbalance positions it as a robust solution for real-world detection tasks.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in object detection rely on modular architectures with multi-scale fusion and attention mechanisms. However, static fusion heuristics and class-agnostic attention limit performance in dynamic scenes with occlusions, clutter, and class imbalance. We introduce Dynamic Class-Aware Fusion Network (DyCAF-Net) that addresses these challenges through three innovations: (1) an input-conditioned equilibrium-based neck that iteratively refines multi-scale features via implicit fixed-point modeling, (2) a dual dynamic attention mechanism that adaptively recalibrates channel and spatial responses using input- and class-dependent cues, and (3) class-aware feature adaptation that modulates features to prioritize discriminative regions for rare classes. Through comprehensive ablation studies with YOLOv8 and related architectures, alongside benchmarking against nine state-of-the-art baselines, DyCAF-Net achieves significant improvements in precision, mAP@50, and mAP@50-95 across 13 diverse benchmarks, including occlusion-heavy and long-tailed datasets. The framework maintains computational efficiency (~11.1M parameters) and competitive inference speeds, while its adaptability to scale variance, semantic overlaps, and class imbalance positions it as a robust solution for real-world detection tasks in medical imaging, surveillance, and autonomous systems.
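
Illustrative sketch (not from the paper): the abstract describes a neck that refines features "via implicit fixed-point modeling". The general idea of a fixed-point layer can be shown with a toy example that repeatedly applies an update until the output stops changing. Everything here is an assumption made for illustration: the function name fixed_point_refine, the elementwise update z <- tanh(weight * z + x), and the scalar weight are hypothetical and are not DyCAF-Net's actual neck.

```python
import math


def fixed_point_refine(x, weight, tol=1e-6, max_iter=100):
    """Toy fixed-point refinement (hypothetical, not the DyCAF-Net neck).

    x      : list of floats standing in for fused multi-scale features.
    weight : scalar with |weight| < 1, so the update is a contraction
             and the iteration converges to a unique fixed point.
    Returns (z, iterations_used).
    """
    z = [0.0] * len(x)
    for step in range(1, max_iter + 1):
        # One refinement step, conditioned on the input x.
        z_next = [math.tanh(weight * zi + xi) for zi, xi in zip(z, x)]
        # Largest elementwise change between successive iterates.
        residual = max((abs(a - b) for a, b in zip(z_next, z)), default=0.0)
        z = z_next
        if residual < tol:
            return z, step
    return z, max_iter


features = [0.2, -0.5, 1.0]
refined, steps = fixed_point_refine(features, weight=0.5)
print(steps, refined)
```

Because tanh has slope at most 1, choosing |weight| < 1 guarantees convergence in this toy setting. Real equilibrium models typically use learned layers and dedicated solvers, which this sketch does not attempt to reproduce.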

Related papers

Quantum-Informed Contrastive Learning with Dynamic Mixup Augmentation for Class-Imbalanced Expert Systems [0.0]
QCL-MixNet is a novel framework for dynamic mixup for robust classification under imbalance. We show that QCL-MixNet consistently outperforms 20 state-of-the-art machine learning, deep learning, and GNN-based baselines in macro-F1 and recall.
arXiv Detail & Related papers (2025-06-16T20:44:30Z)
AFD-STA: Adaptive Filtering Denoising with Spatiotemporal Attention for Chaotic System Prediction [4.833734041528231]
AFD-STA Net presents a framework for predicting high-dimensional chaotic systems governed by partial differential equations. The framework shows promising potential for real-world applications requiring simultaneous handling of measurement uncertainties and high-dimensional nonlinear dynamics.
arXiv Detail & Related papers (2025-05-23T16:39:07Z)
Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking [8.040709469401257]
We propose a differentiable dynamic attention mechanism that adaptively adjusts channel attention weights by analyzing spatial attention weights. A lightweight gating network autonomously allocates computational resources based on target motion states and prioritizes high-discriminability features in challenging scenarios.
arXiv Detail & Related papers (2025-03-21T00:48:31Z)
Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation [9.964615076037397]
Video semantic segmentation (VSS) has been widely employed in many fields, such as simultaneous localization and mapping. Previous efforts have primarily focused on pixel-level static-dynamic context matching. This paper rethinks static-dynamic contexts at the class level and proposes a novel static-dynamic class-level perceptual consistency framework.
arXiv Detail & Related papers (2024-12-11T02:29:51Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [115.79349923044663]
Few-shot class-incremental learning (FSCIL) aims to incrementally learn novel classes from limited examples. Existing methods face a critical dilemma: static architectures rely on a fixed parameter space to learn from data that arrive sequentially, prone to overfitting to the current session. In this study, we explore the potential of Selective State Space Models (SSMs) for FSCIL.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
Dynamic Feature Learning and Matching for Class-Incremental Learning [20.432575325147894]
Class-incremental learning (CIL) has emerged as a means to learn new classes without catastrophic forgetting of previous classes. We propose the Dynamic Feature Learning and Matching (DFLM) model in this paper. Our proposed model achieves significant performance improvements over existing methods.
arXiv Detail & Related papers (2024-05-14T12:17:19Z)
Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers. Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module. Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC). SFSC generates a series of compatible sub-models with different capacities through one training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness. Our results show that the proposed ASC system attains state-of-the-art accuracy on the development set. Neural saliency analysis with class activation mapping gives new insights into the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings [54.33327082243022]
ClusterVO is a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving.
arXiv Detail & Related papers (2020-03-29T09:06:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.