Related papers: CEM-FBGTinyDet: Context-Enhanced Foreground Balance with Gradient Tuning for tiny Objects

CEM-FBGTinyDet: Context-Enhanced Foreground Balance with Gradient Tuning for tiny Objects

URL: http://arxiv.org/abs/2506.09897v1
Date: Wed, 11 Jun 2025 16:13:38 GMT
Title: CEM-FBGTinyDet: Context-Enhanced Foreground Balance with Gradient Tuning for tiny Objects
Authors: Tao Liu, Zhenchao Cui,
Abstract summary: We propose E-FPN-BS, a novel architecture integrating multi-scale feature enhancement and adaptive optimization.<n>First, our Context Enhancement Module(CEM) employs dual-branch processing to align and compress high-level features for effective global-local fusion.<n>Second, the Foreground-Background Separation Module (FBSM) generates spatial gating masks that dynamically amplify discriminative regions.
Score: 2.321156185872456
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Tiny object detection (TOD) reveals a fundamental flaw in feature pyramid networks: high-level features (P5-P6) frequently receive zero positive anchors under standard label assignment protocols, leaving their semantic representations untrained due to exclusion from loss computation. This creates dual deficiencies: (1) Stranded high-level features become semantic dead-ends without gradient updates, while (2) low-level features lack essential semantic context for robust classification. We propose E-FPN-BS that systematically converts wasted high-level semantics into low-level feature enhancements. To address these issues, we propose E-FPN-BS, a novel architecture integrating multi-scale feature enhancement and adaptive optimization. First, our Context Enhancement Module(CEM) employs dual-branch processing to align and compress high-level features for effective global-local fusion. Second, the Foreground-Background Separation Module (FBSM) generates spatial gating masks that dynamically amplify discriminative regions. To address gradient imbalance across object scales, we further propose a Dynamic Gradient-Balanced Loss (DCLoss) that automatically modulates loss contributions via scale-aware gradient equilibrium. Extensive experiments across multiple benchmark datasets demonstrate the outstanding performance and generalization ability of our approach.

Related papers

YOLO-FDA: Integrating Hierarchical Attention and Detail Enhancement for Surface Defect Detection [0.32634122554914]
YOLO-FDA is a novel YOLO-based detection framework that integrates fine-grained detail enhancement and attention-guided feature fusion.<n>We show that YOLO-FDA consistently outperforms existing state-of-the-art methods in terms of both accuracy and robustness across diverse types of defects and scales.
arXiv Detail & Related papers (2025-06-26T10:32:37Z)
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling [16.742839354514512]
We introduce SkipGPT, a dynamic layer pruning framework to optimize large language models.<n>We show that SkipGPT reduces over 40% of model parameters while matching or exceeding the performance of the original dense model.
arXiv Detail & Related papers (2025-06-04T17:26:31Z)
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation [12.838593066237452]
Large language models (LLMs) memorize frequently sensitive information during training, posing risks when deploying publicly accessible models.<n>This paper presents our solution to SemEval-2025 Task 4 on targeted unlearning, which combines causal mediation analysis with layer-specific optimization.
arXiv Detail & Related papers (2025-04-17T15:05:40Z)
CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection [7.262250906929891]
Cross-layer feature pyramid networks (CFPNs) have achieved notable progress in multi-scale feature fusion and boundary detail preservation for salient object detection.<n>To address these challenges, we propose CFMD, a novel cross-layer feature pyramid network that introduces two key innovations.<n>First, we design a context-aware feature aggregation module (CFLMA), which incorporates the state-of-the-art Mamba architecture to construct a dynamic weight distribution mechanism.<n>Second, we introduce an adaptive dynamic upsampling unit (CFLMD) that preserves spatial details during resolution recovery.
arXiv Detail & Related papers (2025-04-02T03:22:36Z)
UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders. We first develop an adaptive feature mask generator to account for the unique significance of nodes. We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
High-level Feature Guided Decoding for Semantic Segmentation [54.424062794490254]
We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results. Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
arXiv Detail & Related papers (2023-03-15T14:23:07Z)
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF) We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results. In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attributes skews the current federated learning (FL) frameworks from consistent optimization directions among the clients. We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches. Experiments verify that DFL facilitates FL with higher performance, better interpretability, and faster convergence rate, compared with SOTA FL methods.
arXiv Detail & Related papers (2022-06-14T13:12:12Z)
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning. A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z)
Single-Shot Two-Pronged Detector with Rectified IoU Loss [10.616828072065093]
We introduce a novel two-pronged transductive idea to explore the relationship among different layers in both backward and forward directions. Under the guidance of the two-pronged idea, we propose a Two-Pronged Network (TPNet) to achieve bidirectional transfer between high-level features and low-level features.
arXiv Detail & Related papers (2020-08-08T12:36:55Z)
Hierarchical Bi-Directional Feature Perception Network for Person Re-Identification [12.259747100939078]
Previous Person Re-Identification (Re-ID) models aim to focus on the most discriminative region of an image. We propose a novel model named Hierarchical Bi-directional Feature Perception Network (HBFP-Net) to correlate multi-level information and reinforce each other. Experiments implemented on the mainstream evaluation including Market-1501, CUHK03 and DukeMTMC-ReID datasets show that our method outperforms the recent SOTA Re-ID models.
arXiv Detail & Related papers (2020-08-08T12:33:32Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.