FOAM: A General Frequency-Optimized Anti-Overlapping Framework for Overlapping Object Perception
- URL: http://arxiv.org/abs/2506.13501v1
- Date: Mon, 16 Jun 2025 13:58:49 GMT
- Title: FOAM: A General Frequency-Optimized Anti-Overlapping Framework for Overlapping Object Perception
- Authors: Mingyuan Li, Tong Jia, Han Gu, Hui Lu, Hao Wang, Bowen Ma, Shuyang Lin, Shiyi Guo, Shizhuo Deng, Dongyue Chen,
- Abstract summary: Overlapping object perception aims to decouple the randomly overlapping foreground-background features, extracting foreground features while suppressing background features, which holds significant application value in fields such as security screening and medical auxiliary diagnosis.<n>We propose a general Frequency-based Anti-Overlapping Framework (FOAM) to assist the model in extracting more texture and contour information.<n>Specifically, we design the Frequency Spatial Transformer Block (FSTB), which can simultaneously extract features from both the frequency and spatial domains, helping the network capture more texture features from the foreground.
- Score: 7.258172311752019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overlapping object perception aims to decouple the randomly overlapping foreground-background features, extracting foreground features while suppressing background features, which holds significant application value in fields such as security screening and medical auxiliary diagnosis. Despite some research efforts to tackle the challenge of overlapping object perception, most solutions are confined to the spatial domain. Through frequency domain analysis, we observe that the degradation of contours and textures due to the overlapping phenomenon can be intuitively reflected in the magnitude spectrum. Based on this observation, we propose a general Frequency-Optimized Anti-Overlapping Framework (FOAM) to assist the model in extracting more texture and contour information, thereby enhancing the ability for anti-overlapping object perception. Specifically, we design the Frequency Spatial Transformer Block (FSTB), which can simultaneously extract features from both the frequency and spatial domains, helping the network capture more texture features from the foreground. In addition, we introduce the Hierarchical De-Corrupting (HDC) mechanism, which aligns adjacent features in the separately constructed base branch and corruption branch using a specially designed consistent loss during the training phase. This mechanism suppresses the response to irrelevant background features of FSTBs, thereby improving the perception of foreground contour. We conduct extensive experiments to validate the effectiveness and generalization of the proposed FOAM, which further improves the accuracy of state-of-the-art models on four datasets, specifically for the three overlapping object perception tasks: Prohibited Item Detection, Prohibited Item Segmentation, and Pneumonia Detection. The code will be open source once the paper is accepted.
Related papers
- High-Frequency Semantics and Geometric Priors for End-to-End Detection Transformers in Challenging UAV Imagery [4.833513511627847]
Unmanned Aerial Vehicle-based Object Detection (UAV-OD) faces substantial challenges, including small target sizes, high-density distributions, and cluttered backgrounds in UAV imagery.<n>We propose HEGS-DETR, a comprehensively enhanced, real-time Detection Transformer framework tailored for UAVs.<n> Experiments on the VisDrone dataset demonstrate that HEGS-DETR achieves a 5.1% AP50 and 3.8% AP increase over the baseline, while maintaining real-time speed and reducing parameter count by 4M.
arXiv Detail & Related papers (2025-07-01T14:56:56Z) - Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection [2.3020018305241337]
Industrial anomaly detection is crucial for quality control and predictive maintenance.
Existing methods commonly detect structural anomalies, such as dents and scratches, by leveraging multi-scale features from image patches extracted through deep pre-trained networks.
We address these limitations by focusing on Deep Feature Reconstruction (DFR), a memory- and compute-efficient approach for detecting structural anomalies.
We further enhance DFR into a unified framework, called ULSAD, which is capable of detecting both structural and logical anomalies.
arXiv Detail & Related papers (2024-10-21T17:56:47Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Difusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - Edge Based Oriented Object Detection [8.075609633483248]
We propose a unique loss function based on edge gradients to enhance the detection accuracy of oriented objects.
We achieve a mAP increase of 1.3% on the DOTA dataset.
arXiv Detail & Related papers (2023-09-15T09:19:38Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.<n>Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.<n>Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Boundary-semantic collaborative guidance network with dual-stream
feedback mechanism for salient object detection in optical remote sensing
imagery [22.21644705244091]
We propose boundary-semantic collaborative guidance network (BSCGNet) with dual-stream feedback mechanism.
BSCGNet exhibits distinct advantages in challenging scenarios and outperforms the 17 state-of-the-art (SOTA) approaches proposed in recent years.
arXiv Detail & Related papers (2023-03-06T03:36:06Z) - A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple '' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object.
We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z) - Self-Supervised Masked Convolutional Transformer Block for Anomaly
Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z) - AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - High-resolution Iterative Feedback Network for Camouflaged Object
Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms.
We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries.
We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.