BAFPN: Bi directional alignment of features to improve localization accuracy
- URL: http://arxiv.org/abs/2412.01859v1
- Date: Sun, 01 Dec 2024 04:44:06 GMT
- Title: BAFPN: Bi directional alignment of features to improve localization accuracy
- Authors: Li Jiakun, Wang Qingqing, Dong Hongbin, Li Kexin,
- Abstract summary: Current vision models often utilize feature pyramids to extract multi-scale information.
The Feature Pyramid Network (FPN) is one of the most widely used classic architectures.
Traditional FPNs fail to fully address spatial misalignment on a global scale, leading to suboptimal performance in high-precision localization of objects.
- Score: 0.0
- License:
- Abstract: Current state-of-the-art vision models often utilize feature pyramids to extract multi-scale information, with the Feature Pyramid Network (FPN) being one of the most widely used classic architectures. However, traditional FPNs and their variants (e.g., AUGFPN, PAFPN) fail to fully address spatial misalignment on a global scale, leading to suboptimal performance in high-precision localization of objects. In this paper, we propose a novel Bidirectional Alignment Feature Pyramid Network (BAFPN), which aligns misaligned features globally through a Spatial Feature Alignment Module (SPAM) during the bottom-up information propagation phase. Subsequently, it further mitigates aliasing effects caused by cross-scale feature fusion via a fine-grained Semantic Alignment Module (SEAM) in the top-down phase. On the DOTAv1.5 dataset, BAFPN improves the baseline model's AP75, AP50, and mAP by 1.68%, 1.45%, and 1.34%, respectively. Additionally, BAFPN demonstrates significant performance gains when applied to various other advanced detectors.
Related papers
- Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection [3.7793767915135295]
We propose a new model named MAF-YOLO in this paper.
It is a novel object detection framework with a versatile neck named Multi-Branch Auxiliary FPN (MAFPN)
Taking the nano version of MAF-YOLO for example, it can achieve 42.4% AP on COCO with only 3.76M learnable parameters and 10.51G FLOPs, and approximately outperforms YOLOv8n by about 5.1%.
arXiv Detail & Related papers (2024-07-05T09:35:30Z) - AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [80.33846577924363]
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video framegithub.
It is based on two essential designs. First, we build bidirectional volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations.
Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows for performing backward warping on the input frames separately.
arXiv Detail & Related papers (2023-04-19T16:18:47Z) - Disentangled Federated Learning for Tackling Attributes Skew via
Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attributes skews the current federated learning (FL) frameworks from consistent optimization directions among the clients.
We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches.
Experiments verify that DFL facilitates FL with higher performance, better interpretability, and faster convergence rate, compared with SOTA FL methods.
arXiv Detail & Related papers (2022-06-14T13:12:12Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z) - DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with
Flow-Guided Attentive Correlation and Recursive Boosting [50.17500790309477]
DeMFI-Net is a joint deblurring and multi-frame framework.
It converts blurry videos of lower-frame-rate to sharp videos at higher-frame-rate.
It achieves state-of-the-art (SOTA) performances for diverse datasets.
arXiv Detail & Related papers (2021-11-19T00:00:15Z) - FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [6.613724825924151]
We propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled features.
We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN)
In particular, our FaPN achieves the state-of-the-art of 56.7% mIoU on ADE20K when integrated within Mask-Former.
arXiv Detail & Related papers (2021-08-16T12:52:42Z) - A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z) - Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate
Single-Shot Object Detection [22.817918566911203]
This paper proposes the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN) for fast and accurate single-shot object detection.
The proposed network achieves state-of-the-art performance on the UAVDT17 and MS COCO datasets.
arXiv Detail & Related papers (2020-12-03T06:51:20Z) - Dynamic Feature Pyramid Networks for Object Detection [40.24111664691307]
We introduce an inception FPN in which each layer contains convolution filters with different kernel sizes to enlarge the receptive field.
We propose a new dynamic FPN (DyFPN) which consists of multiple branches with different computational costs.
Experiments conducted on benchmarks demonstrate that the proposed DyFPN significantly improves performance with the optimal allocation of computation resources.
arXiv Detail & Related papers (2020-12-01T19:03:55Z) - FPCR-Net: Feature Pyramidal Correlation and Residual Reconstruction for
Optical Flow Estimation [72.41370576242116]
We propose a semi-supervised Feature Pyramidal Correlation and Residual Reconstruction Network (FPCR-Net) for optical flow estimation from frame pairs.
It consists of two main modules: pyramid correlation mapping and residual reconstruction.
Experiment results show that the proposed scheme achieves the state-of-the-art performance, with improvement by 0.80, 1.15 and 0.10 in terms of average end-point error (AEE) against competing baseline methods.
arXiv Detail & Related papers (2020-01-17T07:13:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.