NL-FCOS: Improving FCOS through Non-Local Modules for Object Detection
- URL: http://arxiv.org/abs/2203.15638v1
- Date: Tue, 29 Mar 2022 15:00:14 GMT
- Title: NL-FCOS: Improving FCOS through Non-Local Modules for Object Detection
- Authors: Lukas Pavez, Jose M. Saavedra Rondo
- Abstract summary: We show that non-local modules combined with an FCOS head (NL-FCOS) are practical and efficient.
We establish state-of-the-art performance in clothing detection and handwritten amount recognition problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, we have seen significant advances in the object
detection task, mainly due to the strong results of convolutional neural
networks. In this vein, anchor-based models have achieved the best results.
However, these models require prior information about the aspect ratios and
scales of target objects, and therefore need more hyperparameters to fit. In
addition, using anchors to fit bounding boxes seems far from how our visual
system performs the same task. Instead, our visual system uses the interactions
of different scene parts to semantically identify objects, a mechanism called
perceptual grouping. An object detection methodology closer to the natural
model is anchor-free detection, where models like FCOS or CenterNet have shown
competitive results, but these have not yet exploited the concept of perceptual
grouping. Therefore, to increase the effectiveness of anchor-free models while
keeping the inference time low, we propose to add non-local attention modules
(NL modules) to boost the feature map of the underlying backbone. NL modules
implement the perceptual grouping mechanism, allowing receptive fields to
cooperate in visual representation learning. We show that non-local modules
combined with an FCOS head (NL-FCOS) are practical and efficient. Thus, we
establish state-of-the-art performance in clothing detection and handwritten
amount recognition problems.
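To make the idea of a non-local module concrete, here is a minimal sketch of the generic non-local operation in plain Python. It is not the authors' implementation: the learned 1x1 convolution embeddings are omitted, the feature map is assumed to be flattened into a list of positions, and the function name `non_local` and the parameter `residual_scale` (a scalar stand-in for the learned output projection) are hypothetical.

```python
import math


def non_local(features, residual_scale=1.0):
    """Apply a Gaussian-style non-local operation to a flattened feature map.

    features: list of N positions, each a list of C floats (assumed layout).
    residual_scale: hypothetical scalar standing in for a learned projection.
    Returns a new list with the same shape as `features`.
    """
    out = []
    for x_i in features:
        # Affinity between position i and every position j (dot product).
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Softmax over j, shifted by the maximum for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over all positions: the aggregated global context.
        y_i = [
            sum(w * x_j[c] for w, x_j in zip(weights, features))
            for c in range(len(x_i))
        ]
        # Residual connection keeps the original feature and adds the context.
        out.append([x + residual_scale * y for x, y in zip(x_i, y_i)])
    return out
```

Every output position is a mix of all input positions, which is how distant parts of the scene can influence one another. With `residual_scale=0.0` the function returns the input unchanged, so such a block can be inserted into a backbone without altering its initial behavior.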
Related papers
- Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context [3.061662434597098]
We propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the EfficientDet model.
The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks.
Our experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
arXiv Detail & Related papers (2024-09-17T10:08:37Z)
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Unveiling Camouflage: A Learnable Fourier-based Augmentation for Camouflaged Object Detection and Instance Segmentation [27.41886911999097]
We propose a learnable augmentation method for camouflaged object detection (COD) and camouflaged instance segmentation (CIS).
Our proposed augmentation method boosts the performance of camouflaged object detectors and camouflaged instance segmenters by large margins.
arXiv Detail & Related papers (2023-08-29T22:43:46Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Feature Aggregation and Propagation Network for Camouflaged Object Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Learning Target-aware Representation for Visual Tracking via Informative Interactions [49.552877881662475]
We introduce a novel backbone architecture to improve the target-perception ability of feature representations for tracking.
The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer.
arXiv Detail & Related papers (2022-01-07T16:22:27Z)
- TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Joint Object Detection and Multi-Object Tracking with Graph Neural Networks [32.1359455541169]
We propose a new instance of a joint MOT approach based on Graph Neural Networks (GNNs).
We demonstrate the effectiveness of our GNN-based joint MOT approach and show state-of-the-art performance for both detection and MOT tasks.
arXiv Detail & Related papers (2020-06-23T17:07:00Z)