Related papers: LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

URL: http://arxiv.org/abs/2112.05291v1
Date: Fri, 10 Dec 2021 01:48:40 GMT
Title: LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization
Authors: Zhiwei Chen, Changan Wang, Yabiao Wang, Guannan Jiang, Yunhang Shen, Ying Tai, Chengjie Wang, Wei Zhang, Liujuan Cao
Abstract summary: Weakly supervised object localization (WSOL) aims to learn object localizer solely by using image-level labels. We propose a novel framework built upon the transformer, termed LCTR, which targets at enhancing the local perception capability of global features.
Score: 38.376238216214524
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Weakly supervised object localization (WSOL) aims to learn object localizer solely by using image-level labels. The convolution neural network (CNN) based techniques often result in highlighting the most discriminative part of objects while ignoring the entire object extent. Recently, the transformer architecture has been deployed to WSOL to capture the long-range feature dependencies with self-attention mechanism and multilayer perceptron structure. Nevertheless, transformers lack the locality inductive bias inherent to CNNs and therefore may deteriorate local feature details in WSOL. In this paper, we propose a novel framework built upon the transformer, termed LCTR (Local Continuity TRansformer), which targets at enhancing the local perception capability of global features among long-range feature dependencies. To this end, we propose a relational patch-attention module (RPAM), which considers cross-patch information on a global basis. We further design a cue digging module (CDM), which utilizes local features to guide the learning trend of the model for highlighting the weak local responses. Finally, comprehensive experiments are carried out on two widely used datasets, ie, CUB-200-2011 and ILSVRC, to verify the effectiveness of our method.

Related papers

Attention-Guided Multi-Scale Local Reconstruction for Point Clouds via Masked Autoencoder Self-Supervised Learning [9.390627399366833]
We introduce PointAMaLR, a novel self-supervised learning framework for point cloud processing.<n>PointAMaLR implements hierarchical reconstruction across multiple local regions.<n>Experiments on benchmark datasets demonstrate PointAMaLR's superior accuracy and quality in both classification and reconstruction tasks.
arXiv Detail & Related papers (2025-07-05T16:17:49Z)
United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images [21.76732661032257]
We propose a novel United Domain Cognition Network (UDCNet) to jointly explore the global-local information in the frequency and spatial domains. Experimental results demonstrate the superiority of the proposed UDCNet over 24 state-of-the-art models.
arXiv Detail & Related papers (2024-11-11T04:12:27Z)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions. Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks. We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for Optical Remote Sensing Images (ORSI-SOD) Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization [31.039698757869974]
Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision. Previous CNN-based methods suffer from partial activation issues, concentrating on the object's discriminative part instead of the entire entity scope. We propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.
arXiv Detail & Related papers (2023-09-04T03:20:31Z)
MOST: Multiple Object localization with Self-supervised Transformers for object discovery [97.47075050779085]
We present Multiple Object localization with Self-supervised Transformers (MOST) MOST uses features of transformers trained using self-supervised learning to localize multiple objects in real world images. We show MOST can be used for self-supervised pre-training of object detectors, and yields consistent improvements on fully, semi-supervised object detection and unsupervised region proposal generation.
arXiv Detail & Related papers (2023-04-11T17:57:27Z)
DQnet: Cross-Model Detail Querying for Camouflaged Object Detection [54.82390534024954]
A convolutional neural network (CNN) for camouflaged object detection tends to activate local discriminative regions while ignoring complete object extent. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN. In order to obtain feature maps that could activate full object extent, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed.
arXiv Detail & Related papers (2022-12-16T06:23:58Z)
LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions. We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers. The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration [20.322494442959762]
Weakly Supervised Object Localization (WSOL) has attracted much attention because of its low annotation cost in real applications. We introduce a simple yet effective Spatial Module (SCM) for accurate WSOL, incorporating semantic similarities of patch tokens and their spatial relationships into a unified diffusion model. SCM is designed as an external module of Transformer, and can be removed during inference to reduce the computation cost.
arXiv Detail & Related papers (2022-07-21T12:37:15Z)
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images. We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels. Experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR methods on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects. We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network. This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
arXiv Detail & Related papers (2021-04-12T17:59:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.