Video Salient Object Detection via Adaptive Local-Global Refinement
- URL: http://arxiv.org/abs/2104.14360v1
- Date: Thu, 29 Apr 2021 14:14:11 GMT
- Title: Video Salient Object Detection via Adaptive Local-Global Refinement
- Authors: Yi Tang and Yuanman Li and Guoliang Xing
- Abstract summary: Video salient object detection (VSOD) is an important task in many vision applications.
We propose an adaptive local-global refinement framework for VSOD.
We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
- Score: 7.723369608197167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video salient object detection (VSOD) is an important task in many vision
applications. Reliable VSOD requires simultaneously exploiting information
from both the spatial domain and the temporal domain. Most of the existing
algorithms merely utilize simple fusion strategies, such as addition and
concatenation, to merge the information from different domains. Despite their
simplicity, such fusion strategies may introduce feature redundancy, and also
fail to fully exploit the relationship between multi-level features extracted
from both spatial and temporal domains. In this paper, we suggest an adaptive
local-global refinement framework for VSOD. Unlike previous approaches,
we propose a local refinement architecture and a global one that refine the
simply fused features at different scopes, and can thus fully explore the local
dependence and the global dependence of multi-level features. In addition, to
emphasize useful information and suppress useless information, we design an
adaptive weighting mechanism based on a graph convolutional network
(GCN). We show that our weighting methodology can further exploit the feature
correlations, thus driving the network to learn more discriminative feature
representations. Extensive experimental results on public video datasets
demonstrate the superiority of our method over the existing ones.
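To make the abstract's terms concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It contrasts the two simple fusion strategies the abstract names (addition and concatenation) with an adaptive weighting of multi-level features. The weighting treats each feature level as a node in a fully connected graph and performs a single propagation step without learned parameters; a real GCN layer would also multiply by a learned weight matrix. All function names, the cosine-similarity adjacency, and the toy feature vectors are assumptions made for illustration.

```python
import math


def fuse_add(spatial, temporal):
    """Simple fusion by element-wise addition (inputs must have equal length)."""
    return [s + t for s, t in zip(spatial, temporal)]


def fuse_concat(spatial, temporal):
    """Simple fusion by concatenation along the feature dimension."""
    return list(spatial) + list(temporal)


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def graph_weights(levels):
    """Hypothetical adaptive weighting over feature levels.

    Each level is a graph node. The adjacency is a row-wise softmax of
    pairwise cosine similarities (an assumption, not taken from the paper).
    One propagation step mixes the node features, and each node's mean
    activation is turned into a weight by a final softmax.
    """
    n = len(levels)
    dim = len(levels[0])
    adjacency = [
        softmax([cosine(levels[i], levels[j]) for j in range(n)])
        for i in range(n)
    ]
    propagated = [
        [sum(adjacency[i][j] * levels[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]
    scores = [sum(node) / dim for node in propagated]
    return softmax(scores)


def weighted_fuse(levels, weights):
    """Combine feature levels as a weighted sum instead of a plain sum."""
    dim = len(levels[0])
    return [sum(w * level[k] for w, level in zip(weights, levels)) for k in range(dim)]


if __name__ == "__main__":
    # Toy spatial and temporal feature vectors (made-up values).
    spatial = [0.2, 0.8, 0.5]
    temporal = [0.6, 0.1, 0.4]
    levels = [fuse_add(spatial, temporal), spatial, temporal]
    weights = graph_weights(levels)
    print("weights:", weights)
    print("refined:", weighted_fuse(levels, weights))
```

The point of the sketch is only the contrast: plain addition or concatenation treats every feature level equally, whereas a weighting derived from inter-level correlations can emphasize some levels and down-weight others.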
Related papers
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
In this paper, we first design a collaborative optimization strategy to simultaneously model the local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
arXiv Detail & Related papers (2024-09-15T02:26:17Z)
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- CLIP the Gap: A Single Domain Generalization Approach for Object Detection [60.20931827772482]
Single Domain Generalization tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain.
We propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts.
We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss.
arXiv Detail & Related papers (2023-01-13T12:01:18Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on joint learning of local and global features.
The experimental results on two typical datasets show that our model clearly outperforms most state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
- Relation Matters: Foreground-aware Graph-based Relational Reasoning for Domain Adaptive Object Detection [81.07378219410182]
We propose a new and general framework for domain adaptive object detection, named Foreground-aware Graph-based Relational Reasoning (FGRR).
FGRR incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations.
Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art on four domain adaptive object detection benchmarks.
arXiv Detail & Related papers (2022-06-06T05:12:48Z)
- Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot because it enables fast and flexible information extraction from remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels.
Experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR methods on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
- Channel-wise Alignment for Adaptive Object Detection [66.76486843397267]
Generic object detection has been immensely promoted by the development of deep convolutional neural networks.
Existing methods for this task usually focus on high-level alignment based on the whole image or the object of interest.
In this paper, we realize adaptation from a thoroughly different perspective, i.e., channel-wise alignment.
arXiv Detail & Related papers (2020-09-07T02:42:18Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.