Related papers: Video Salient Object Detection via Adaptive Local-Global Refinement

Video Salient Object Detection via Adaptive Local-Global Refinement

URL: http://arxiv.org/abs/2104.14360v1
Date: Thu, 29 Apr 2021 14:14:11 GMT
Title: Video Salient Object Detection via Adaptive Local-Global Refinement
Authors: Yi Tang and Yuanman Li and Guoliang Xing
Abstract summary: Video salient object detection (VSOD) is an important task in many vision applications. We propose an adaptive local-global refinement framework for VSOD. We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation.
Score: 7.723369608197167
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video salient object detection (VSOD) is an important task in many vision applications. Reliable VSOD requires to simultaneously exploit the information from both the spatial domain and the temporal domain. Most of the existing algorithms merely utilize simple fusion strategies, such as addition and concatenation, to merge the information from different domains. Despite their simplicity, such fusion strategies may introduce feature redundancy, and also fail to fully exploit the relationship between multi-level features extracted from both spatial and temporal domains. In this paper, we suggest an adaptive local-global refinement framework for VSOD. Different from previous approaches, we propose a local refinement architecture and a global one to refine the simply fused features with different scopes, which can fully explore the local dependence and the global dependence of multi-level features. In addition, to emphasize the effective information and suppress the useless one, an adaptive weighting mechanism is designed based on graph convolutional neural network (GCN). We show that our weighting methodology can further exploit the feature correlations, thus driving the network to learn more discriminative feature representation. Extensive experimental results on public video datasets demonstrate the superiority of our method over the existing ones.

Related papers

Boosting Single-domain Generalized Object Detection via Vision-Language Knowledge Interaction [4.692621855184482]
Single-Domain Generalized Object Detection(S-DGOD) aims to train an object detector on a single source domain. Recent S-DGOD approaches exploit pre-trained vision-language knowledge to guide invariant feature learning across visual domains. We propose a new cross-modal feature learning method, which can capture generalized and discriminative regional features for S-DGOD tasks.
arXiv Detail & Related papers (2025-04-27T02:55:54Z)
Generalizable Deepfake Detection via Effective Local-Global Feature Extraction [5.221473306027505]
GANs and diffusion models have led to the generation of increasingly realistic fake images. Deepfake detection has become a pressing issue in today's world. We propose a novel method that effectively combines local and global features.
arXiv Detail & Related papers (2025-01-25T15:53:57Z)
Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet. In this paper, we first design a collaborative optimization strategy to simultaneously model the local details and global long-range relationships. Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
arXiv Detail & Related papers (2024-09-15T02:26:17Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
CLIP the Gap: A Single Domain Generalization Approach for Object Detection [60.20931827772482]
Single Domain Generalization tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. We propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss.
arXiv Detail & Related papers (2023-01-13T12:01:18Z)
Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition. Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules,a two-stream network based on joint learning of local and global features. The experimental results on two typical datasets show that our model is obviously superior to the most state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
Relation Matters: Foreground-aware Graph-based Relational Reasoning for Domain Adaptive Object Detection [81.07378219410182]
We propose a new and general framework for DomainD, named Foreground-aware Graph-based Reasoning (FGRR) FGRR incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations. Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art on four DomainD benchmarks.
arXiv Detail & Related papers (2022-06-06T05:12:48Z)
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images. We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels. Experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR methods on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
Channel-wise Alignment for Adaptive Object Detection [66.76486843397267]
Generic object detection has been immensely promoted by the development of deep convolutional neural networks. Existing methods on this task usually draw attention on the high-level alignment based on the whole image or object of interest. In this paper, we realize adaptation from a thoroughly different perspective, i.e., channel-wise alignment.
arXiv Detail & Related papers (2020-09-07T02:42:18Z)
Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features. We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations. We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundant and improve the efficiency of self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.