Summarize and Search: Learning Consensus-aware Dynamic Convolution for
Co-Saliency Detection
- URL: http://arxiv.org/abs/2110.00338v1
- Date: Fri, 1 Oct 2021 12:06:42 GMT
- Title: Summarize and Search: Learning Consensus-aware Dynamic Convolution for
Co-Saliency Detection
- Authors: Ni Zhang and Junwei Han and Nian Liu and Ling Shao
- Abstract summary: Humans perform co-saliency detection by first summarizing the consensus knowledge in the whole group and then searching corresponding objects in each image.
Previous methods usually lack robustness, scalability, or stability for the first process and simply fuse consensus features with image features for the second process.
We propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process.
- Score: 139.10628924049476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans perform co-saliency detection by first summarizing the consensus
knowledge in the whole group and then searching corresponding objects in each
image. Previous methods usually lack robustness, scalability, or stability for
the first process and simply fuse consensus features with image features for
the second process. In this paper, we propose a novel consensus-aware dynamic
convolution model to explicitly and effectively perform the "summarize and
search" process. To summarize consensus image features, we first summarize
robust features for every single image using an effective pooling method and
then aggregate cross-image consensus cues via the self-attention mechanism. By
doing this, our model meets the scalability and stability requirements. Next,
we generate dynamic kernels from consensus features to encode the summarized
consensus knowledge. Two kinds of kernels are generated in a complementary way
to summarize fine-grained image-specific consensus object cues and the coarse
group-wise common knowledge, respectively. Then, we can effectively perform
object searching by employing dynamic convolution at multiple scales. Besides,
a novel and effective data synthesis method is also proposed to train our
network. Experimental results on four benchmark datasets verify the
effectiveness of our proposed method. Our code and saliency maps are available
at https://github.com/nnizhang/CADC.
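As a toy illustration of the abstract's three steps (per-image pooling, cross-image consensus aggregation via self-attention, and object search with dynamic kernels), here is a minimal NumPy sketch. The global average pooling, the parameter-free attention, and the 1x1 dynamic kernels are illustrative assumptions, not the paper's actual CADC architecture.

```python
import numpy as np

def summarize_and_search(features):
    """Toy sketch of the 'summarize and search' idea.

    features: (N, C, H, W) conv features for a group of N images.
    Returns one co-saliency map per image, shape (N, H, W).
    """
    n, c, h, w = features.shape

    # 1) Summarize: pool each image into a single C-dim descriptor
    #    (global average pooling stands in for the paper's pooling).
    per_image = features.mean(axis=(2, 3))            # (N, C)

    # 2) Aggregate cross-image consensus cues with one
    #    parameter-free self-attention step over the group.
    scores = per_image @ per_image.T / np.sqrt(c)     # (N, N)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    consensus = attn @ per_image                      # (N, C)

    # 3) Search: treat each consensus vector as a 1x1 dynamic kernel
    #    and convolve it with that image's features.
    return np.einsum('nchw,nc->nhw', features, consensus)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 16, 8, 8))
out = summarize_and_search(feats)
print(out.shape)  # (3, 8, 8)
```

Because the kernels are generated from the group's own features at inference time, the same function works for any group size, which is the scalability property the abstract emphasizes.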
Related papers
- NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation [2.2559617939136505]
We propose a simple and training-free method to enhance the validity and robustness of the matching strategy.
The core concept involves randomly dropping feature channels (setting them to zero) during the matching process.
This technique mimics discarding pathological nubbles, and it can be seamlessly applied to other similarity computing scenarios.
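The channel-dropping trick described above can be sketched in a few lines: zero a random subset of feature channels in both query and candidates before computing similarity. The function name, the cosine-similarity choice, and the `drop_prob` default are hypothetical, not taken from the paper.

```python
import numpy as np

def dropped_channel_similarity(query, keys, drop_prob=0.3, seed=0):
    """Cosine similarity after randomly zeroing feature channels.

    query: (C,) feature vector; keys: (M, C) candidate features.
    The same mask is applied to both sides of the comparison.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(query.shape[0]) >= drop_prob   # keep ~70% of channels
    q = query * mask
    k = keys * mask
    q_norm = np.linalg.norm(q) + 1e-8
    k_norm = np.linalg.norm(k, axis=1) + 1e-8
    return (k @ q) / (k_norm * q_norm)

rng = np.random.default_rng(1)
q = rng.standard_normal(64)
ks = rng.standard_normal((5, 64))
sims = dropped_channel_similarity(q, ks)
print(sims.shape)  # (5,)
```

Averaging the scores over several random masks (different seeds) would approximate the robustness effect the abstract attributes to discarding "pathological" channels.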
arXiv Detail & Related papers (2024-05-19T08:00:38Z)
- Towards Consistent Object Detection via LiDAR-Camera Synergy [17.665362927472973]
There is no existing model capable of detecting an object's position in both point clouds and images.
This paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework.
To assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision.
arXiv Detail & Related papers (2024-05-02T13:04:26Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
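Image-level contrastive learning of this kind is commonly implemented with an InfoNCE-style objective, where matching image pairs are pulled together and all other pairs in the batch act as negatives. The sketch below assumes that standard formulation; the paper's exact loss may differ.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Minimal InfoNCE loss over one batch.

    anchors, positives: (B, D) L2-normalized embeddings; the i-th
    positive matches the i-th anchor, the rest serve as negatives.
    """
    logits = anchors @ positives.T / temperature        # (B, B)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # mean over the batch

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 32))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.05 * rng.standard_normal((4, 32))             # slightly perturbed matches
p /= np.linalg.norm(p, axis=1, keepdims=True)
loss = info_nce(a, p)
print(round(float(loss), 4))
```

With well-aligned pairs, as here, the loss is close to zero; mismatched pairs drive it toward log(B).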
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Instance and Pair-Aware Dynamic Networks for Re-Identification [16.32740680438257]
Re-identification (ReID) is to identify the same instance across different cameras.
We propose a novel end-to-end trainable dynamic convolution framework named Instance and Pair-Aware Dynamic Networks.
On some datasets our algorithm outperforms state-of-the-art methods, and on others it achieves comparable performance.
arXiv Detail & Related papers (2021-03-09T12:34:41Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and demonstrate its effectiveness.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.