Towards Open-World Co-Salient Object Detection with Generative
Uncertainty-aware Group Selective Exchange-Masking
- URL: http://arxiv.org/abs/2310.10264v1
- Date: Mon, 16 Oct 2023 10:40:40 GMT
- Title: Towards Open-World Co-Salient Object Detection with Generative
Uncertainty-aware Group Selective Exchange-Masking
- Authors: Yang Wu, Shenglong Hu, Huihui Song, Kaihua Zhang, Bo Liu, Dong Liu
- Abstract summary: We introduce a group selective exchange-masking (GSEM) approach for enhancing the robustness of the CoSOD model.
GSEM selects a subset of images from each group using a novel learning-based strategy; the selected subsets are then exchanged between the groups.
To simultaneously consider the uncertainty introduced by irrelevant images and the consensus features of the remaining relevant images in the group, we designed a latent variable generator branch and a CoSOD transformer branch.
- Score: 23.60044777118441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The traditional definition of co-salient object detection (CoSOD) task is to
segment the common salient objects in a group of relevant images. This
definition is based on an assumption of group consensus consistency that is not
always reasonable in the open-world setting, which results in robustness issues
in the model when it deals with irrelevant images in the input image group
under open-world scenarios. To tackle this problem, we introduce a group
selective exchange-masking (GSEM) approach for enhancing the robustness of the
CoSOD model. GSEM takes two groups of images as input, each containing
different types of salient objects. Based on the mixed metric we designed, GSEM
selects a subset of images from each group using a novel learning-based
strategy, then the selected images are exchanged. To simultaneously consider
the uncertainty introduced by irrelevant images and the consensus features of
the remaining relevant images in the group, we designed a latent variable
generator branch and a CoSOD transformer branch. The former is composed of a
vector quantised-variational autoencoder to generate stochastic global
variables that model uncertainty. The latter is designed to capture
correlation-based local features that include group consensus. Finally, the
outputs of the two branches are merged and passed to a transformer-based
decoder to generate robust predictions. Taking into account that there are
currently no benchmark datasets specifically designed for open-world scenarios,
we constructed three open-world benchmark datasets, namely OWCoSal, OWCoSOD,
and OWCoCA, based on existing datasets. By breaking the group-consistency
assumption, these datasets provide effective simulations of real-world
scenarios and can better evaluate the robustness and practicality of models.
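The selective exchange step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the paper's mixed metric is learned, so the `score_fn` parameter here is a hypothetical stand-in (any callable scoring an image's relevance to its group), and `gsem_exchange` is an invented name.

```python
def gsem_exchange(group_a, group_b, score_fn, k):
    """Selective exchange-masking sketch: pick the k lowest-scoring
    items from each group (per a stand-in 'mixed metric') and swap
    them pairwise, so each group ends up with k irrelevant items."""
    # Rank indices in each group by ascending score.
    ranked_a = sorted(range(len(group_a)), key=lambda i: score_fn(group_a[i]))
    ranked_b = sorted(range(len(group_b)), key=lambda i: score_fn(group_b[i]))
    sel_a, sel_b = ranked_a[:k], ranked_b[:k]
    # Swap the selected items between the two groups.
    out_a, out_b = list(group_a), list(group_b)
    for ia, ib in zip(sel_a, sel_b):
        out_a[ia], out_b[ib] = group_b[ib], group_a[ia]
    return out_a, out_b
```

After the exchange, each group deliberately contains k images whose salient object does not match the group consensus, which is the perturbation the robustness training targets.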
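The two-branch design could be caricatured like this; everything below (nearest-neighbour vector quantisation, merging by concatenation) is an assumption for illustration only, since the abstract does not specify codebook details or the fusion operator.

```python
# Toy sketch of the two-branch idea: a vector-quantised global code
# models uncertainty; a "local" consensus feature comes from the
# CoSOD transformer branch; the decoder input merges both.

def quantise(z, codebook):
    """Snap a latent vector to its nearest codebook entry (squared L2),
    as a VQ-VAE-style quantiser would."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(z, c))
    return min(codebook, key=dist2)

def merge(global_code, local_feat):
    """Merge the stochastic global code with consensus-local features
    by concatenation before a (not shown) transformer decoder."""
    return list(global_code) + list(local_feat)
```

The key property the sketch preserves is that the decoder sees both a discrete, stochastic global variable and continuous local features, so its prediction can hedge against irrelevant images while still exploiting group consensus.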
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
- Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion [27.120768849942145]
Co-salient object detection aims to highlight the common salient object in each image.
We propose a hierarchical Transformer module for extracting semantic-level consensus.
A Transformer-based dispersion module takes into account the variation of the co-salient object in different scenes.
arXiv Detail & Related papers (2023-09-14T14:39:07Z)
- Contrastive Grouping with Transformer for Referring Image Segmentation [23.276636282894582]
We propose a mask classification framework, the Contrastive Grouping with Transformer network (CGFormer).
CGFormer explicitly captures object-level information via token-based querying and grouping strategy.
Experimental results demonstrate that CGFormer outperforms state-of-the-art methods in both segmentation and generalization settings consistently and significantly.
arXiv Detail & Related papers (2023-09-02T20:53:42Z)
- Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects well blended with surrounding environments.
It is hard to distinguish concealed objects from the background due to the intrinsic similarity.
We propose a new WSCOS method to address these two challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z)
- Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery [36.01459228175808]
We propose an EM-like framework that alternates between representation learning and class number estimation.
We evaluate our framework on both generic image classification datasets and challenging fine-grained object recognition datasets.
arXiv Detail & Related papers (2023-05-10T13:47:38Z)
- CLUSTSEG: Clustering for Universal Segmentation [56.58677563046506]
CLUSTSEG is a general, transformer-based framework for image segmentation.
It tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.
arXiv Detail & Related papers (2023-05-03T15:31:16Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) module and a local correspondence modeling (LCM) module.
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models can complete training from only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.