CoSformer: Detecting Co-Salient Object with Transformers
- URL: http://arxiv.org/abs/2104.14729v1
- Date: Fri, 30 Apr 2021 02:39:12 GMT
- Title: CoSformer: Detecting Co-Salient Object with Transformers
- Authors: Lv Tang
- Abstract summary: Co-Salient Object Detection (CoSOD) aims at simulating the human visual system to discover the common and salient objects from a group of relevant images.
We propose the Co-Salient Object Detection Transformer (CoSformer) network to capture both salient and common visual patterns from multiple images.
- Score: 2.3148470932285665
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Co-Salient Object Detection (CoSOD) aims at simulating the human visual
system to discover the common and salient objects from a group of relevant
images. Recent methods typically develop sophisticated deep-learning-based
models that have greatly improved the performance of the CoSOD task. But two
major drawbacks still need to be addressed: 1) sub-optimal inter-image
relationship modeling; 2) a lack of consideration of inter-image
separability. In this paper, we propose the Co-Salient Object Detection
Transformer (CoSformer) network to capture both salient and common visual
patterns from multiple images. By leveraging the Transformer architecture, the
proposed method addresses the influence of input order and greatly improves
the stability of the CoSOD task. We also introduce a novel concept of
inter-image separability. We construct a contrastive learning scheme to model
the inter-image separability and learn a more discriminative embedding space to
distinguish true common objects from noisy objects. Extensive experiments on
three challenging benchmarks, i.e., CoCA, CoSOD3k, and Cosal2015, demonstrate
that our CoSformer outperforms cutting-edge models and achieves the new
state-of-the-art. We hope that CoSformer can motivate future research for more
visual co-analysis tasks.
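The abstract does not specify the exact form of the contrastive learning scheme. A common way to realize such a scheme is an InfoNCE-style loss, where embeddings of the true common object across images act as positives for an anchor and embeddings of noisy (non-common) objects act as negatives. The sketch below is an illustrative assumption of that general idea, not the authors' implementation; all names and shapes are hypothetical.

```python
import numpy as np

def info_nce_loss(anchor, positives, negatives, temperature=0.1):
    """Illustrative InfoNCE-style contrastive loss (hypothetical sketch).

    anchor:    (d,)   embedding of a common object in one image
    positives: (p, d) embeddings of the same common object in other images
    negatives: (n, d) embeddings of noisy / non-common objects
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(anchor)
    pos = normalize(positives)
    neg = normalize(negatives)

    # Temperature-scaled cosine similarities.
    pos_sim = pos @ a / temperature          # shape (p,)
    neg_sim = neg @ a / temperature          # shape (n,)

    # Each positive is contrasted against all negatives:
    # loss = -log( exp(pos) / (exp(pos) + sum exp(neg)) )
    neg_logsumexp = np.log(np.sum(np.exp(neg_sim)))
    losses = -pos_sim + np.logaddexp(pos_sim, neg_logsumexp)
    return float(np.mean(losses))
```

Minimizing a loss of this shape pulls embeddings of the common object together across images while pushing noisy objects away, which is one way to obtain the "more discriminative embedding space" the abstract describes.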
Related papers
- A Simple yet Effective Network based on Vision Transformer for
Camouflaged Object and Salient Object Detection [33.30644598646274]
We propose a simple yet effective network (SENet) based on vision Transformer (ViT)
To enhance the Transformer's ability to model local information, we propose a local information capture module (LICM)
We also propose a dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and Intersection over Union (IoU) loss, which guides the network to pay more attention to those smaller and more difficult-to-find target objects.
arXiv Detail & Related papers (2024-02-29T07:29:28Z)
- Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection [116.21529970404653]
We introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
Our approach employs two interactive hierarchical Transformers to seamlessly unify the tasks of SGG and HOI detection.
Our approach achieves competitive performance when compared to state-of-the-art HOI methods.
arXiv Detail & Related papers (2023-11-03T07:25:57Z)
- CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow [22.161967080759993]
Self-supervised pre-training methods have not yet delivered on dense geometric vision tasks such as stereo matching or optical flow.
We build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene.
We show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques.
arXiv Detail & Related papers (2022-11-18T18:18:53Z)
- MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection [21.296007737406494]
Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.
Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection.
We propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules.
arXiv Detail & Related papers (2022-03-28T12:58:59Z)
- A Unified Transformer Framework for Group-based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z)
- Unsupervised Image Decomposition with Phase-Correlation Networks [28.502280038100167]
Phase-Correlation Decomposition Network (PCDNet) is a novel model that decomposes a scene into its object components.
In our experiments, we show how PCDNet outperforms state-of-the-art methods for unsupervised object discovery and segmentation.
arXiv Detail & Related papers (2021-10-07T13:57:33Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Re-thinking Co-Salient Object Detection [170.44471050548827]
Co-salient object detection (CoSOD) aims to detect the co-occurring salient objects in a group of images.
Existing CoSOD datasets often have a serious data bias, assuming that each group of images contains salient objects of similar visual appearances.
We introduce a new benchmark, called CoSOD3k in the wild, which requires a large amount of semantic context.
arXiv Detail & Related papers (2020-07-07T12:20:51Z)
- Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.