Bilateral Reference for High-Resolution Dichotomous Image Segmentation
- URL: http://arxiv.org/abs/2401.03407v6
- Date: Wed, 24 Jul 2024 08:27:47 GMT
- Title: Bilateral Reference for High-Resolution Dichotomous Image Segmentation
- Authors: Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe,
- Abstract summary: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef)
Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference.
- Score: 109.35828258964557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at https://github.com/ZhengPeng7/BiRefNet.
Related papers
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method surpasses single-stage baselines in public VPR datasets, but also surpasses two-stage methods that add a re-ranking with significantly higher cost.
arXiv Detail & Related papers (2023-11-27T15:46:19Z) - Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
New paradigm has emerged by generating a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Referring Image Segmentation Using Text Supervision [44.27304699305985]
Existing Referring Image (RIS) methods typically require expensive pixel-level or box-level annotations for supervision.
We propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process.
Our framework achieves promising performances to existing fully-supervised RIS methods while outperforming state-of-the-art weakly-supervised methods adapted from related areas.
arXiv Detail & Related papers (2023-08-28T13:40:47Z) - Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
Few-shot semantic segmentation task aims at performing segmentation in query images with a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z) - CRCNet: Few-shot Segmentation with Cross-Reference and Region-Global
Conditional Networks [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
We propose a Cross-Reference and Local-Global Networks (CRCNet) for few-shot segmentation.
Our network can better find the co-occurrent objects in the two images with a cross-reference mechanism.
arXiv Detail & Related papers (2022-08-23T06:46:18Z) - Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and
Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability of enabling fast and flexible information extraction on remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels.
Experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR methods on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z) - DRBANET: A Lightweight Dual-Resolution Network for Semantic Segmentation
with Boundary Auxiliary [15.729067807920236]
This paper introduces a lightweight dual-resolution network, called DRBANet, aiming to refine semantic segmentation results with the aid of boundary information.
DRBANet adopts dual parallel architecture, including: high resolution branch (HRB) and low resolution branch (LRB)
Experiments on Cityscapes and CamVid datasets demonstrate that our method achieves promising trade-off between segmentation accuracy and running efficiency.
arXiv Detail & Related papers (2021-10-31T14:20:02Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS)
Our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM)
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.