NICE: Improving Panoptic Narrative Detection and Segmentation with
Cascading Collaborative Learning
- URL: http://arxiv.org/abs/2310.10975v2
- Date: Mon, 23 Oct 2023 10:34:34 GMT
- Authors: Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai
Sun, Rongrong Ji
- Abstract summary: We propose a unified framework called NICE that can jointly learn two panoptic narrative recognition tasks.
By linking PNS and PND in series with the barycenter of segmentation as the anchor, our approach naturally aligns the two tasks.
NICE surpasses all existing methods by a large margin, improving over the state of the art by 4.1% on PND and 2.9% on PNS.
- Score: 77.95710025273218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic Narrative Detection (PND) and Segmentation (PNS) are two challenging
tasks that involve identifying and locating multiple targets in an image
according to a long narrative description. In this paper, we propose a unified
and effective framework called NICE that can jointly learn these two panoptic
narrative recognition tasks. Existing visual grounding tasks use a two-branch
paradigm, but applying this directly to PND and PNS can result in prediction
conflict due to their intrinsic many-to-many alignment property. To address
this, we introduce two cascading modules based on the barycenter of the mask,
which are Coordinate Guided Aggregation (CGA) and Barycenter Driven
Localization (BDL), responsible for segmentation and detection, respectively.
By linking PNS and PND in series with the barycenter of segmentation as the
anchor, our approach naturally aligns the two tasks and allows them to
complement each other for improved performance. Specifically, CGA provides the
barycenter as a reference for detection, reducing BDL's reliance on a large
number of candidate boxes. BDL leverages its excellent properties to
distinguish different instances, which improves the performance of CGA for
segmentation. Extensive experiments demonstrate that NICE surpasses all
existing methods by a large margin, improving over the state of the art by 4.1%
on PND and 2.9% on PNS. These results validate the effectiveness of our
proposed collaborative learning strategy. The project is publicly available at
https://github.com/Mr-Neko/NICE.
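The abstract uses the barycenter of a segmentation mask as the anchor that links PNS to PND. As a rough illustration only, the sketch below computes the barycenter of a 2-D mask as the weighted mean of its pixel coordinates. The function name `mask_barycenter` and the `eps` threshold are assumptions made for this example; the paper's CGA and BDL modules are not reproduced here, and their exact formulation may differ.

```python
import numpy as np


def mask_barycenter(mask, eps=1e-6):
    """Return the (x, y) barycenter of a 2-D mask of non-negative weights.

    Hypothetical helper for illustration; not taken from the NICE codebase.
    Returns None when the mask is (nearly) empty.
    """
    mask = np.asarray(mask, dtype=np.float64)
    if mask.ndim != 2:
        raise ValueError("mask must be 2-D (height, width)")
    total = mask.sum()
    if total <= eps:
        return None
    # Row (y) and column (x) index grids with the same shape as the mask.
    ys, xs = np.indices(mask.shape)
    # Weighted mean of the coordinates, using mask values as weights.
    cx = (xs * mask).sum() / total
    cy = (ys * mask).sum() / total
    return float(cx), float(cy)


# A 3x3 block of ones covering rows 1..3 and columns 2..4.
mask = np.zeros((5, 5))
mask[1:4, 2:5] = 1.0
print(mask_barycenter(mask))  # (3.0, 2.0)
```

The same function accepts soft masks (for example, per-pixel probabilities), in which case each pixel contributes in proportion to its value. A single point of this kind can serve as a reference location for a box predictor, which is the general idea the abstract describes.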
Related papers
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- No-Service Rail Surface Defect Segmentation via Normalized Attention and
Dual-scale Interaction [13.150295919228013]
No-service rail surface defect (NRSD) segmentation is an essential way to assess the quality of no-service rails.
Existing natural image segmentation methods cannot achieve promising performance in NRSD images.
We propose a novel segmentation network for NRSDs based on Normalized Attention and Dual-scale Interaction, named NaDiNet.
arXiv Detail & Related papers (2023-06-27T12:58:16Z)
- Discriminative Co-Saliency and Background Mining Transformer for
Co-Salient Object Detection [111.04994415248736]
We propose a Discriminative co-saliency and background Mining Transformer framework (DMT).
We use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.
Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-04-30T15:56:47Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims to segment query images given a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- OS-MSL: One Stage Multimodal Sequential Link Framework for Scene
Segmentation and Classification [11.707994658605546]
We propose a general One Stage Multimodal Sequential Link Framework (OS-MSL) to distinguish and leverage the two-fold semantics.
We tailor a specific module called DiffCorrNet to explicitly extract the information of differences and correlations among shots.
arXiv Detail & Related papers (2022-07-04T07:59:34Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot
Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic
Segmentation [35.11139361684248]
We propose a Collaborative Panoptic-Regional Active Learning framework (CPRAL) to address the semantic segmentation task.
Considering the class imbalance in the segmentation dataset, we introduce a Regional Gaussian Attention module (RGA) to achieve semantics-biased selection.
We show that CPRAL outperforms the cutting-edge methods with impressive results and a smaller labeling proportion.
arXiv Detail & Related papers (2021-12-11T13:13:13Z)
- Dual-Attention Enhanced BDense-UNet for Liver Lesion Segmentation [3.1667381240856987]
We propose a new segmentation network, termed DA-BDense-UNet, that integrates DenseUNet and a bidirectional LSTM with an attention mechanism.
DenseUNet allows learning enough diverse features and enhancing the representative power of networks by regulating the information flow.
arXiv Detail & Related papers (2021-07-24T16:28:00Z)
- Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA).
IDA-DAO is used to align the similarity scores considering the discrepancy between each image and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)