IDRNet: Intervention-Driven Relation Network for Semantic Segmentation
- URL: http://arxiv.org/abs/2310.10755v1
- Date: Mon, 16 Oct 2023 18:37:33 GMT
- Title: IDRNet: Intervention-Driven Relation Network for Semantic Segmentation
- Authors: Zhenchao Jin, Xiaowei Hu, Lingting Zhu, Luchuan Song, Li Yuan and Lequan Yu
- Abstract summary: Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks.
Despite the impressive results, existing paradigms often suffer from inadequate or ineffective contextual information aggregation.
We propose a novel Intervention-Driven Relation Network (IDRNet).
- Score: 34.09179171102469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Co-occurrent visual patterns suggest that pixel relation modeling facilitates
dense prediction tasks, which inspires the development of numerous context
modeling paradigms, e.g., multi-scale-driven and similarity-driven
context schemes. Despite the impressive results, these existing paradigms often
suffer from inadequate or ineffective contextual information aggregation due to
reliance on large amounts of predetermined priors. To alleviate the issues, we
propose a novel Intervention-Driven Relation
Network (IDRNet), which leverages a deletion diagnostics
procedure to guide the modeling of contextual relations among different pixels.
Specifically, we first group pixel-level representations into semantic-level
representations with the guidance of pseudo labels and further improve the
distinguishability of the grouped representations with a feature enhancement
module. Next, a deletion diagnostics procedure is conducted to model relations
of these semantic-level representations via perceiving the network outputs and
the extracted relations are utilized to guide the semantic-level
representations to interact with each other. Finally, the interacted
representations are utilized to augment original pixel-level representations
for final predictions. Extensive experiments are conducted to validate the
effectiveness of IDRNet quantitatively and qualitatively. Notably, our
intervention-driven context scheme brings consistent performance improvements
to state-of-the-art segmentation frameworks and achieves competitive results on
popular benchmark datasets, including ADE20K, COCO-Stuff, PASCAL-Context, LIP,
and Cityscapes. Code is available at
https://github.com/SegmentationBLWX/sssegmentation.
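The pipeline described in the abstract (group pixels by pseudo label, probe relations by deleting one semantic-level representation at a time, let the representations interact, then augment the pixels) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the `toy_score` stand-in for the network output, and the additive interaction are all assumptions made for illustration, and the feature enhancement module is omitted.

```python
import numpy as np


def group_by_pseudo_label(feats, pseudo, num_classes):
    """Average pixel features that share a pseudo label.

    feats: (N, C) pixel-level features; pseudo: (N,) integer pseudo labels.
    Returns (K, C) semantic-level representations and a (K,) presence mask.
    """
    onehot = np.eye(num_classes)[pseudo]            # (N, K) membership matrix
    counts = onehot.sum(axis=0)                     # pixels per class
    reps = (onehot.T @ feats) / np.maximum(counts, 1)[:, None]
    return reps, counts > 0


def toy_score(reps):
    """Hypothetical stand-in for the network output: one score per class,
    computed against a context vector shared by all classes."""
    return reps @ reps.mean(axis=0)


def deletion_relations(reps, present, score_fn=toy_score):
    """relation[i, j] = how much class i's score moves when class j is deleted."""
    num_classes = reps.shape[0]
    base = score_fn(reps)
    rel = np.zeros((num_classes, num_classes))
    for j in np.flatnonzero(present):
        deleted = reps.copy()
        deleted[j] = 0.0                            # intervention: remove class j
        rel[:, j] = np.abs(base - score_fn(deleted))
    rel[~present] = 0.0                             # absent classes relate to nothing
    np.fill_diagonal(rel, 0.0)                      # ignore self-influence
    row_sum = rel.sum(axis=1, keepdims=True)
    return np.divide(rel, row_sum, out=np.zeros_like(rel), where=row_sum > 0)


def augment_pixels(feats, pseudo, reps, rel):
    """Let class representations interact, then add them back to each pixel."""
    interacted = reps + rel @ reps
    return feats + interacted[pseudo]


feats = np.arange(24, dtype=float).reshape(6, 4)    # 6 pixels, 4 channels
pseudo = np.array([0, 0, 1, 1, 2, 2])               # class 3 never appears
reps, present = group_by_pseudo_label(feats, pseudo, num_classes=4)
rel = deletion_relations(reps, present)
out = augment_pixels(feats, pseudo, reps, rel)
print(rel.round(3))
```

In this sketch a relation is large when deleting one class changes another class's score a lot, so the relations come from observed output changes rather than from a fixed similarity prior. A real model would replace `toy_score` with the segmentation head's predictions.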
Related papers
- Unleashing the Potential of Text-attributed Graphs: Automatic Relation Decomposition via Large Language Models [31.443478448031886]
RoSE (Relation-oriented Semantic Edge-decomposition) is a novel framework that decomposes the graph structure by analyzing raw text attributes.
Our framework significantly enhances node classification performance across various datasets, with improvements of up to 16% on the Wisconsin dataset.
arXiv Detail & Related papers (2024-05-28T20:54:47Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object in an image that is referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, our proposed TV-Net achieves better performance in learning fine-grained matching behaviors between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- SCG-Net: Self-Constructing Graph Neural Networks for Semantic Segmentation [23.623276007011373]
We propose a module that learns a long-range dependency graph directly from the image and uses it to propagate contextual information efficiently.
The module is optimised via a novel adaptive diagonal enhancement method and a variational lower bound.
When incorporated into a neural network (SCG-Net), semantic segmentation is performed in an end-to-end manner and competitive performance is achieved.
arXiv Detail & Related papers (2020-09-03T12:13:09Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine human-object interaction (HOI) understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)