Affinity-aware Compression and Expansion Network for Human Parsing
- URL: http://arxiv.org/abs/2008.10191v1
- Date: Mon, 24 Aug 2020 05:16:08 GMT
- Title: Affinity-aware Compression and Expansion Network for Human Parsing
- Authors: Xinyan Zhang, Yunfeng Wang, Pengfei Xiong
- Abstract summary: ACENet achieves new state-of-the-art performance on the challenging LIP and Pascal-Person-Part datasets.
58.1% mean IoU is achieved on the LIP benchmark.
- Score: 6.993481561132318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a fine-grained segmentation task, human parsing is still faced with two
challenges: inter-part indistinction and intra-part inconsistency, due to the
ambiguous definitions and confusing relationships between similar human parts.
To tackle these two problems, this paper proposes a novel
\textit{Affinity-aware Compression and Expansion} Network (ACENet), which
mainly consists of two modules: Local Compression Module (LCM) and Global
Expansion Module (GEM). Specifically, LCM compresses parts-correlation
information through structural skeleton points, obtained from an extra skeleton
branch. It can decrease the inter-part interference, and strengthen structural
relationships between ambiguous parts. Furthermore, GEM expands semantic
information of each part into a complete piece by incorporating the spatial
affinity with boundary guidance, which can effectively enhance the semantic
consistency of intra-part as well. ACENet achieves new state-of-the-art
performance on the challenging LIP and Pascal-Person-Part datasets. In
particular, 58.1% mean IoU is achieved on the LIP benchmark.
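The 58.1% figure is mean IoU, the standard evaluation metric for human parsing. As a minimal illustration of how that metric is computed (not the authors' code), per-class IoU can be derived from a confusion matrix over flat label arrays:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from flat integer label arrays."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = ground truth, cols = prediction.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # Ignore classes absent from both prediction and ground truth.
    valid = union > 0
    return (intersection[valid] / union[valid]).mean()
```

For example, predictions `[0, 1, 1, 2]` against ground truth `[0, 1, 2, 2]` give per-class IoUs of 1.0, 0.5, and 0.5, i.e. a mean IoU of about 0.667.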
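The abstract does not specify how GEM's spatial affinity is implemented. As a generic sketch of the underlying idea only (affinity-guided smoothing of per-pixel class scores; the function, its parameters, and the 4-neighbour Gaussian affinity are our own assumptions, not ACENet's GEM):

```python
import numpy as np

def affinity_refine(scores, features, sigma=1.0):
    """Smooth per-pixel class scores using 4-neighbour feature affinity.

    scores:   (H, W, C) per-pixel class scores
    features: (H, W, D) per-pixel embeddings used to weigh neighbours
    A generic illustration of affinity-guided refinement, not ACENet's GEM.
    """
    H, W, _ = scores.shape
    out = np.array(scores, dtype=float)
    weight = np.ones((H, W))
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ys = slice(max(dy, 0), H + min(dy, 0))
        xs = slice(max(dx, 0), W + min(dx, 0))
        nys = slice(max(-dy, 0), H + min(-dy, 0))
        nxs = slice(max(-dx, 0), W + min(-dx, 0))
        # Affinity = Gaussian similarity between neighbouring embeddings,
        # so scores propagate mainly between visually similar pixels.
        diff = features[ys, xs] - features[nys, nxs]
        aff = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))
        out[ys, xs] += aff[..., None] * scores[nys, nxs]
        weight[ys, xs] += aff
    return out / weight[..., None]
```

With identical embeddings everywhere the refinement reduces to plain neighbourhood averaging; with a very small `sigma`, dissimilar neighbours contribute nothing and the scores pass through unchanged.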
Related papers
- Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation [24.071471822239854]
Open-Vocabulary Part Segmentation (OVPS) is an emerging field for recognizing fine-grained parts in unseen categories.
We identify two primary challenges in OVPS: the difficulty in aligning part-level image-text correspondence, and the lack of structural understanding in segmenting object parts.
We propose PartCATSeg, a novel framework that integrates object-aware part-level cost aggregation, compositional loss, and structural guidance from DINO.
arXiv Detail & Related papers (2025-01-16T17:40:19Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks [1.121518046252855]
U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure.
We introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity.
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net.
arXiv Detail & Related papers (2023-07-06T12:39:06Z)
- SUNet: Scale-aware Unified Network for Panoptic Segmentation [25.626882426111198]
We propose two lightweight modules to mitigate the problem of segmenting objects of various scales.
We present an end-to-end Scale-aware Unified Network (SUNet) which is more adaptable to multi-scale objects.
arXiv Detail & Related papers (2022-09-07T01:40:41Z)
- CI-Net: Contextual Information for Joint Semantic Segmentation and Depth Estimation [2.8785764686013837]
We propose a network injected with contextual information (CI-Net) to solve the problem.
With supervision from semantic labels, the network is embedded with contextual information so that it could understand the scene better.
We evaluate the proposed CI-Net on the NYU-Depth-v2 and SUN-RGBD datasets.
arXiv Detail & Related papers (2021-07-29T07:58:25Z)
- Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach allows us to set a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
- Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation [44.792859259093085]
We aim to enable the low-level feature to aggregate the complementary context from adjacent high-level feature maps by a cross-scale pixel-to-region operation.
We employ an efficient feature pyramid network to obtain multi-scale features.
Experiment results show that the RSP head performs competitively on both semantic segmentation and panoptic segmentation with high efficiency.
arXiv Detail & Related papers (2021-06-03T10:49:48Z)
- CTNet: Context-based Tandem Network for Semantic Segmentation [77.4337867789772]
This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual information.
To further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated.
arXiv Detail & Related papers (2021-04-20T07:33:11Z)
- Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims to segment the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.