CFNet: Learning Correlation Functions for One-Stage Panoptic
Segmentation
- URL: http://arxiv.org/abs/2201.04796v1
- Date: Thu, 13 Jan 2022 05:31:14 GMT
- Title: CFNet: Learning Correlation Functions for One-Stage Panoptic
Segmentation
- Authors: Yifeng Chen, Wenqing Chu, Fangfang Wang, Ying Tai, Ran Yi, Zhenye Gan,
Liang Yao, Chengjie Wang, Xi Li
- Abstract summary: We propose to first predict semantic-level and instance-level correlations among different locations that are utilized to enhance the backbone features.
We then feed the improved discriminative features into the corresponding segmentation heads, respectively.
We achieve state-of-the-art performance on MS COCO with 45.1% PQ and on ADE20k with 32.6% PQ.
- Score: 46.252118473248316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been growing attention on one-stage panoptic segmentation
methods, which aim to segment instances and stuff jointly and efficiently within a fully
convolutional pipeline. However, most existing works
feed the backbone features directly to the various segmentation heads, ignoring that the
demands of semantic and instance segmentation differ: the former needs
semantic-level discriminative features, while the latter requires features to
be distinguishable across instances. To alleviate this, we propose to first
predict semantic-level and instance-level correlations among different
locations that are utilized to enhance the backbone features, and then feed the
improved discriminative features into the corresponding segmentation heads,
respectively. Specifically, we organize the correlations between a given
location and all locations as a continuous sequence and predict it as a whole.
Considering that such a sequence can be extremely complicated, we adopt
the Discrete Fourier Transform (DFT), a tool that can approximate an arbitrary
sequence parameterized by amplitudes and phases. For different tasks, we
generate these parameters from the backbone features in a fully convolutional
way which is optimized implicitly by corresponding tasks. As a result, these
accurate and consistent correlations contribute to producing plausible
discriminative features which meet the requirements of the complicated panoptic
segmentation task. To verify the effectiveness of our methods, we conduct
experiments on several challenging panoptic segmentation datasets and achieve
state-of-the-art performance on MS COCO with 45.1% PQ and on ADE20k with
32.6% PQ.
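The amplitude-and-phase parameterization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dft_amp_phase` and `sequence_from_amp_phase` are hypothetical, the sequence is a toy list of numbers rather than a predicted correlation map, and the amplitudes and phases are computed from a known sequence here, whereas the paper generates them from backbone features. The sketch only shows that a real sequence can be rebuilt from per-frequency amplitudes and phases.

```python
import cmath
import math


def dft_amp_phase(seq):
    """Return (amplitudes, phases) for frequencies 0..n//2 of a real sequence."""
    n = len(seq)
    amps, phases = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient: X_k = sum_t x[t] * exp(-2*pi*i*k*t/n)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(seq))
        # Every bin except DC (and Nyquist when n is even) also stands in for
        # its conjugate twin at frequency n - k, so its weight is doubled.
        paired = k > 0 and 2 * k != n
        amps.append(abs(coeff) * (2 if paired else 1) / n)
        phases.append(cmath.phase(coeff))
    return amps, phases


def sequence_from_amp_phase(amps, phases, n):
    """Rebuild a length-n sequence as a sum of cosines, one per frequency."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n + p)
            for k, (a, p) in enumerate(zip(amps, phases)))
        for t in range(n)
    ]


# Toy "correlation sequence" between one location and eight locations.
seq = [0.9, 0.1, 0.4, 0.8, 0.2, 0.0, 0.7, 0.3]
amps, phases = dft_amp_phase(seq)
rebuilt = sequence_from_amp_phase(amps, phases, len(seq))

# Keeping only the lowest frequencies gives a smoother approximation.
coarse = sequence_from_amp_phase(amps[:3], phases[:3], len(seq))
```

With all `n // 2 + 1` components the reconstruction matches the input up to floating-point error; truncating the lists trades accuracy for fewer parameters.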
Related papers
- CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition [10.045163723630159]
CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies.
Our approach seamlessly adapts to single-entity backbones and boosts their performance in multi-entity scenarios.
arXiv Detail & Related papers (2024-10-09T17:55:43Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Category Feature Transformer for Semantic Segmentation [34.812688388968525]
CFT learns unified feature embeddings for individual semantic categories from high-level features during each aggregation process.
We conduct extensive experiments on popular semantic segmentation benchmarks.
The proposed CFT obtains a compelling 55.1% mIoU with greatly reduced model parameters and computations on the challenging ADE20K dataset.
arXiv Detail & Related papers (2023-08-10T13:44:54Z)
- Contrastive Conditional Neural Processes [45.70735205041254]
Conditional Neural Processes (CNPs) bridge neural networks with probabilistic inference to approximate functions of stochastic processes under meta-learning settings.
Two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning (TCL) and cross-instantiation function contrastive learning (FCL).
We empirically show that TCL captures high-level abstraction of observations, whereas FCL helps identify underlying functions, which in turn provides more efficient representations.
arXiv Detail & Related papers (2022-03-08T10:08:45Z)
- GaitStrip: Gait Recognition via Effective Strip-based Feature
Representations and Multi-Level Framework [34.397404430838286]
We present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels.
To be specific, our high-level branch explores the context of gait sequences and our low-level one focuses on detailed posture changes.
Our GaitStrip achieves state-of-the-art performance in both normal walking and complex conditions.
arXiv Detail & Related papers (2022-03-08T09:49:48Z)
- Assessing Data Efficiency in Task-Oriented Semantic Parsing [54.87705549021248]
We introduce a four-stage protocol which gives an approximate measure of how much in-domain "target" data a parser requires to achieve a certain quality bar.
We apply our protocol in two real-world case studies illustrating its flexibility and applicability to practitioners in task-oriented semantic parsing.
arXiv Detail & Related papers (2021-07-10T02:43:16Z)
- Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Semantic features respond accurately to the key regions of the foreground object, while apparent features provide richer detail.
By fusing semantic and apparent features, as well as cascading the modules of intra-image adaptive feature weight learning and inter-image common feature learning, the research achieves performance that significantly exceeds baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
- Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in the embedding space.
This paper presents an adaptive tuning framework, in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic
Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.