Mining Relations among Cross-Frame Affinities for Video Semantic
Segmentation
- URL: http://arxiv.org/abs/2207.10436v1
- Date: Thu, 21 Jul 2022 12:12:36 GMT
- Title: Mining Relations among Cross-Frame Affinities for Video Semantic
Segmentation
- Authors: Guolei Sun, Yun Liu, Hao Tang, Ajad Chhatkuli, Le Zhang, Luc Van Gool
- Abstract summary: We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations.
Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods.
- Score: 87.4854250338374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The essence of video semantic segmentation (VSS) is how to leverage temporal
information for prediction. Previous efforts are mainly devoted to developing
new techniques to calculate the cross-frame affinities such as optical flow and
attention. Instead, this paper contributes from a different angle by mining
relations among cross-frame affinities, upon which better temporal information
aggregation could be achieved. We explore relations among affinities in two
aspects: single-scale intrinsic correlations and multi-scale relations.
Inspired by traditional feature processing, we propose Single-scale Affinity
Refinement (SAR) and Multi-scale Affinity Aggregation (MAA). To make it
feasible to execute MAA, we propose a Selective Token Masking (STM) strategy to
select a subset of consistent reference tokens for different scales when
calculating affinities, which also improves the efficiency of our method.
Finally, the cross-frame affinities strengthened by SAR and MAA are adopted for
adaptively aggregating temporal information. Our experiments demonstrate that
the proposed method performs favorably against state-of-the-art VSS methods.
The code is publicly available at https://github.com/GuoleiSun/VSS-MRCFA
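To make the abstract's notion of a cross-frame affinity concrete, here is a minimal NumPy sketch of a generic attention-style affinity between the tokens of a target frame and a reference frame, followed by a simple reference-token selection and affinity-weighted aggregation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scaled dot-product form, and the "keep the reference tokens with the highest peak affinity" criterion are all hypothetical, and the sketch does not reproduce SAR, MAA, or the exact STM rule from the paper (see the linked repository for those).

```python
import numpy as np

def cross_frame_affinity(target_tokens, reference_tokens):
    """Scaled dot-product affinity between two frames' tokens.

    target_tokens: (Nt, C), reference_tokens: (Nr, C) -> affinity (Nt, Nr).
    Assumption: attention-style affinity; the paper may compute it differently.
    """
    scale = np.sqrt(target_tokens.shape[-1])
    return target_tokens @ reference_tokens.T / scale

def select_reference_tokens(affinity, keep):
    """Return sorted indices of `keep` reference tokens (hypothetical criterion).

    Each reference token is ranked by its highest affinity to any target token.
    """
    importance = affinity.max(axis=0)               # (Nr,)
    return np.sort(np.argsort(importance)[-keep:])  # indices of the top `keep`

def aggregate(affinity, reference_tokens):
    """Softmax-normalise affinities over reference tokens, then mix features."""
    shifted = affinity - affinity.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ reference_tokens, weights

rng = np.random.default_rng(0)
target = rng.standard_normal((6, 8))      # 6 target-frame tokens, 8 channels
reference = rng.standard_normal((10, 8))  # 10 reference-frame tokens

affinity = cross_frame_affinity(target, reference)  # shape (6, 10)
kept = select_reference_tokens(affinity, keep=4)     # 4 reference indices
fused, weights = aggregate(affinity[:, kept], reference[kept])
print(fused.shape, weights.shape)
```

Restricting the affinity to a subset of reference tokens shrinks the affinity matrix from Nt x Nr to Nt x keep, which is the kind of saving the abstract attributes to token selection.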
Related papers
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a working strategy of trilogy: distributing language feature, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Morphologically-Aware Consensus Computation via Heuristics-based IterATive Optimization (MACCHIatO) [1.8749305679160362]
We propose a new method to construct a binary or a probabilistic consensus segmentation based on the Fréchet means of carefully chosen distances.
We show that it leads to binary consensus masks of intermediate size between Majority Voting and STAPLE and to different posterior probabilities than Mask Averaging and STAPLE methods.
arXiv Detail & Related papers (2023-09-14T23:28:58Z)
- Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract [47.85056124410376]
Few-shot semantic segmentation (FSS) aims to form class-agnostic models segmenting unseen classes with only a handful of annotations.
We design Hierarchically Decoupled Matching Network (HDMNet) mining pixel-level support correlation based on the transformer architecture.
We propose a matching module to reduce train-set overfitting and introduce correlation distillation leveraging semantic correspondence from coarse resolution to boost fine-grained segmentation.
arXiv Detail & Related papers (2023-06-27T08:10:20Z)
- FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images.
We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity.
In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
- Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement [39.06589186472675]
We propose a spatial-temporal fusion (STF) module to model dense pairwise relationships among multi-frame features.
Besides, we propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions among semantic boundaries.
arXiv Detail & Related papers (2023-01-10T07:57:05Z)
- HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with various architectures to improve them, and it achieves a state-of-the-art computation-accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases [131.50372468579067]
We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds.
We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames.
Our approach outperforms state-of-the-art ones on several challenging datasets.
arXiv Detail & Related papers (2021-11-12T17:48:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.