Context Propagation from Proposals for Semantic Video Object Segmentation
- URL: http://arxiv.org/abs/2407.06247v1
- Date: Mon, 8 Jul 2024 14:44:18 GMT
- Title: Context Propagation from Proposals for Semantic Video Object Segmentation
- Authors: Tinghuai Wang
- Abstract summary: We propose a novel approach to learning semantic contextual relationships in videos for semantic object segmentation.
Our algorithm derives the semantic contexts from video object proposals, which encode the key evolution of objects and the relationships among objects over the spatio-temporal domain.
- Score: 1.223779595809275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a novel approach to learning semantic contextual relationships in videos for semantic object segmentation. Our algorithm derives the semantic contexts from video object proposals, which encode the key evolution of objects and the relationships among objects over the spatio-temporal domain. These semantic contexts are propagated across the video to estimate the pairwise contexts between all pairs of local superpixels, which are integrated into a conditional random field in the form of pairwise potentials to infer the per-superpixel semantic labels. The experiments demonstrate that our context learning and propagation model effectively improves the robustness of resolving visual ambiguities in semantic video object segmentation compared with state-of-the-art methods.
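As a rough sketch of the inference step described above, the following shows how pairwise context potentials between superpixels could be combined with unary potentials in a CRF and labels inferred by iterated conditional modes (ICM). The data layout, function names, and the choice of ICM are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def icm_infer(unary, pairs, pairwise, n_iters=10):
    """Infer per-superpixel labels by iterated conditional modes (ICM).

    unary    : (N, L) array, cost of assigning label l to superpixel i
    pairs    : list of (i, j) superpixel pairs with a propagated context
    pairwise : dict mapping (i, j) -> (L, L) pairwise potential matrix
    Returns an (N,) array of label indices; ICM is a simple
    coordinate-descent approximation to the CRF energy minimum.
    """
    n, n_labels = unary.shape
    labels = unary.argmin(axis=1)          # initialise from unaries alone
    # build an adjacency list; store the transposed potential for j's view
    neighbours = {i: [] for i in range(n)}
    for (i, j) in pairs:
        neighbours[i].append((j, pairwise[(i, j)]))
        neighbours[j].append((i, pairwise[(i, j)].T))
    for _ in range(n_iters):
        for i in range(n):
            cost = unary[i].copy()
            for j, pot in neighbours[i]:
                cost += pot[:, labels[j]]  # pairwise cost given neighbour
            labels[i] = cost.argmin()
    return labels
```

With a smoothing (Potts-style) pairwise potential, an ambiguous superpixel is pulled toward the label of its confidently labelled neighbours, which is the effect the propagated contexts are meant to have.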
Related papers
- Context-Aware Temporal Embedding of Objects in Video Data [0.8287206589886881]
In video analysis, understanding the temporal context is crucial for recognizing object interactions, event patterns, and contextual changes over time.
The proposed model leverages adjacency and semantic similarities between objects from neighboring video frames to construct context-aware temporal object embeddings.
Empirical studies demonstrate that our context-aware temporal embeddings can be used in conjunction with conventional visual embeddings to enhance the effectiveness of downstream applications.
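A minimal sketch of the idea in this entry, assuming a simple weighting scheme: an object's embedding is blended with a context vector aggregated from the previous frame's objects, weighted by both spatial adjacency and semantic (cosine) similarity. All names and the particular weighting are assumptions for illustration.

```python
import numpy as np

def temporal_context_embedding(obj_emb, prev_emb, prev_centroids,
                               centroid, tau=0.5):
    """Blend an object's visual embedding with context from the previous
    frame. prev_emb is (M, D); prev_centroids is (M, 2); obj_emb is (D,).
    tau controls how much temporal context is mixed in."""
    # semantic affinity: cosine similarity to each previous-frame object
    sem = prev_emb @ obj_emb / (
        np.linalg.norm(prev_emb, axis=1) * np.linalg.norm(obj_emb) + 1e-8)
    # spatial affinity: decays with centroid distance
    dist = np.linalg.norm(prev_centroids - centroid, axis=1)
    spa = np.exp(-dist / (dist.mean() + 1e-8))
    w = sem * spa
    w = w / (w.sum() + 1e-8)               # normalise the mixed weights
    context = w @ prev_emb                 # weighted neighbour aggregate
    return (1 - tau) * obj_emb + tau * context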
arXiv Detail & Related papers (2024-08-23T01:44:10Z)
- Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation [1.4042211166197214]
We introduce an exemplar-based non-parametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions.
Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) in the form of pairwise potentials and infers the per-region semantic labels.
We evaluate our approach on the challenging YouTube-Objects dataset which shows that the proposed contextual relationship model outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-08T13:22:13Z)
- Submodular video object proposal selection for semantic object segmentation [1.223779595809275]
We learn a data-driven representation that captures subsets of multiple object instances from consecutive frames.
This selection process is formulated as a facility location problem solved by maximising a submodular function.
Our method retrieves longer-term contextual dependencies, which underpin a robust semantic video object segmentation algorithm.
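The facility-location formulation mentioned above has a standard greedy solution: repeatedly add the proposal that most increases how well every item is "covered" by its most similar selected proposal. The sketch below shows that greedy loop under an assumed pairwise similarity matrix; it is a generic illustration of the technique, not the paper's code.

```python
import numpy as np

def greedy_facility_location(sim, budget):
    """Greedily select `budget` proposals maximising the facility-location
    objective F(S) = sum_i max_{j in S} sim[i, j].

    sim : (N, N) pairwise similarity matrix between proposals.
    Greedy gives a (1 - 1/e) approximation for this monotone
    submodular function.
    """
    n = sim.shape[0]
    selected, best = [], np.zeros(n)       # best[i] = coverage of item i
    for _ in range(budget):
        # marginal gain of each candidate j over the current coverage
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[selected] = -np.inf          # never re-select
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected
```

Because two near-duplicate proposals cover almost the same items, the second one has a tiny marginal gain, so the greedy selection naturally favours a diverse subset of instances.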
arXiv Detail & Related papers (2024-07-08T13:18:49Z)
- Disentangling Content and Motion for Text-Based Neural Video Manipulation [28.922000242744435]
We introduce a new method called DiCoMoGAN for manipulating videos with natural language.
Our evaluations demonstrate that DiCoMoGAN significantly outperforms existing frame-based methods.
arXiv Detail & Related papers (2022-11-05T21:49:41Z)
- Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos [55.52369116870822]
This paper focuses on tackling the problem of temporal language localization in videos.
It aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video.
arXiv Detail & Related papers (2021-10-12T14:59:25Z)
- Contrastive Video-Language Segmentation [41.1635597261304]
We focus on the problem of segmenting a certain object referred by a natural language sentence in video content.
We propose to intertwine the visual and linguistic modalities in an explicit way via a contrastive learning objective.
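As a generic stand-in for the contrastive objective this entry refers to, the following sketches a symmetric InfoNCE loss over a batch of paired video and text embeddings: matched pairs sit on the diagonal of the similarity matrix and are pulled together, all other pairs are pushed apart. The loss form and temperature are assumptions, not the paper's exact formulation.

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss for B matched (video, text) pairs.

    video_emb, text_emb : (B, D) arrays where row i of each is a
    matched pair. Lower loss means better cross-modal alignment."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(v))              # diagonal entries match

    def xent(lg):
        # cross-entropy of each row against its diagonal target
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average over video-to-text and text-to-video directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Well-aligned pairs yield a near-zero loss, while shuffled (mismatched) pairs yield a large one, which is what drives the two modalities toward a shared embedding space.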
arXiv Detail & Related papers (2021-09-29T01:40:58Z)
- Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Affinity Space Adaptation for Semantic Segmentation Across Domains [57.31113934195595]
In this paper, we address the problem of unsupervised domain adaptation (UDA) in semantic segmentation.
Motivated by the fact that source and target domain have invariant semantic structures, we propose to exploit such invariance across domains.
We develop two affinity space adaptation strategies: affinity space cleaning and adversarial affinity space alignment.
arXiv Detail & Related papers (2020-09-26T10:28:11Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either improve an object's inner consistency by modeling the global context, or refine object details along boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
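The body/edge split this entry describes can be illustrated with a simple low-pass filter: the blurred map is the low-frequency "body" and the residual is the high-frequency "edge". The paper learns this decomposition with supervision; the box-blur below is only a hand-crafted stand-in to show the idea.

```python
import numpy as np

def body_edge_decompose(feat, iters=2):
    """Split a 2-D feature map into a smooth 'body' (low-frequency) part
    and a residual 'edge' (high-frequency) part using a 5-point box blur
    as the low-pass filter; feat is reconstructed as body + edge."""
    body = feat.astype(float)
    for _ in range(iters):
        padded = np.pad(body, 1, mode="edge")
        body = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                padded[1:-1, :-2] + padded[1:-1, 2:] +
                padded[1:-1, 1:-1]) / 5.0
    edge = feat - body
    return body, edge
```

On a step-like input, the edge component concentrates along the discontinuity while the body stays smooth, matching the intuition that boundaries live in the high frequencies.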
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.