Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation
- URL: http://arxiv.org/abs/2407.05916v1
- Date: Mon, 8 Jul 2024 13:22:13 GMT
- Title: Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation
- Authors: Tinghuai Wang, Huiling Wang,
- Abstract summary: We introduce an exemplar-based non-parametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions.
Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) in the form of pairwise potentials and infers the per-region semantic labels.
We evaluate our approach on the challenging YouTube-Objects dataset which shows that the proposed contextual relationship model outperforms the state-of-the-art methods.
- Score: 1.4042211166197214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach for modeling semantic contextual relationships in videos. This graph-based model enables the learning and propagation of higher-level spatial-temporal contexts to facilitate the semantic labeling of local regions. We introduce an exemplar-based nonparametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions. Contextual relationships learning and propagation are performed to estimate the pairwise contexts between all pairs of unlabeled local regions. Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) in the form of pairwise potentials and infers the per-region semantic labels. We evaluate our approach on the challenging YouTube-Objects dataset which shows that the proposed contextual relationship model outperforms the state-of-the-art methods.
Related papers
- Context Propagation from Proposals for Semantic Video Object Segmentation [1.223779595809275]
We propose a novel approach to learning semantic contextual relationships in videos for semantic object segmentation.
Our proposals derives the semantic contexts from video object which encode the key evolution of objects and the relationship among objects over semantic-temporal domain.
arXiv Detail & Related papers (2024-07-08T14:44:18Z) - Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching [7.7559623054251]
Image-text matching (ITM) is a fundamental problem in computer vision.
We propose a Hybrid-modal feature the Interaction with multiple Enhancements (termed textitHire) for image-text matching.
In particular, the explicit intra-modal spatial-semantic graph-based reasoning network is designed to improve the contextual representation of visual objects.
arXiv Detail & Related papers (2024-06-05T13:10:55Z) - Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z) - Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that for objects with similar appearance, they share similar representation.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z) - Imposing Relation Structure in Language-Model Embeddings Using
Contrastive Learning [30.00047118880045]
We propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure.
The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task.
arXiv Detail & Related papers (2021-09-02T10:58:27Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - Contextual Modulation for Relation-Level Metaphor Identification [3.2619536457181075]
We introduce a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations.
In a methodology inspired by works in visual reasoning, our approach is based on conditioning the neural network computation on the deep contextualised features.
We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets.
arXiv Detail & Related papers (2020-10-12T12:07:02Z) - How Far are We from Effective Context Modeling? An Exploratory Study on
Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.