Closing the Loop: Graph Networks to Unify Semantic Objects and Visual
Features for Multi-object Scenes
- URL: http://arxiv.org/abs/2209.11894v1
- Date: Sat, 24 Sep 2022 00:42:33 GMT
- Title: Closing the Loop: Graph Networks to Unify Semantic Objects and Visual
Features for Multi-object Scenes
- Authors: Jonathan J.Y. Kim, Martin Urschler, Patricia J. Riddle, Jörg S.
Wicker
- Abstract summary: Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
- Score: 2.236663830879273
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In Simultaneous Localization and Mapping (SLAM), Loop Closure Detection (LCD)
is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many
state-of-the-art SLAM systems. It uses a set of visual features to provide
robust place recognition but fails to perceive the semantics or spatial
relationship between feature points. Previous work has mainly focused on
addressing these issues by combining vBoW with semantic and spatial information
from objects in the scene. However, they are unable to exploit spatial
information of local visual features and lack a structure that unifies semantic
objects and visual features, therefore limiting the symbiosis between the two
components. This paper proposes SymbioLCD2, which creates a unified graph
structure to integrate semantic objects and visual features symbiotically. Our
novel graph-based LCD system utilizes the unified graph structure by applying a
Weisfeiler-Lehman graph kernel with temporal constraints to robustly predict
loop closure candidates. Evaluation of the proposed system shows that having a
unified graph structure incorporating semantic objects and visual features
improves LCD prediction accuracy, illustrating that the proposed graph
structure provides a strong symbiosis between these two complementary
components. It also outperforms other machine learning algorithms, such as
SVM, Decision Tree, Random Forest, Neural Network and GNN-based Graph Matching
Networks. Furthermore, it has shown good performance in detecting loop closure
candidates earlier than state-of-the-art SLAM systems, demonstrating that
extended semantic and spatial awareness from the unified graph structure
significantly impacts LCD performance.
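
The abstract's core idea, comparing unified scene graphs with a Weisfeiler-Lehman (WL) kernel under a temporal constraint, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`wl_features`, `wl_kernel`, `wl_similarity`, `loop_candidates`), the string node labels, the normalised similarity, the `threshold` value, and the reading of "temporal constraints" as a minimum frame gap (`min_gap`) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def wl_features(adj, labels, iterations=2):
    """Count WL subtree labels for one graph.

    adj:    dict mapping node -> list of neighbouring nodes (undirected).
    labels: dict mapping node -> initial string label, e.g. an object class
            for semantic-object nodes or "feature" for visual-feature nodes.
    """
    current = dict(labels)
    counts = Counter((0, lab) for lab in current.values())
    for it in range(1, iterations + 1):
        refined = {}
        for node, neighbours in adj.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neigh = ",".join(sorted(current[n] for n in neighbours))
            refined[node] = f"{current[node]}({neigh})"
        current = refined
        counts.update((it, lab) for lab in current.values())
    return counts


def wl_kernel(f1, f2):
    """WL subtree kernel: dot product of the two label-count vectors."""
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())


def wl_similarity(f1, f2):
    """Kernel value normalised to [0, 1] (an illustrative choice)."""
    denom = sqrt(wl_kernel(f1, f1) * wl_kernel(f2, f2))
    return wl_kernel(f1, f2) / denom if denom else 0.0


def loop_candidates(frame_features, query, min_gap=2, threshold=0.8):
    """Earlier frames similar to `query`, ignoring frames closer than `min_gap`.

    The minimum-gap rule is an assumed stand-in for a temporal constraint.
    """
    return [
        i
        for i in range(query)
        if query - i >= min_gap
        and wl_similarity(frame_features[i], frame_features[query]) >= threshold
    ]


# Toy graphs: two object nodes, each linked to one visual-feature node.
adj_a = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
lab_a = {0: "chair", 1: "table", 2: "feature", 3: "feature"}
adj_b = {10: [11, 12], 11: [10, 13], 12: [10], 13: [11]}  # same scene, new node ids
lab_b = {10: "chair", 11: "table", 12: "feature", 13: "feature"}
lab_c = {0: "sofa", 1: "lamp", 2: "feature", 3: "feature"}  # different objects

feat_a = wl_features(adj_a, lab_a)
feat_b = wl_features(adj_b, lab_b)
feat_c = wl_features(adj_a, lab_c)

frames = [feat_a, feat_c, feat_c, feat_b]
candidates = loop_candidates(frames, query=3, min_gap=2, threshold=0.8)
```

Because the refined labels encode each node's neighbourhood, two frames score highly only when both the object classes and their connections to surrounding feature nodes agree, which is the kind of joint semantic and spatial awareness the abstract attributes to the unified graph.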
Related papers
- DynamicGlue: Epipolar and Time-Informed Data Association in Dynamic Environments using Graph Neural Networks [13.42760841894735]
We propose a graph neural network-based sparse feature matching network to perform robust matching under challenging conditions.
We employ a similar scheme of attentional aggregation over graph edges to enhance keypoint representations as state-of-the-art feature-matching networks.
A series of experiments show the superior performance of our network as it excludes keypoints on moving objects compared to state-of-the-art feature matching networks.
arXiv Detail & Related papers (2024-03-17T23:23:40Z) - Dynamic Graph Representation with Knowledge-aware Attention for
Histopathology Whole Slide Image Analysis [11.353826466710398]
We propose a novel dynamic graph representation algorithm that conceptualizes WSIs as a form of the knowledge graph structure.
Specifically, we dynamically construct neighbors and directed edge embeddings based on the head and tail relationships between instances.
Our end-to-end graph representation learning approach has outperformed the state-of-the-art WSI analysis methods on three TCGA benchmark datasets and in-house test sets.
arXiv Detail & Related papers (2024-03-12T14:58:51Z) - Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal
Sentence Localization in Videos [67.12603318660689]
We propose a novel Hierarchical Visual- and Semantic-Aware Reasoning Network (HVSARN).
HVSARN enables both visual- and semantic-aware query reasoning from object-level to frame-level.
Experiments on three datasets demonstrate that our HVSARN achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-02T08:00:22Z) - Template based Graph Neural Network with Optimal Transport Distances [11.56532171513328]
Current Graph Neural Networks (GNN) architectures rely on two important components: node features embedding through message passing, and aggregation with a specialized form of pooling.
We propose in this work a novel point of view, which places distances to some learnable graph templates at the core of the graph representation.
This distance embedding is constructed thanks to an optimal transport distance: the Fused Gromov-Wasserstein (FGW) distance.
arXiv Detail & Related papers (2022-05-31T12:24:01Z) - Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z) - SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words [2.924868086534434]
Loop closure detection is an essential tool of SLAM to minimize drift in its localization.
Many state-of-the-art loop closure detection algorithms use visual Bag-of-Words (vBoW).
We propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction.
arXiv Detail & Related papers (2021-10-21T21:34:57Z) - Incremental Abstraction in Distributed Probabilistic SLAM Graphs [23.441820909790497]
Scene graphs represent the key components of a scene in a compact and semantically rich way.
We present a distributed, graph-based SLAM framework for incrementally building scene graphs.
arXiv Detail & Related papers (2021-09-13T18:16:36Z) - Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z) - Spatial-Temporal Correlation and Topology Learning for Person
Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z) - Learning Spatial Context with Graph Neural Network for Multi-Person Pose
Grouping [71.59494156155309]
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: keypoint detection and grouping.
In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN).
The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association.
arXiv Detail & Related papers (2021-04-06T09:21:14Z) - Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs by including: recurrent feedback connections to combine low and high-level image information; graph pooling and vectorization operations that convert spatially-uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.