TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding
- URL: http://arxiv.org/abs/2411.10509v2
- Date: Sun, 02 Mar 2025 18:17:14 GMT
- Title: TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding
- Authors: Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Truong Do, Dezhen Song, Truong-Son Hy,
- Abstract summary: We propose Temporal Equivariant Scene Graph Neural Network (TESGNN), consisting of two key components.<n>ESGNN extracts information from 3D point clouds to generate scene graph while preserving crucial symmetry properties.<n>We show that leveraging the symmetry-preserving property produces a more stable and accurate global scene representation.
- Score: 8.32401190051443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graphs have proven to be highly effective for various scene understanding tasks due to their compact and explicit representation of relational information. However, current methods often overlook the critical importance of preserving symmetry when generating scene graphs from 3D point clouds, which can lead to reduced accuracy and robustness, particularly when dealing with noisy, multi-view data. Furthermore, a major limitation of prior approaches is the lack of temporal modeling to capture time-dependent relationships among dynamically evolving entities in a scene. To address these challenges, we propose Temporal Equivariant Scene Graph Neural Network (TESGNN), consisting of two key components: (1) an Equivariant Scene Graph Neural Network (ESGNN), which extracts information from 3D point clouds to generate scene graph while preserving crucial symmetry properties, and (2) a Temporal Graph Matching Network, which fuses scene graphs generated by ESGNN across multiple time sequences into a unified global representation using an approximate graph-matching algorithm. Our combined architecture TESGNN outperforms current state-of-the-art methods in scene graph generation, achieving higher accuracy and faster training convergence. Moreover, we show that leveraging the symmetry-preserving property produces a more stable and accurate global scene representation compared to existing approaches. Last but not least, it is computationally efficient and easily implementable using existing frameworks, making it well-suited for real-time applications in robotics and computer vision. This approach paves the way for more robust and scalable solutions to complex multi-view scene understanding challenges. Our source code is publicly available at: https://github.com/HySonLab/TESGraph
Related papers
- Salient Temporal Encoding for Dynamic Scene Graph Generation [0.765514655133894]
We propose a novel spatial-temporal scene graph generation method that selectively builds temporal connections only between temporal-relevant objects pairs.
The resulting sparse and explicit temporal representation allows us to improve upon strong scene graph generation baselines by up to $4.4%$ in Scene Graph Detection.
arXiv Detail & Related papers (2025-03-15T08:01:36Z) - Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z) - ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [2.5165775267615205]
This work is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding.
Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence.
arXiv Detail & Related papers (2024-06-30T06:58:04Z) - TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z) - Multi-Scene Generalized Trajectory Global Graph Solver with Composite
Nodes for Multiple Object Tracking [61.69892497726235]
Composite Node Message Passing Network (CoNo-Link) is a framework for modeling ultra-long frames information for association.
In addition to the previous method of treating objects as nodes, the network innovatively treats object trajectories as nodes for information interaction.
Our model can learn better predictions on longer-time scales by adding composite nodes.
arXiv Detail & Related papers (2023-12-14T14:00:30Z) - Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix [3.529869282529924]
We propose a novel end-to-end learning architecture designed to mend the temporal dependencies, resulting in a well-connected graph.
Our methodology demonstrates superior performance on benchmark datasets, such as SurgVisDom and C2D2.
arXiv Detail & Related papers (2023-10-04T06:42:33Z) - Local-Global Information Interaction Debiasing for Dynamic Scene Graph
Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and avoid the model being unable to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z) - From random-walks to graph-sprints: a low-latency node embedding
framework on continuous-time dynamic graphs [4.372841335228306]
We propose a framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models.
In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges.
We demonstrate that our graph-sprints features, combined with a machine learning, achieve competitive performance.
arXiv Detail & Related papers (2023-07-17T12:25:52Z) - Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z) - Temporal Aggregation and Propagation Graph Neural Networks for Dynamic
Representation [67.26422477327179]
Temporal graphs exhibit dynamic interactions between nodes over continuous time.
We propose a novel method of temporal graph convolution with the whole neighborhood.
Our proposed TAP-GNN outperforms existing temporal graph methods by a large margin in terms of both predictive performance and online inference latency.
arXiv Detail & Related papers (2023-04-15T08:17:18Z) - TodyNet: Temporal Dynamic Graph Neural Network for Multivariate Time
Series Classification [6.76723360505692]
We propose a novel temporal dynamic neural graph network (TodyNet) that can extract hidden-temporal dependencies without undefined graph structure.
The experiments on 26 UEA benchmark datasets illustrate that the proposed TodyNet outperforms existing deep learning-based methods in the MTSC tasks.
arXiv Detail & Related papers (2023-04-11T09:21:28Z) - Self-Supervised Temporal Graph learning with Temporal and Structural Intensity Alignment [53.72873672076391]
Temporal graph learning aims to generate high-quality representations for graph-based tasks with dynamic information.
We propose a self-supervised method called S2T for temporal graph learning, which extracts both temporal and structural information.
S2T achieves at most 10.13% performance improvement compared with the state-of-the-art competitors on several datasets.
arXiv Detail & Related papers (2023-02-15T06:36:04Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Efficient-Dyn: Dynamic Graph Representation Learning via Event-based
Temporal Sparse Attention Network [2.0047096160313456]
Dynamic graph neural networks have received more and more attention from researchers.
We propose a novel dynamic graph neural network, Efficient-Dyn.
It adaptively encodes temporal information into a sequence of patches with an equal amount of temporal-topological structure.
arXiv Detail & Related papers (2022-01-04T23:52:24Z) - SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D
Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.