DDS: Decoupled Dynamic Scene-Graph Generation Network
- URL: http://arxiv.org/abs/2301.07666v2
- Date: Mon, 20 Jan 2025 18:51:08 GMT
- Title: DDS: Decoupled Dynamic Scene-Graph Generation Network
- Authors: A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath
- Abstract summary: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene. We propose a decoupled dynamic scene-graph generation network that consists of two independent branches that can disentangle extracted features. The model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.
- Score: 9.586462685772705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. Existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue, we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.
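To make the decoupling concrete, the sketch below shows a minimal two-branch head in which object features and relationship features are produced by independent decoders and combined only at the output. It is a hypothetical PyTorch-style illustration under assumed module names and dimensions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DecoupledSceneGraphSketch(nn.Module):
    """Hypothetical sketch of a decoupled two-branch scene-graph head.

    Object features and relationship features come from independent branches,
    so a relation learned with one object pair can, in principle, be recombined
    with unseen objects at inference time.
    """

    def __init__(self, feat_dim=256, num_obj_classes=80, num_rel_classes=50, num_queries=100):
        super().__init__()
        # Independent query sets, one per branch (assumed design choice).
        self.obj_queries = nn.Embedding(num_queries, feat_dim)
        self.rel_queries = nn.Embedding(num_queries, feat_dim)
        # Two independent decoders so gradients do not couple the branches.
        make_layer = lambda: nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.obj_branch = nn.TransformerDecoder(make_layer(), num_layers=3)
        self.rel_branch = nn.TransformerDecoder(make_layer(), num_layers=3)
        # Separate classification heads for objects and relations.
        self.obj_head = nn.Linear(feat_dim, num_obj_classes)
        self.rel_head = nn.Linear(feat_dim, num_rel_classes)

    def forward(self, backbone_feats):
        # backbone_feats: (batch, num_tokens, feat_dim) from a shared image backbone.
        b = backbone_feats.size(0)
        obj_q = self.obj_queries.weight.unsqueeze(0).expand(b, -1, -1)
        rel_q = self.rel_queries.weight.unsqueeze(0).expand(b, -1, -1)
        obj_feats = self.obj_branch(obj_q, backbone_feats)   # object stream
        rel_feats = self.rel_branch(rel_q, backbone_feats)   # relation stream
        # Triplet scores are combined only at the output, keeping features disentangled.
        return self.obj_head(obj_feats), self.rel_head(rel_feats)


feats = torch.randn(2, 196, 256)           # e.g. 14x14 backbone tokens
obj_logits, rel_logits = DecoupledSceneGraphSketch()(feats)
print(obj_logits.shape, rel_logits.shape)  # (2, 100, 80) (2, 100, 50)
```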
Related papers
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data [67.99373622902827]
DIPO is a framework for controllable generation of articulated 3D objects from a pair of images. We propose a dual-image diffusion model that captures relationships between the image pair to generate part layouts and joint parameters. We also propose PM-X, a large-scale dataset of complex articulated 3D objects, accompanied by rendered images, URDF annotations, and textual descriptions.
arXiv Detail & Related papers (2025-05-26T18:55:14Z)
- Generalized Visual Relation Detection with Diffusion Models [94.62313788626128]
Visual relation detection (VRD) aims to identify relationships (or interactions) between object pairs in an image.
We propose to model visual relations as continuous embeddings, and design diffusion models to achieve generalized VRD in a conditional generative manner.
Our Diff-VRD is able to generate visual relations beyond the pre-defined category labels of datasets.
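As a loose, hypothetical illustration of treating relations as continuous embeddings refined by a conditional diffusion process (the general idea only, with a simplified noise schedule and assumed dimensions, not Diff-VRD's actual architecture):

```python
import torch
import torch.nn as nn

class RelationDenoiserSketch(nn.Module):
    """Hypothetical denoiser that predicts noise on a relation embedding,
    conditioned on subject/object features. Not the Diff-VRD implementation."""

    def __init__(self, rel_dim=64, cond_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rel_dim + cond_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, rel_dim),
        )

    def forward(self, noisy_rel, cond, t):
        # t is the diffusion timestep, appended as an extra scalar feature.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([noisy_rel, cond, t_feat], dim=-1))


@torch.no_grad()
def sample_relation(denoiser, cond, rel_dim=64, steps=50):
    """Toy reverse loop: start from noise and iteratively denoise a continuous
    relation embedding conditioned on an object-pair feature."""
    rel = torch.randn(cond.size(0), rel_dim)
    for step in reversed(range(steps)):
        t = torch.full((cond.size(0),), step)
        eps = denoiser(rel, cond, t)
        rel = rel - 0.1 * eps             # simplified update; real schedules differ
        if step > 0:
            rel = rel + 0.01 * torch.randn_like(rel)
    return rel                             # decode to a relation phrase afterwards


pair_feat = torch.randn(4, 256)            # assumed subject-object pair features
rel_embed = sample_relation(RelationDenoiserSketch(), pair_feat)
print(rel_embed.shape)                     # (4, 64)
```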
arXiv Detail & Related papers (2025-04-16T14:03:24Z)
- A Modern Take on Visual Relationship Reasoning for Grasp Planning [10.543168383800532]
We present a modern take on visual relational reasoning for grasp planning.
We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories.
We also propose D3G, a new end-to-end transformer-based dependency graph generation model.
arXiv Detail & Related papers (2024-09-03T16:30:48Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, distinguishing the positive query from other highly similar queries that are not the best match poses a challenge for the network.
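A module of this flavor can be loosely sketched as attention over object queries gathered from consecutive frames; this is a generic illustration with assumed shapes, not STEMD's actual spatial-temporal graph attention network.

```python
import torch
import torch.nn as nn

class SpatialTemporalGraphAttentionSketch(nn.Module):
    """Generic sketch: object queries from several frames attend to each other,
    mixing spatial (same-frame) and temporal (cross-frame) interactions."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_queries):
        # frame_queries: (batch, num_frames, num_queries, dim)
        b, t, n, d = frame_queries.shape
        nodes = frame_queries.reshape(b, t * n, d)       # all objects become graph nodes
        attended, _ = self.attn(nodes, nodes, nodes)     # dense node-to-node attention
        out = self.norm(nodes + attended)                # residual + norm
        return out.reshape(b, t, n, d)


x = torch.randn(2, 4, 100, 256)   # 4 frames, 100 object queries each
print(SpatialTemporalGraphAttentionSketch()(x).shape)   # (2, 4, 100, 256)
```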
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
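The core idea of first deciding whether any relationship exists between a pair of detected objects can be sketched as a simple pairwise binary head; the dimensions and fusion below are assumptions for illustration, not R2-Net's actual design.

```python
import torch
import torch.nn as nn

class RelatednessHeadSketch(nn.Module):
    """Generic pairwise head that scores whether object i and object j are related.
    Illustrative only; feature dimensions and fusion are assumptions."""

    def __init__(self, obj_dim=1024, hidden=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obj_feats):
        # obj_feats: (num_objects, obj_dim) from a detector's box features.
        n = obj_feats.size(0)
        subj = obj_feats.unsqueeze(1).expand(n, n, -1)   # (n, n, d)
        obj = obj_feats.unsqueeze(0).expand(n, n, -1)    # (n, n, d)
        pair = torch.cat([subj, obj], dim=-1)            # all ordered pairs
        return self.scorer(pair).squeeze(-1)             # (n, n) relatedness logits


logits = RelatednessHeadSketch()(torch.randn(6, 1024))
print(logits.shape)   # (6, 6); threshold to keep only likely-related pairs
```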
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Structured Sparse R-CNN for Direct Scene Graph Generation [16.646937866282922]
This paper presents a simple, sparse, and unified framework for relation detection, termed as Structured Sparse R-CNN.
The key to our method is a set of learnable triplet queries and structured triplet detectors which could be optimized jointly from the training set in an end-to-end manner.
We perform experiments on two benchmarks: Visual Genome and Open Images, and the results demonstrate that our method achieves the state-of-the-art performance.
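The notion of learnable triplet queries, each decoding directly into one subject-relation-object hypothesis, can be sketched in a DETR-style form; the heads and shapes below are assumptions for illustration, not the paper's Structured Sparse R-CNN code.

```python
import torch
import torch.nn as nn

class TripletQuerySketch(nn.Module):
    """Hypothetical set of learnable triplet queries decoded against image
    features; each query emits one <subject, relation, object> hypothesis."""

    def __init__(self, num_queries=300, dim=256, num_obj=150, num_rel=50):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)     # learnable triplet queries
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # One structured detector per slot: subject class, object class, relation class.
        self.subj_cls = nn.Linear(dim, num_obj)
        self.obj_cls = nn.Linear(dim, num_obj)
        self.rel_cls = nn.Linear(dim, num_rel)
        self.subj_box = nn.Linear(dim, 4)                  # boxes as (cx, cy, w, h)
        self.obj_box = nn.Linear(dim, 4)

    def forward(self, image_tokens):
        # image_tokens: (batch, num_tokens, dim) from a backbone/encoder.
        b = image_tokens.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        h = self.decoder(q, image_tokens)
        return {
            "subj_logits": self.subj_cls(h), "obj_logits": self.obj_cls(h),
            "rel_logits": self.rel_cls(h),
            "subj_boxes": self.subj_box(h).sigmoid(), "obj_boxes": self.obj_box(h).sigmoid(),
        }


out = TripletQuerySketch()(torch.randn(1, 400, 256))
print(out["rel_logits"].shape)   # (1, 300, 50): one relation prediction per query
```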
arXiv Detail & Related papers (2021-06-21T02:24:20Z)
- Relationship-based Neural Baby Talk [10.342180619706724]
We study three main relationships: spatial relationships to explore geometric interactions, semantic relationships to extract semantic interactions, and implicit relationships to capture hidden information.
Our proposed R-NBT model outperforms state-of-the-art models trained on the COCO dataset in three image caption generation tasks.
arXiv Detail & Related papers (2021-03-08T15:51:24Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)