NODIS: Neural Ordinary Differential Scene Understanding
- URL: http://arxiv.org/abs/2001.04735v3
- Date: Sat, 18 Jul 2020 20:41:19 GMT
- Title: NODIS: Neural Ordinary Differential Scene Understanding
- Authors: Cong Yuren, Hanno Ackermann, Wentong Liao, Michael Ying Yang, and Bodo
Rosenhahn
- Abstract summary: It requires not only detecting all objects in an image but also identifying all the relations between them.
The proposed architecture performs scene graph inference by solving a neural variant of an ODE by end-to-end learning.
It achieves state-of-the-art results on all three benchmark tasks: scene graph generation (SGGen), classification (SGCls) and visual relationship detection (PredCls) on Visual Genome benchmark.
- Score: 35.37702159888773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic image understanding is a challenging topic in computer
vision. It requires not only detecting all objects in an image but also
identifying all the relations between them. Detected objects, their labels and
the discovered relations can be used to construct a scene graph which provides
an abstract semantic interpretation of an image. In previous works, relations
were identified by solving an assignment problem formulated as a Mixed-Integer
Linear Program. In this work, we interpret that formulation as an Ordinary
Differential Equation (ODE). The proposed architecture performs scene graph
inference by solving a neural variant of an ODE by end-to-end learning. It
achieves state-of-the-art results on all three benchmark tasks: scene graph
generation (SGGen), classification (SGCls) and visual relationship detection
(PredCls) on the Visual Genome benchmark.
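The core idea of solving a neural variant of an ODE can be illustrated with a minimal sketch: a small learned dynamics function integrated with a fixed-step Euler solver. This is an assumption-laden toy, not NODIS's actual architecture; the weight matrix and feature vector below are random stand-ins for trained parameters and pooled object/relation features.

```python
import numpy as np

def f(h, t, W):
    # Learned dynamics: dh/dt = tanh(W h). W stands in for trained weights.
    return np.tanh(W @ h)

def odeint_euler(f, h0, t0, t1, steps, W):
    # Fixed-step Euler solver: h(t1) approximates the ODE solution,
    # playing the role of the refined feature the classifier would consume.
    h, dt = h0.copy(), (t1 - t0) / steps
    for i in range(steps):
        h = h + dt * f(h, t0 + i * dt, W)
    return h

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1   # stand-in for learned parameters
h0 = rng.standard_normal(8)             # stand-in for input features
h1 = odeint_euler(f, h0, 0.0, 1.0, 20, W)
```

In an end-to-end setting, gradients flow through the solver steps (or through the adjoint method), so `W` can be trained like any other layer.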
Related papers
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z)
- Scene Graph Generation with Geometric Context [12.074766935042586]
Scene graph, a visually grounded graphical structure of an image, immensely helps to simplify the image understanding tasks.
We introduce a post-processing algorithm called Geometric Context to understand the visual scenes better geometrically.
We exploit this context by calculating the direction and distance between object pairs.
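The direction-and-distance cue between an object pair can be computed from bounding-box centres; the following sketch is a hypothetical illustration, not the paper's actual algorithm or API.

```python
import math

def center(box):
    # box = (x1, y1, x2, y2)
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def geometric_context(box_a, box_b):
    # Direction (angle of b relative to a, in degrees) and
    # Euclidean distance between the two box centres.
    (xa, ya), (xb, yb) = center(box_a), center(box_b)
    dx, dy = xb - xa, yb - ya
    distance = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx))  # in (-180, 180]
    return direction, distance

d, r = geometric_context((0, 0, 2, 2), (3, 1, 5, 3))
```

Cues like these can then re-rank or filter relation predictions in a post-processing step.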
arXiv Detail & Related papers (2021-11-25T15:42:21Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions [4.726777092009554]
We seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task.
Motivated by the analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN)
Our framework predicts the scene graph in a local-to-global manner by design, leveraging the possible complementarity of the two levels.
arXiv Detail & Related papers (2021-06-16T03:58:21Z)
- RL-CSDia: Representation Learning of Computer Science Diagrams [25.66215925641988]
We construct a novel dataset of graphic diagrams named Computer Science Diagrams (CSDia)
It contains more than 1,200 diagrams and exhaustive annotations of objects and relations.
Considering the visual noises caused by the various expressions in diagrams, we introduce the topology of diagrams to parse topological structure.
arXiv Detail & Related papers (2021-03-10T07:01:07Z)
- Learning Graph Embeddings for Compositional Zero-shot Learning [73.80007492964951]
In compositional zero-shot learning, the goal is to recognize unseen compositions of observed visual primitives (states and objects).
We propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features and latent representations of visual primitives in an end-to-end manner.
By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without relying on an external knowledge base like WordNet.
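A joint compatibility of this kind can be sketched as a dot product between an image feature and a composed (state, object) embedding. Everything below is a toy assumption: the embedding tables are random stand-ins for learned ones, and a real graph embedding would propagate over a compositional graph rather than summing primitives.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16
# Random stand-ins for learned primitive embeddings.
state_emb = {"ripe": rng.standard_normal(dim), "sliced": rng.standard_normal(dim)}
object_emb = {"apple": rng.standard_normal(dim), "bread": rng.standard_normal(dim)}

def pair_embedding(state, obj):
    # Compose primitives into one unit-norm composition embedding.
    v = state_emb[state] + object_emb[obj]
    return v / np.linalg.norm(v)

def compatibility(image_feat, state, obj):
    # Dot-product compatibility between image and composition.
    return float(image_feat @ pair_embedding(state, obj))

# A synthetic image feature near one composition, plus noise.
image_feat = pair_embedding("sliced", "bread") + 0.1 * rng.standard_normal(dim)
scores = {(s, o): compatibility(image_feat, s, o)
          for s in state_emb for o in object_emb}
best = max(scores, key=scores.get)
```

Because unseen (state, object) pairs can still be composed from seen primitives, the same scoring function generalizes to compositions never observed in training.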
arXiv Detail & Related papers (2021-02-03T10:11:03Z)
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855]
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution.
We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
arXiv Detail & Related papers (2020-07-11T12:11:53Z)
- Graph-Structured Referring Expression Reasoning in The Wild [105.95488002374158]
Grounding referring expressions aims to locate in an image an object referred to by a natural language expression.
We propose a scene graph guided modular network (SGMN) to perform reasoning over a semantic graph and a scene graph.
We also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning.
arXiv Detail & Related papers (2020-04-19T11:00:30Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.