RL-CSDia: Representation Learning of Computer Science Diagrams
- URL: http://arxiv.org/abs/2103.05900v1
- Date: Wed, 10 Mar 2021 07:01:07 GMT
- Title: RL-CSDia: Representation Learning of Computer Science Diagrams
- Authors: Shaowei Wang, LingLing Zhang, Xuan Luo, Yi Yang, Xin Hu, and Jun Liu
- Abstract summary: We construct a novel dataset of graphic diagrams named Computer Science Diagrams (CSDia).
It contains more than 1,200 diagrams with exhaustive annotations of objects and relations.
To handle the visual noise caused by the varied expressions in diagrams, we introduce the topology of diagrams to parse their topological structure.
- Score: 25.66215925641988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in computer vision mainly focus on natural images that
depict real-world scenes, achieving outstanding performance on diverse tasks
such as visual question answering. A diagram is a special form of visual
expression that appears frequently in education and is of great significance
for learners' understanding of multimodal knowledge. Current research on
diagrams focuses primarily on natural disciplines such as Biology and
Geography, whose diagrams remain visually similar to natural images. Another
type of diagram, typified by Computer Science, is composed of graphics with
complex topologies and relations, and research on this type of diagram remains
largely unexplored. The main challenges in understanding graphic diagrams are
the scarcity of data and the ambiguity of semantics, which stem mainly from the
diversity of expressions. In this paper, we construct a novel dataset of
graphic diagrams named Computer Science Diagrams (CSDia). It contains more
than 1,200 diagrams with exhaustive annotations of objects and relations. To
handle the visual noise caused by the varied expressions in diagrams, we
introduce the topology of diagrams to parse their topological structure. We
then propose Diagram Parsing Net (DPN), which represents a diagram through
three branches: topology, visual features, and text, and we apply the model to
diagram classification to evaluate its diagram-understanding ability. The
results show the effectiveness of the proposed DPN on diagram understanding.
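The three-branch design described above can be sketched in a few lines. This is a minimal, hypothetical illustration of late fusion for diagram classification, not the authors' actual DPN: the branch encoders, dimensions, and function names (`topology_branch`, `fuse_and_classify`) are assumptions, with random vectors standing in for real CNN and text embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def topology_branch(adj, node_feats):
    # One graph-convolution-like step: aggregate neighbor features,
    # then mean-pool the nodes into a single graph-level embedding.
    h = adj @ node_feats              # (N, d) neighborhood aggregation
    return h.mean(axis=0)             # (d,) graph embedding

def fuse_and_classify(topo_vec, vis_vec, txt_vec, W, b):
    # Late fusion: concatenate the three branch embeddings and
    # apply a linear classifier with a softmax over diagram classes.
    z = np.concatenate([topo_vec, vis_vec, txt_vec])
    logits = W @ z + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Toy diagram: 3 objects (e.g., shapes in a flow chart) with 4-d features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
node_feats = rng.standard_normal((3, 4))
vis_vec = rng.standard_normal(8)      # stand-in for CNN visual features
txt_vec = rng.standard_normal(5)      # stand-in for a text embedding

topo_vec = topology_branch(adj, node_feats)
W = rng.standard_normal((4, 4 + 8 + 5))   # 4 hypothetical diagram classes
b = np.zeros(4)
probs = fuse_and_classify(topo_vec, vis_vec, txt_vec, W, b)
print(probs)  # 4 class probabilities
```

The key design idea being illustrated is that the topology branch operates on the parsed graph rather than raw pixels, so variations in drawing style (the "visual noise" above) do not perturb it.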
Related papers
- Do Vision-Language Models Really Understand Visual Language? [43.893398898373995]
Diagrams are a typical example of a visual language depicting complex concepts and their relationships in the form of an image.
Recent studies suggest that Large Vision-Language Models (LVLMs) can even tackle complex reasoning tasks involving diagrams.
This paper develops a comprehensive test suite to evaluate the diagram comprehension capability of LVLMs.
arXiv Detail & Related papers (2024-09-30T19:45:11Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - Graph schemas as abstractions for transfer learning, inference, and planning [5.565347203528707]
We propose graph schemas as a mechanism of abstraction for transfer learning.
Latent graph learning is emerging as a new computational model of the hippocampus.
By treating learned latent graphs as prior knowledge, new environments can be quickly learned.
arXiv Detail & Related papers (2023-02-14T21:23:22Z) - State of the Art and Potentialities of Graph-level Learning [54.68482109186052]
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more.
Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures.
Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
arXiv Detail & Related papers (2023-01-14T09:15:49Z) - Symbolic image detection using scene and knowledge graphs [39.49756199669471]
We use a scene graph, a graph representation of an image, to capture visual components.
We generate a knowledge graph using facts extracted from ConceptNet to reason about objects and attributes.
We further extend the network with an attention mechanism that learns the importance of the graph for the representations.
arXiv Detail & Related papers (2022-06-10T04:06:28Z) - SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z) - Learning to Represent Image and Text with Denotation Graph [32.417311523031195]
We propose learning representations from a set of implied, visually grounded expressions between image and text.
We show that state-of-the-art multimodal learning models can be further improved by leveraging automatically harvested structural relations.
arXiv Detail & Related papers (2020-10-06T18:00:58Z) - A Heterogeneous Graph with Factual, Temporal and Logical Knowledge for Question Answering Over Dynamic Contexts [81.4757750425247]
We study question answering over a dynamic textual environment.
We develop a graph neural network over the constructed graph, and train the model in an end-to-end manner.
arXiv Detail & Related papers (2020-04-25T04:53:54Z) - Graph-Structured Referring Expression Reasoning in The Wild [105.95488002374158]
Grounding referring expressions aims to locate in an image an object referred to by a natural language expression.
We propose a scene graph guided modular network (SGMN) to perform reasoning over a semantic graph and a scene graph.
We also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning.
arXiv Detail & Related papers (2020-04-19T11:00:30Z) - NODIS: Neural Ordinary Differential Scene Understanding [35.37702159888773]
It requires not only detecting all objects in an image but also identifying all the relations between them.
The proposed architecture performs scene graph inference by solving a neural variant of an ODE by end-to-end learning.
It achieves state-of-the-art results on all three benchmark tasks: scene graph generation (SGGen), classification (SGCls), and visual relationship detection (PredCls) on the Visual Genome benchmark.
arXiv Detail & Related papers (2020-01-14T12:17:18Z) - Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.