CLEVR Parser: A Graph Parser Library for Geometric Learning on Language
Grounded Image Scenes
- URL: http://arxiv.org/abs/2009.09154v2
- Date: Thu, 1 Oct 2020 22:56:35 GMT
- Title: CLEVR Parser: A Graph Parser Library for Geometric Learning on Language
Grounded Image Scenes
- Authors: Raeid Saqur and Ameet Deshpande
- Abstract summary: The CLEVR dataset has been used extensively in language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains.
We present a graph parser library for CLEVR that provides functionality for extracting object-centric attributes and relationships and for constructing structural graph representations for both the image and language modalities.
We discuss downstream usage and applications of the library and how it accelerates research for the NLP community.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The CLEVR dataset has been used extensively in language-grounded
visual reasoning in Machine Learning (ML) and Natural Language Processing (NLP)
domains. We present a graph parser library for CLEVR that provides
functionality for extracting object-centric attributes and relationships and
for constructing structural graph representations for both the image and
language modalities. Structural order-invariant representations enable
geometric learning and can aid in downstream tasks like language grounding to
vision, robotics, compositionality, interpretability, and computational grammar
construction. We provide three extensible main components (parser, embedder,
and visualizer) that can be tailored to suit specific learning setups. We also
provide out-of-the-box functionality for seamless integration with popular deep
graph neural network (GNN) libraries. Additionally, we discuss downstream usage
and applications of the library and how it accelerates research for the NLP
community.
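To make the parser and GNN-integration ideas concrete, here is a minimal illustrative sketch. It is not the CLEVR Parser API itself: it builds an object-centric scene graph with NetworkX from a CLEVR-style scene annotation and converts it into a PyTorch Geometric Data object suitable for geometric learning. The attribute and relation vocabularies follow the public CLEVR scenes JSON schema; the helper names scene_to_graph and graph_to_data are hypothetical, and NetworkX, PyTorch, and PyTorch Geometric are assumed to be installed.

import networkx as nx
import torch
from torch_geometric.data import Data  # assumes PyTorch Geometric is installed

# Attribute and relation vocabularies as defined by the CLEVR dataset.
COLORS = ["gray", "red", "blue", "green", "brown", "purple", "cyan", "yellow"]
SHAPES = ["cube", "sphere", "cylinder"]
SIZES = ["small", "large"]
MATERIALS = ["rubber", "metal"]
RELATIONS = ["left", "right", "front", "behind"]

def scene_to_graph(scene):
    # Hypothetical helper: one node per object with integer-encoded attributes,
    # one directed edge per spatial relation (MultiDiGraph keeps parallel edges).
    g = nx.MultiDiGraph()
    for idx, obj in enumerate(scene["objects"]):
        g.add_node(idx, x=[COLORS.index(obj["color"]),
                           SHAPES.index(obj["shape"]),
                           SIZES.index(obj["size"]),
                           MATERIALS.index(obj["material"])])
    for rel in RELATIONS:
        # CLEVR stores each relation as a per-object adjacency list.
        for src, targets in enumerate(scene["relationships"][rel]):
            for dst in targets:
                g.add_edge(src, dst, relation=RELATIONS.index(rel))
    return g

def graph_to_data(g):
    # Hypothetical helper: hand the order-invariant graph to a GNN library.
    x = torch.tensor([g.nodes[n]["x"] for n in g.nodes], dtype=torch.float)
    edges = list(g.edges())
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor([d["relation"] for _, _, d in g.edges(data=True)],
                             dtype=torch.long)
    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

# Toy scene in the CLEVR_scenes.json layout: a large red metal cube with a
# small cyan rubber sphere to its left and behind it.
scene = {
    "objects": [
        {"color": "red", "shape": "cube", "size": "large", "material": "metal"},
        {"color": "cyan", "shape": "sphere", "size": "small", "material": "rubber"},
    ],
    "relationships": {
        "left": [[1], []], "right": [[], [0]],
        "front": [[], [0]], "behind": [[1], []],
    },
}

data = graph_to_data(scene_to_graph(scene))
print(data)  # Data(x=[2, 4], edge_index=[2, 4], edge_attr=[4])

The resulting data object can be passed to any PyTorch Geometric GNN layer. Integer-encoding the attributes (rather than keeping the raw strings) is a choice made here so that node features tensorize cleanly, and a MultiDiGraph is used so that each spatial relation keeps its own edge.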
Related papers
- Language is All a Graph Needs [33.9836278881785]
We propose InstructGLM (Instruction-finetuned Graph Language Model) with highly scalable prompts based on natural language instructions.
Our method surpasses all GNN baselines on the ogbn-arxiv, Cora, and PubMed datasets.
arXiv Detail & Related papers (2023-08-14T13:41:09Z)
- LAVIS: A Library for Language-Vision Intelligence [98.88477610704938]
LAVIS is an open-source library for LAnguage-VISion research and applications.
It features a unified interface for easy access to state-of-the-art image-language and video-language models and common datasets.
arXiv Detail & Related papers (2022-09-15T18:04:10Z)
- Video-Text Pre-training with Learned Regions [59.30893505895156]
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs.
We propose a module for video-text learning, RegionLearner, which can take the structure of objects into account during pre-training on large-scale video-text pairs.
arXiv Detail & Related papers (2021-12-02T13:06:53Z)
- DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning [12.122347427933637]
We demonstrate a library for the integration of domain knowledge in deep learning architectures.
Using this library, the structure of the data is expressed symbolically via graph declarations.
The domain knowledge can be defined explicitly, which improves the models' explainability.
arXiv Detail & Related papers (2021-08-27T16:06:42Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Captum: A unified and generic model interpretability library for PyTorch [49.72749684393332]
We introduce a novel, unified, open-source model interpretability library for PyTorch.
The library contains generic implementations of a number of gradient and perturbation-based attribution algorithms.
It can be used for both classification and non-classification models.
arXiv Detail & Related papers (2020-09-16T18:57:57Z)
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning [92.48299156867664]
We propose a complete video captioning system including both a novel model and an effective training strategy.
Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation.
Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of an external language model (ELM) to integrate abundant linguistic knowledge into the captioning model.
arXiv Detail & Related papers (2020-02-26T15:34:52Z)