KnowGL: Knowledge Generation and Linking from Text
- URL: http://arxiv.org/abs/2210.13952v1
- Date: Tue, 25 Oct 2022 12:12:36 GMT
- Title: KnowGL: Knowledge Generation and Linking from Text
- Authors: Gaetano Rossiello, Faisal Chowdhury, Nandana Mihindukulasooriya, Owen Cornec, Alfio Gliozzo
- Abstract summary: We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions.
We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART.
To showcase the capabilities of our tool, we build a web application consisting of a set of UI widgets that help users to navigate through the semantic data extracted from a given input text.
- Score: 13.407149206621828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose KnowGL, a tool that allows converting text into structured
relational data represented as a set of ABox assertions compliant with the TBox
of a given Knowledge Graph (KG), such as Wikidata. We address this problem as a
sequence generation task by leveraging pre-trained sequence-to-sequence
language models, e.g. BART. Given a sentence, we fine-tune such models to
detect pairs of entity mentions and jointly generate a set of facts consisting
of the full set of semantic annotations for a KG, such as entity labels, entity
types, and their relationships. To showcase the capabilities of our tool, we
build a web application consisting of a set of UI widgets that help users to
navigate through the semantic data extracted from a given input text. We make
the KnowGL model available at https://huggingface.co/ibm/knowgl-large.
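The released checkpoint can be tried directly with the transformers library. Below is a minimal sketch; the generation settings are illustrative, and the exact linearized output format (facts with mentions, labels, types, and relations) is documented on the model card rather than reproduced here.

```python
# Minimal sketch: running the released KnowGL model with Hugging Face
# transformers. Generation settings are illustrative, not the authors'
# exact configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm/knowgl-large")
model = AutoModelForSeq2SeqLM.from_pretrained("ibm/knowgl-large")

sentence = "Leonardo da Vinci painted the Mona Lisa."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)

# The model emits linearized facts (entity mentions, labels, types, and
# relations); see the model card for the exact format and how to parse it.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```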
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, with only 220M parameters, achieves retrieval performance competitive with state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
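As a rough illustration of retrieval-as-generation, the sketch below uses t5-small (~60M parameters) as a stand-in for the paper's small retriever; the task prefix and linearized relation-path output are hypothetical.

```python
# Sketch of subgraph retrieval as conditional generation: a small seq2seq
# model fine-tuned to emit a linearized relation path for a question.
# "t5-small" is a stand-in; prefix and output format are hypothetical.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

question = "Which university did the author of Hamlet attend?"
inputs = tokenizer("retrieve subgraph: " + question, return_tensors="pt")
path_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)

# A fine-tuned model would output something like "author_of -> educated_at",
# which is then grounded against the KG to fetch the actual subgraph.
print(tokenizer.decode(path_ids[0], skip_special_tokens=True))
```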
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery [1.6080795642111267]
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline.
This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations.
A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology.
A knowledge graph is constructed from these entity and relation triples, which can be queried to obtain insights from the data.
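A minimal sketch of that last step, assuming rdflib and an illustrative namespace and triples (not KnowledgeHub's actual schema):

```python
# Minimal sketch: building a queryable graph from (subject, relation,
# object) triples. The namespace and triples are illustrative.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
for s, p, o in [("aspirin", "treats", "headache"),
                ("aspirin", "isA", "NSAID")]:
    g.add((EX[s], EX[p], EX[o]))

# Query: what does aspirin treat?
results = g.query(
    "SELECT ?o WHERE { <http://example.org/aspirin> "
    "<http://example.org/treats> ?o }")
for row in results:
    print(row.o)
```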
arXiv Detail & Related papers (2024-05-16T13:17:14Z)
- GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding [4.258365032282028]
We present a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs).
We propose a novel methodology that combines geometric edge features with visual features within an overall two-staged GAT-based framework.
Our results highlight the model's proficiency in identifying key-value relationships within the FUNSD forms dataset and in discovering spatial relationships in table-structured layouts for RVL-CDIP business invoices.
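A hedged sketch of the core building block, using PyTorch Geometric's GATConv with edge features as a stand-in for the paper's GAT stages; all dimensions, node features, and geometric edge features are illustrative.

```python
# Sketch: a graph attention layer that consumes geometric edge features
# (e.g. relative box offsets/sizes) alongside node features. Uses PyTorch
# Geometric; dimensions are illustrative.
import torch
from torch_geometric.nn import GATConv

num_nodes, node_dim, edge_dim = 4, 256, 6
x = torch.randn(num_nodes, node_dim)               # one feature row per text box
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])  # candidate key-value links
edge_attr = torch.randn(edge_index.size(1), edge_dim)

conv = GATConv(node_dim, 128, heads=4, edge_dim=edge_dim)
out = conv(x, edge_index, edge_attr=edge_attr)     # [4, 4*128], heads concatenated
print(out.shape)
```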
arXiv Detail & Related papers (2024-05-06T01:40:20Z)
- Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing PTLM-based models for TSC fall into two groups: 1) fine-tuning-based models that adopt the PTLM as the context encoder; 2) prompting-based models that recast the classification task as a text/word generation task.
We present a new perspective on leveraging PTLMs for TSC: simultaneously exploiting the merits of both language modeling and explicit target-context interactions via contextual target attributes.
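To make the second paradigm concrete, here is a minimal prompting-style sketch using a fill-mask pipeline; the template and verbalizer words are illustrative, not this paper's method.

```python
# Sketch of the prompting paradigm: recasting target sentiment
# classification as masked-word generation with an off-the-shelf PTLM.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
sentence = "The battery life is amazing but the screen scratches easily."
target = "battery life"
prompt = f"{sentence} The sentiment toward the {target} is [MASK]."

# Score only the verbalizer words "good" and "bad" as mask fillers.
for candidate in fill(prompt, targets=["good", "bad"]):
    print(candidate["token_str"], round(candidate["score"], 4))
```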
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
- DocTr: Document Transformer for Structured Information Extraction in Documents [36.1145541816468]
We present a new formulation for structured information extraction from visually rich documents.
It aims to address the limitations of existing IOB tagging or graph-based formulations.
We represent an entity as an anchor word and a bounding box, and represent entity linking as the association between anchor words.
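A small sketch of this representation as plain data structures; the field names are hypothetical.

```python
# Sketch of the paper's formulation: an entity as an anchor word plus a
# bounding box, and entity linking as an association between anchor words.
from dataclasses import dataclass

@dataclass
class Entity:
    anchor_word: str
    bbox: tuple          # (x0, y0, x1, y1) on the page
    entity_type: str     # e.g. "question" or "answer" in form parsing

@dataclass
class Link:
    head: Entity         # the association is between anchor words
    tail: Entity

q = Entity("Name:", (40, 120, 90, 135), "question")
a = Entity("Alice", (100, 120, 150, 135), "answer")
print(Link(q, a))
```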
arXiv Detail & Related papers (2023-07-16T02:59:30Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
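A minimal sketch of the explanations-as-features idea: encode an LLM-generated explanation with a small LM and use the pooled vector as a node feature for a downstream GNN. Model choice and pooling are illustrative.

```python
# Sketch: turn an LLM explanation into a fixed-size node feature vector.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

explanation = ("This paper likely belongs to 'machine learning' because it "
               "studies graph neural networks and node classification.")
batch = tokenizer(explanation, return_tensors="pt", truncation=True)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # [1, seq_len, 768]
node_feature = hidden.mean(dim=1)                 # mean-pooled, [1, 768]
print(node_feature.shape)  # feed as x into any GNN layer
```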
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models [0.6091702876917281]
We present LIDA, a novel tool for generating grammar-agnostic visualizations and infographics.
LIDA comprises four modules: a SUMMARIZER that converts data into a rich but compact natural language summary; a GOAL EXPLORER that enumerates visualization goals given the data; a VISGENERATOR that generates, refines, and filters visualization code; and an INFOGRAPHER module that yields data-faithful stylized graphics using IGMs.
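A usage sketch loosely following the project's README; the exact API may differ across versions, and an OpenAI key is assumed to be configured in the environment.

```python
# Sketch of the four-module flow with the lida package (per its README).
from lida import Manager, llm

lida = Manager(text_gen=llm("openai"))       # LLM backend (assumed key set)
summary = lida.summarize("data/cars.csv")    # SUMMARIZER
goals = lida.goals(summary, n=2)             # GOAL EXPLORER
charts = lida.visualize(summary=summary,     # VISGENERATOR
                        goal=goals[0])
# The INFOGRAPHER stage stylizes the resulting charts via image
# generation models (IGMs).
```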
arXiv Detail & Related papers (2023-03-06T06:47:22Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms plain-text pre-training while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
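A toy sketch of entity-level masking, assuming a naive string match against a KG entity list; a real pre-training pipeline would operate on tokenized spans.

```python
# Sketch of an entity masking scheme: mask whole entity mentions (found via
# a KG entity list) rather than random subword tokens. String matching here
# is deliberately naive, for illustration only.
kg_entities = {"Mona Lisa", "Leonardo da Vinci"}

def mask_entities(text: str, mask_token: str = "[MASK]") -> str:
    # Replace longer mentions first so nested mentions are handled.
    for entity in sorted(kg_entities, key=len, reverse=True):
        text = text.replace(entity, mask_token)
    return text

print(mask_entities("Leonardo da Vinci painted the Mona Lisa."))
# -> "[MASK] painted the [MASK]."
```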
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.