Understood in Translation, Transformers for Domain Understanding
- URL: http://arxiv.org/abs/2012.10271v1
- Date: Fri, 18 Dec 2020 14:47:47 GMT
- Title: Understood in Translation, Transformers for Domain Understanding
- Authors: Dimitrios Christofidellis, Matteo Manica, Leonidas Georgopoulos, Hans
Vandierendonck
- Abstract summary: We propose a supervised machine learning method, based on Transformers, for domain definition of a corpus.
We argue why such automated definition of the domain's structure is beneficial both in terms of construction time and quality of the generated graph.
We present a new health domain dataset based on publications extracted from PubMed.
- Score: 2.379911867541422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge acquisition is the essential first step of any Knowledge Graph (KG)
application. This knowledge can be extracted from a given corpus (KG generation
process) or specified from an existing KG (KG specification process). Focusing
on domain specific solutions, knowledge acquisition is a labor intensive task
usually orchestrated and supervised by subject matter experts. Specifically,
the domain of interest is usually manually defined and then the needed
generation or extraction tools are utilized to produce the KG. Herein, we
propose a supervised machine learning method, based on Transformers, for domain
definition of a corpus. We argue why such automated definition of the domain's
structure is beneficial both in terms of construction time and quality of the
generated graph. The proposed method is extensively validated on three public
datasets (WebNLG, NYT and DocRED) by comparing it with two reference methods
based on CNNs and RNNs models. The evaluation shows the efficiency of our model
in this task. Focusing on scientific document understanding, we present a new
health domain dataset based on publications extracted from PubMed and we
successfully utilize our method on this. Lastly, we demonstrate how this work
lays the foundation for fully automated and unsupervised KG generation.
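The abstract frames domain definition as a supervised classification task over a corpus: predict which relation types each sentence expresses, then aggregate those predictions into the domain's structure. A minimal sketch of that framing follows; the keyword classifier is a toy stand-in for the paper's Transformer model, and all names and labels here are illustrative, not taken from the paper's code.

```python
# Sketch: domain definition as per-sentence relation-type classification,
# aggregated into a domain schema for downstream KG generation.
from collections import Counter
from typing import Callable, Iterable


def define_domain(
    sentences: Iterable[str],
    classify: Callable[[str], set[str]],  # e.g. a fine-tuned Transformer
    min_support: int = 1,
) -> set[str]:
    """Aggregate per-sentence relation predictions into a domain schema."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(classify(sentence))
    # Keep only relation types predicted often enough to belong to the domain.
    return {rel for rel, n in counts.items() if n >= min_support}


# Toy keyword-based classifier standing in for the Transformer.
def toy_classifier(sentence: str) -> set[str]:
    rules = {"born": "birthPlace", "works": "employer", "treats": "treats"}
    return {rel for kw, rel in rules.items() if kw in sentence.lower()}


corpus = [
    "Marie Curie was born in Warsaw.",
    "Dr. Smith works at the clinic and treats diabetes.",
]
schema = define_domain(corpus, toy_classifier)
```

Once the schema is fixed, only extraction tools matching those relation types need to be run, which is the construction-time benefit the abstract argues for.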
Related papers
- Quality > Quantity: Synthetic Corpora from Foundation Models for
Closed-Domain Extractive Question Answering [35.38140071573828]
We study extractive question answering within closed domains and introduce the concept of targeted pre-training.
Our proposed framework uses Galactica to generate synthetic, "targeted" corpora that align with specific writing styles and topics.
arXiv Detail & Related papers (2023-10-25T20:48:16Z)
- Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z)
- Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models [53.09723678623779]
We propose TAGREAL to automatically generate quality query prompts and retrieve support information from large text corpora.
The results show that TAGREAL achieves state-of-the-art performance on two benchmark datasets.
We find that TAGREAL has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.
arXiv Detail & Related papers (2023-05-24T22:09:35Z)
- Towards Ontology Reshaping for KG Generation with User-in-the-Loop: Applied to Bosch Welding [18.83458273005337]
Knowledge graphs (KG) are used in a wide range of applications.
The automation of KG generation is very desired due to the data volume and variety in industries.
arXiv Detail & Related papers (2022-09-22T14:59:13Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights [1.6308268213252761]
We use a task-based approach for fulfilling specific information needs within a new domain.
A pipeline constructed of state-of-the-art NLP technologies is used to automatically extract an instance-level semantic structure.
arXiv Detail & Related papers (2021-04-18T19:28:10Z)
- BERT-based knowledge extraction method of unstructured domain text [0.6445605125467573]
This paper proposes a knowledge extraction method based on BERT.
It converts the domain knowledge points into question and answer pairs and uses the text around the answer in documents as the context.
It is used to directly extract knowledge points from insurance clauses.
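The QA framing this entry describes, converting each knowledge point into a question-answer pair with the surrounding document text as context, can be sketched as follows. Field names follow the common SQuAD layout for extractive QA; the paper's exact schema and window size are assumptions.

```python
# Sketch: turn a domain knowledge point into a SQuAD-style extractive-QA
# example, using the text around the answer in the document as context.
def to_qa_example(question: str, answer: str, document: str, window: int = 50) -> dict:
    """Build a QA training example with a context window around the answer."""
    pos = document.find(answer)
    if pos < 0:
        raise ValueError("answer not found in document")
    start = max(0, pos - window)
    context = document[start : pos + len(answer) + window]
    return {
        "question": question,
        "context": context,
        "answer": answer,
        "answer_start": context.find(answer),  # span offset within the context
    }


doc = "The grace period for premium payment is thirty days after the due date."
ex = to_qa_example("How long is the grace period?", "thirty days", doc)
```

A BERT QA model fine-tuned on such pairs can then predict answer spans directly from new clauses.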
arXiv Detail & Related papers (2021-03-01T03:24:35Z)
- KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation [100.79870384880333]
We propose a knowledge-grounded pre-training (KGPT) to generate knowledge-enriched text.
We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness.
Under the zero-shot setting, our model achieves over 30 ROUGE-L on WebNLG while all other baselines fail.
arXiv Detail & Related papers (2020-10-05T19:59:05Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
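The entity masking scheme mentioned above can be illustrated with a minimal sketch: instead of masking random subword tokens, whole entity mentions (here given as token spans, e.g. from a KG-linked annotator) are replaced by a mask symbol, so the model must recover the entity from context. The span format and mask token are illustrative assumptions, not the paper's code.

```python
# Sketch: entity-level masking over a tokenized sentence.
MASK = "[MASK]"


def mask_entities(tokens: list[str], entity_spans: list[tuple[int, int]]) -> list[str]:
    """Replace every token inside an entity span [start, end) with the mask token."""
    masked = list(tokens)
    for start, end in entity_spans:
        for i in range(start, end):
            masked[i] = MASK
    return masked


tokens = ["Aspirin", "is", "used", "to", "treat", "fever", "."]
spans = [(0, 1), (5, 6)]  # "Aspirin" and "fever", e.g. entities linked to a KG
masked = mask_entities(tokens, spans)
```

Because the KG only supplies the spans, it is used implicitly and only during pre-training, matching the summary above.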
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
- Toward Subgraph-Guided Knowledge Graph Question Generation with Graph Neural Networks [53.58077686470096]
Knowledge graph (KG) question generation (QG) aims to generate natural language questions from KGs and target answers.
In this work, we focus on a more realistic setting where we aim to generate questions from a KG subgraph and target answers.
arXiv Detail & Related papers (2020-04-13T15:43:22Z)
- Domain Adaption for Knowledge Tracing [65.86619804954283]
We propose a novel adaptable framework, namely adaptable knowledge tracing (AKT), to address the DAKT problem.
For the first aspect, we incorporate educational characteristics (e.g., slip, guess, question texts) into deep knowledge tracing (DKT) to obtain a well-performing knowledge tracing model.
For the second aspect, we propose and adopt three domain adaptation processes. First, we pre-train an auto-encoder to select useful source instances for target model training.
arXiv Detail & Related papers (2020-01-14T15:04:48Z)
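The instance-selection step described in the last entry, pre-training an auto-encoder and keeping only source instances it reconstructs well, can be sketched as below. The scalar-shrinking "reconstructor" and the threshold are toy stand-ins; the paper's auto-encoder and feature representation are not reproduced here.

```python
# Sketch: select useful source-domain instances by reconstruction error.
# Instances the (pre-trained) auto-encoder reconstructs poorly are assumed
# to be far from the distribution it models, and are dropped.
def select_instances(instances, reconstruct, threshold):
    """Keep instances whose squared reconstruction error is below the threshold."""
    def error(x):
        x_hat = reconstruct(x)
        return sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return [x for x in instances if error(x) < threshold]


# Toy reconstructor: accurate near the origin, inaccurate far from it.
toy_reconstruct = lambda x: [0.9 * v for v in x]

src = [[0.1, 0.2], [5.0, 5.0]]
kept = select_instances(src, toy_reconstruct, threshold=0.1)
```

The surviving instances then feed target-model training, as the summary describes.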
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.