BERT-based knowledge extraction method of unstructured domain text
- URL: http://arxiv.org/abs/2103.00728v1
- Date: Mon, 1 Mar 2021 03:24:35 GMT
- Title: BERT-based knowledge extraction method of unstructured domain text
- Authors: Wang Zijia, Li Ye, Zhu Zhongkai
- Abstract summary: This paper proposes a knowledge extraction method based on BERT.
It converts domain knowledge points into question-answer pairs and uses the text surrounding each answer in the documents as the context.
The fine-tuned model is then used to extract knowledge points directly from further insurance clauses.
- Score: 0.6445605125467573
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the development and business adoption of knowledge graphs,
there is an increasing demand for extracting the entities and relations of
knowledge graphs from unstructured domain documents, which makes automatic
knowledge extraction from domain text particularly valuable. This paper
proposes a knowledge extraction method based on BERT, which automatically
extracts knowledge points from unstructured domain-specific texts (such as
insurance clauses in the insurance industry) to reduce the manual effort of
knowledge graph construction. Unlike the commonly used methods based on
rules, templates, or entity-extraction models, this paper converts the
domain knowledge points into question-answer pairs and uses the text
surrounding each answer in the documents as the context. The method adopts
a BERT-based model analogous to the one used for the SQuAD reading
comprehension task; the model is fine-tuned on these question-answer pairs
and then used to extract knowledge points directly from further insurance
clauses. Test results show that the model performs well.
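As a concrete illustration of the SQuAD-style setup the abstract describes, here is a minimal sketch using an off-the-shelf SQuAD-fine-tuned BERT checkpoint from Hugging Face as a stand-in for the paper's insurance-domain model; the clause text, question, and model name are illustrative assumptions, not artifacts of the paper.

```python
# Minimal sketch of SQuAD-style knowledge-point extraction.
# The checkpoint below is a public SQuAD model standing in for the
# paper's domain-fine-tuned BERT; clause and question are invented.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

# A domain knowledge point phrased as a question, with the surrounding
# clause text used as the reading-comprehension context.
context = (
    "The waiting period of this policy is 90 days from the effective "
    "date. Claims filed within the waiting period are not covered."
)
question = "How long is the waiting period of this policy?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])  # e.g. "90 days" plus confidence
```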
Related papers
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build HGA, a novel document semantic entity recognition framework that uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
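Not HGA itself (whose hypergraph attention is more involved): a minimal PyTorch sketch of the underlying idea of scoring entity boundaries and entity categories simultaneously from shared token encodings; all dimensions and names are invented for illustration.

```python
# Two lightweight heads over shared token states: one scores begin/end
# boundaries, the other scores entity categories, trained jointly.
import torch
import torch.nn as nn

class JointBoundaryCategoryHead(nn.Module):
    def __init__(self, hidden: int = 768, num_categories: int = 4):
        super().__init__()
        self.boundary = nn.Linear(hidden, 2)              # begin / end logits
        self.category = nn.Linear(hidden, num_categories)  # category logits

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any encoder
        return self.boundary(token_states), self.category(token_states)

head = JointBoundaryCategoryHead()
states = torch.randn(1, 16, 768)                 # dummy encoder output
boundary_logits, category_logits = head(states)
print(boundary_logits.shape, category_logits.shape)  # (1,16,2) (1,16,4)
```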
- FabKG: A Knowledge graph of Manufacturing Science domain utilizing structured and unconventional unstructured knowledge source [1.2597961235465307]
We develop knowledge graphs based upon entity and relation data for both commercial and educational uses.
We propose a novel crowdsourcing method for KG creation by leveraging student notes.
We have created a knowledge graph containing 65000+ triples using all data sources.
arXiv Detail & Related papers (2022-05-24T02:32:04Z)
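A toy sketch of FabKG-style multi-source triple accumulation, using invented manufacturing facts; it only shows the shape of merging structured rows with crowdsourced note snippets into a deduplicated triple set.

```python
# Illustrative only: accumulating (head, relation, tail) triples from a
# structured source and from crowdsourced note snippets, deduplicated.
structured_rows = [
    ("milling", "is_a", "machining process"),
    ("milling", "uses", "rotary cutter"),
]
note_snippets = [("rotary cutter", "made_of", "high-speed steel")]

triples = set()
for source in (structured_rows, note_snippets):
    triples.update(source)

for head, rel, tail in sorted(triples):
    print(f"({head}) -[{rel}]-> ({tail})")
```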
- TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z)
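A toy stand-in for TegTok's dense-retrieval step, with random vectors in place of a real encoder: score knowledge entries against a query by inner product, keep the top-k, and splice them into the model input. The entries, separator markers, and sizes are all assumptions.

```python
# Dense retrieval in miniature: rank knowledge entries by inner product
# with the query vector, then prepend the winners to the input text.
import numpy as np

rng = np.random.default_rng(0)
entries = ["entry about topic A", "entry about topic B", "entry about topic C"]
entry_vecs = rng.normal(size=(len(entries), 128))  # stand-in embeddings
query_vec = rng.normal(size=128)

scores = entry_vecs @ query_vec                    # inner-product scores
top_k = np.argsort(scores)[::-1][:2]               # best 2 entries
augmented = " [KNOWLEDGE] ".join(entries[i] for i in top_k) + " [SEP] my query"
print(augmented)
```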
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means of encoding structured knowledge for knowledge-intensive applications, i.e., open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
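A toy verbalizer in the spirit of UDT-QA's verbalizer-retriever-reader framework: flatten a Wikidata-style triple and a table row into plain sentences a text retriever can index alongside ordinary passages; the helper names and data are invented.

```python
# Turn structured knowledge into text so one retriever/reader covers both.
def verbalize_triple(head: str, relation: str, tail: str) -> str:
    return f"{head} {relation} {tail}."

def verbalize_row(table_name: str, row: dict) -> str:
    cells = ", ".join(f"{col} is {val}" for col, val in row.items())
    return f"In table {table_name}: {cells}."

print(verbalize_triple("Marie Curie", "was born in", "Warsaw"))
print(verbalize_row("Nobel laureates", {"name": "Marie Curie", "year": "1903"}))
```
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]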
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms plain-text pre-training while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
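The paper's actual unified representation is not specified in this summary; the sketch below only illustrates the general idea of linearizing unstructured, semi-structured, and well-structured text into one input format, with invented conventions.

```python
# One textual format for three forms of input, so a single PLM can
# consume them uniformly. Separators and examples are assumptions.
def linearize(kind: str, payload) -> str:
    if kind == "unstructured":
        return payload                                    # plain prose
    if kind == "semi-structured":                         # key-value pairs
        return " ; ".join(f"{k}: {v}" for k, v in payload.items())
    if kind == "well-structured":                         # table rows
        return " | ".join(" , ".join(row) for row in payload)
    raise ValueError(kind)

print(linearize("unstructured", "BERT encodes text."))
print(linearize("semi-structured", {"term": "premium", "unit": "CNY"}))
print(linearize("well-structured", [["clause", "limit"], ["C1", "10000"]]))
```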
- Knowledge Graph Anchored Information-Extraction for Domain-Specific Insights [1.6308268213252761]
We use a task-based approach for fulfilling specific information needs within a new domain.
A pipeline built from state-of-the-art NLP components is used to automatically extract an instance-level semantic structure.
arXiv Detail & Related papers (2021-04-18T19:28:10Z)
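One plausible shape of such a pipeline, sketched with off-the-shelf spaCy NER and a hand-made lookup that anchors extracted mentions to knowledge-graph nodes; the model name, tiny KG, and sentence are illustrative, not the paper's actual stack.

```python
# NER plus KG anchoring in miniature. Requires the en_core_web_sm
# model to be installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
kg_nodes = {"Boeing": "node:org/boeing", "Seattle": "node:loc/seattle"}

doc = nlp("Boeing opened a new research facility in Seattle.")
for ent in doc.ents:
    anchor = kg_nodes.get(ent.text)
    print(ent.text, ent.label_, "->", anchor or "unanchored")
```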
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding [0.0]
We propose a technique to infuse knowledge context from knowledge graphs for conceptual and ambiguous entities into models based on transformer architecture.
Our novel technique projects knowledge graph embeddings into a homogeneous vector space, introduces new token types for entities, aligns entity position ids, and adds a selective attention mechanism.
We take BERT as a baseline model and implement "Knowledge-Infused BERT" (KI-BERT) by infusing knowledge context from ConceptNet and WordNet.
arXiv Detail & Related papers (2021-04-09T16:15:31Z)
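Not KI-BERT's full mechanism: a PyTorch sketch of its first ingredient, projecting a knowledge-graph entity embedding into the encoder's hidden space so it can sit alongside ordinary tokens; all sizes and tensors are invented.

```python
# Project a KG entity vector into the encoder's token space, then
# append it to the sentence encoding as one extra "token".
import torch
import torch.nn as nn

kg_dim, hidden = 200, 768                  # e.g. ConceptNet vs BERT sizes
project = nn.Linear(kg_dim, hidden)        # homogeneous-space projection

entity_embedding = torch.randn(1, kg_dim)  # stand-in KG vector
entity_token = project(entity_embedding)   # now shaped like a BERT token
token_states = torch.randn(1, 12, hidden)  # stand-in sentence encoding
augmented = torch.cat([token_states, entity_token.unsqueeze(1)], dim=1)
print(augmented.shape)                     # (1, 13, 768)
```
- BERTese: Learning to Speak to BERT [50.76152500085082]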
We propose a method for automatically rewriting queries into "BERTese", paraphrased queries that are directly optimized for better knowledge extraction.
We empirically show our approach outperforms competing baselines, obviating the need for complex pipelines.
arXiv Detail & Related papers (2021-03-09T10:17:22Z)
- Understood in Translation, Transformers for Domain Understanding [2.379911867541422]
We propose a supervised machine learning method, based on Transformers, for domain definition of a corpus.
We argue why such automated definition of the domain's structure is beneficial both in terms of construction time and quality of the generated graph.
We present a new health domain dataset based on publications extracted from PubMed.
arXiv Detail & Related papers (2020-12-18T14:47:47Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
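A toy version of an entity masking scheme, assuming entity spans supplied by a knowledge graph: whole entity mentions are masked (here crudely, with a single [MASK] per span, unlike BERT's per-subword masking) so the language model must recover them from context. Sentence and entity list are invented.

```python
# Mask whole entity spans rather than random subwords, so pre-training
# pressure falls on entity-level knowledge. Simplified on purpose:
# each span becomes one [MASK] token instead of one per subword.
text = "Marie Curie won the Nobel Prize in 1903."
entities = ["Marie Curie", "Nobel Prize"]  # spans known to the KG

masked = text
for ent in entities:
    masked = masked.replace(ent, "[MASK]")
print(masked)  # "[MASK] won the [MASK] in 1903."
```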