CodeKGC: Code Language Model for Generative Knowledge Graph Construction
- URL: http://arxiv.org/abs/2304.09048v2
- Date: Thu, 18 Jan 2024 16:14:35 GMT
- Title: CodeKGC: Code Language Model for Generative Knowledge Graph Construction
- Authors: Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen,
Ningyu Zhang
- Abstract summary: Large generative language models trained on structured data such as code have demonstrated impressive capabilities in understanding natural language for structural prediction and reasoning tasks.
We develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph.
Experimental results indicate that the proposed approach can obtain better performance on benchmark datasets compared with baselines.
- Score: 46.220237225553234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current generative knowledge graph construction approaches usually fail to
capture structural knowledge because they simply flatten natural language into
serialized texts or a specification language. However, large generative
language models trained on structured data such as code have demonstrated
impressive capabilities in understanding natural language for structural
prediction and reasoning tasks. We therefore address generative knowledge
graph construction with a code language model: given code-format natural
language input, the goal is to generate triples, so the task can be framed as
code completion. Specifically, we develop schema-aware prompts that
effectively utilize the semantic structure within the knowledge graph. Because
code inherently possesses structure, such as class and function definitions,
it serves as a useful medium for encoding prior semantic structural knowledge.
Furthermore, we employ a rationale-enhanced generation method to boost
performance: rationales provide intermediate reasoning steps, which improve
knowledge extraction. Experimental results indicate that the proposed approach
outperforms baselines on benchmark datasets. Code and datasets are available
at https://github.com/zjunlp/DeepKE/tree/main/example/llm.
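To make the setup concrete, here is a minimal sketch of a schema-aware, code-format prompt in the spirit of the paper. The schema classes (Person, Organization), prompt layout, rationale comment, and expected completion are illustrative assumptions, not the authors' exact templates; those live in the linked repository.

```python
# A minimal sketch, assuming an illustrative two-type schema; the authors'
# real prompt templates are in the DeepKE repository linked above.

SCHEMA_PROMPT = '''
class Entity:
    def __init__(self, name: str):
        self.name = name

class Person(Entity): pass
class Organization(Entity): pass

class Triple:
    def __init__(self, head: Entity, relation: str, tail: Entity):
        self.head, self.relation, self.tail = head, relation, tail

class Extract:
    """Extract (head, relation, tail) triples from the input text."""
    def __init__(self, triples: list):
        self.triples = triples
'''

def build_prompt(text: str) -> str:
    # The input sentence is embedded in the code context and the model is
    # asked to complete the Extract(...) call, so knowledge graph
    # construction becomes code completion. The rationale comment mimics
    # rationale-enhanced generation: it nudges the model to emit
    # intermediate steps before the final triples.
    return (
        SCHEMA_PROMPT
        + f'\ntext = "{text}"\n'
        + "# Rationale: first list entity mentions, then relations between them.\n"
        + "extract = Extract(["
    )

print(build_prompt("Steve Jobs co-founded Apple."))
# A code LM would ideally complete with something like:
#   Triple(Person("Steve Jobs"), "founder_of", Organization("Apple"))])
```

Because the schema is expressed as class definitions, adding an entity or relation type only requires adding a class, which is how code-as-prompt carries prior structural knowledge.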
Related papers
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language (a toy data-flow version is sketched below).
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
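As a hedged illustration of what a "graphical view" of code can mean, the sketch below extracts data-flow edges from Python source with the standard-library ast module. It assumes straight-line, Python-only input; CodeGRAG itself also models control flow and covers C++.

```python
import ast

def data_flow_edges(source: str):
    """Collect (def_line, use_line, variable) 'value-comes-from' edges."""
    tree = ast.parse(source)
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    last_def = {}  # variable name -> line of its most recent assignment
    edges = []
    # Visit names in source order; this simplification mishandles
    # statements like `x = x + 1`, where the RHS is evaluated first.
    for node in sorted(names, key=lambda n: (n.lineno, n.col_offset)):
        if isinstance(node.ctx, ast.Store):
            last_def[node.id] = node.lineno
        elif isinstance(node.ctx, ast.Load) and node.id in last_def:
            edges.append((last_def[node.id], node.lineno, node.id))
    return edges

print(data_flow_edges("x = 1\ny = x + 2\nz = y * x\n"))
# [(1, 2, 'x'), (2, 3, 'y'), (1, 3, 'x')]
```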
- Entity Identifier: A Natural Text Parsing-based Framework for Entity Relation Extraction [0.0]
We use natural language processing techniques to extract structured information from requirements descriptions.
To facilitate this process, we introduce a pipeline for extracting entity and relation information (a minimal pipeline is sketched below).
We also create a dataset to evaluate the effectiveness of our approach.
arXiv Detail & Related papers (2023-07-10T20:30:27Z)
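The following is a rough sketch of such a parsing-based pipeline, assuming spaCy's pretrained English model and a simple subject-verb-object heuristic; both are stand-ins, not the paper's actual components.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def extract(text: str):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    relations = []
    for token in doc:
        if token.pos_ == "VERB":
            # Dependency-based SVO heuristic: nominal subject on the left,
            # direct object or attribute on the right of the verb.
            subj = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
            obj = [w for w in token.rights if w.dep_ in ("dobj", "attr")]
            if subj and obj:
                relations.append((subj[0].text, token.lemma_, obj[0].text))
    return entities, relations

print(extract("Apple hired Tim Cook."))
# e.g. ([('Apple', 'ORG'), ('Tim Cook', 'PERSON')], [('Apple', 'hire', 'Cook')])
```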
- Knowledge Graph Guided Semantic Evaluation of Language Models for User Trust [7.063958622970576]
This study evaluates the semantics encoded in self-attention transformers by leveraging explicit knowledge graph structures.
The opacity of language models bears directly on societal issues of trust and explainable decision outcomes.
arXiv Detail & Related papers (2023-05-08T18:53:14Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP dynamically leverages schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample (a simple retrieval sketch follows below).
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
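The gist of retrieval-augmented prompting can be sketched as follows: retrieve the most similar annotated reference and prepend it to the prompt for the new sample. TF-IDF retrieval and the tiny reference store are illustrative stand-ins for the paper's retriever and data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical annotated references: (text, gold triples).
references = [
    ("Steve Jobs founded Apple.", "(Steve Jobs, founder_of, Apple)"),
    ("Paris is the capital of France.", "(Paris, capital_of, France)"),
]
corpus = [text for text, _ in references]
vectorizer = TfidfVectorizer().fit(corpus)
ref_matrix = vectorizer.transform(corpus)

def rap_prompt(sample: str) -> str:
    sims = cosine_similarity(vectorizer.transform([sample]), ref_matrix)[0]
    text, triples = references[sims.argmax()]
    # The retrieved reference carries schema and knowledge into the prompt.
    return f"Example: {text} -> {triples}\nInput: {sample} ->"

print(rap_prompt("Berlin is the capital of Germany."))
```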
- Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics of knowledge triplets' natural language descriptions together with their structural information (a rough bi-encoder sketch follows below).
Our method embeds knowledge graphs for the completion task by fine-tuning pre-trained language models.
Experiments on a variety of knowledge graph benchmarks demonstrate the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
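A hedged sketch of scoring a triplet from description semantics plus a structural constraint, assuming a Hugging Face encoder, mean pooling, and a TransE-style head + relation ≈ tail objective; all three are illustrative choices, not the paper's exact architecture.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean pooling over tokens

def plausibility(head_desc: str, relation: str, tail_desc: str) -> float:
    # Semantics come from the text descriptions; structure is imposed by a
    # TransE-style constraint: head + relation should land near tail.
    h, r, t = embed(head_desc), embed(relation), embed(tail_desc)
    return -torch.norm(h + r - t).item()  # higher means more plausible

print(plausibility("Steve Jobs, co-founder of Apple", "founder of", "Apple Inc."))
```

In training (rather than this frozen-encoder demo), the language model would be fine-tuned so that true triplets score higher than corrupted ones.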
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, a semantic-level structure of code that encodes where-the-value-comes-from relations between variables (a toy graph-guided attention mask is sketched below).
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
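The sketch below shows one way data flow can enter a transformer's input, in the spirit of GraphCodeBERT: variable nodes are appended to the code token sequence, and a graph-guided attention mask restricts them to attend along value-flow edges. The toy token list, edge list, and mask layout are assumptions for illustration, not the released model's exact format.

```python
import numpy as np

code_tokens = ["x", "=", "a", "+", "b"]
variables = ["x", "a", "b"]            # data-flow nodes appended to the input
flow_edges = [("a", "x"), ("b", "x")]  # the value of x comes from a and b

n_code, n_var = len(code_tokens), len(variables)
size = n_code + n_var
mask = np.zeros((size, size), dtype=bool)
mask[:n_code, :n_code] = True          # full attention among code tokens

var_pos = {v: n_code + i for i, v in enumerate(variables)}
for src, dst in flow_edges:
    mask[var_pos[dst], var_pos[src]] = True   # node attends to its value source
for i, tok in enumerate(code_tokens):         # tie each variable to its token
    if tok in var_pos:
        mask[var_pos[tok], i] = mask[i, var_pos[tok]] = True

print(mask.astype(int))  # feed as the attention mask of a transformer layer
```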
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme (a toy version is sketched below).
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
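Finally, a toy version of an entity masking scheme: instead of masking random subwords, whole entity mentions known to a knowledge graph are masked so the model must recover them from context. The entity list and the mask-everything policy are simplifications; a real scheme would sample which mentions to mask.

```python
# Hypothetical set of entity surface forms taken from a knowledge graph.
KG_ENTITIES = {"Marie Curie", "Nobel Prize", "Warsaw"}

def mask_entities(text: str) -> str:
    # Longest-first replacement avoids clobbering overlapping mentions.
    for entity in sorted(KG_ENTITIES, key=len, reverse=True):
        text = text.replace(entity, "[MASK]")
    return text

print(mask_entities("Marie Curie was born in Warsaw and won the Nobel Prize."))
# [MASK] was born in [MASK] and won the [MASK].
```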