CodeKGC: Code Language Model for Generative Knowledge Graph Construction
- URL: http://arxiv.org/abs/2304.09048v2
- Date: Thu, 18 Jan 2024 16:14:35 GMT
- Title: CodeKGC: Code Language Model for Generative Knowledge Graph Construction
- Authors: Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen,
Ningyu Zhang
- Abstract summary: Large generative language models trained on structured data such as code have demonstrated impressive capabilities in understanding natural language for structural prediction and reasoning tasks.
We develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph.
Experimental results indicate that the proposed approach can obtain better performance on benchmark datasets compared with baselines.
- Score: 46.220237225553234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current generative knowledge graph construction approaches usually fail to
capture structural knowledge because they simply flatten natural language into
serialized texts or a specification language. However, large generative
language models trained on structured data such as code have demonstrated
impressive capabilities in understanding natural language for structural
prediction and reasoning tasks. We therefore address generative knowledge
graph construction with a code language model: given code-format natural
language input, the goal is to generate triples, so the task can be framed as
code completion. Specifically, we develop schema-aware prompts that
effectively utilize the semantic structure within the knowledge graph. Because
code inherently possesses structure, such as class and function definitions,
it serves as a useful medium for encoding prior semantic structural knowledge.
Furthermore, we employ a rationale-enhanced generation method to boost
performance: rationales provide intermediate reasoning steps, which improve
knowledge extraction. Experimental results indicate that the proposed approach
outperforms baselines on benchmark datasets. Code and datasets are available
at https://github.com/zjunlp/DeepKE/tree/main/example/llm.
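To make the setup concrete, here is a minimal sketch of a schema-aware, code-format prompt in the spirit of the paper. The schema classes (Person, Organization), prompt layout, rationale comment, and expected completion are illustrative assumptions, not the authors' exact templates; those live in the linked repository.

```python
# A minimal sketch, assuming an illustrative two-type schema; the authors'
# real prompt templates are in the DeepKE repository linked above.

SCHEMA_PROMPT = '''
class Entity:
    def __init__(self, name: str):
        self.name = name

class Person(Entity): pass
class Organization(Entity): pass

class Triple:
    def __init__(self, head: Entity, relation: str, tail: Entity):
        self.head, self.relation, self.tail = head, relation, tail

class Extract:
    """Extract (head, relation, tail) triples from the input text."""
    def __init__(self, triples: list):
        self.triples = triples
'''

def build_prompt(text: str) -> str:
    # The input sentence is embedded in the code context and the model is
    # asked to complete the Extract(...) call, so knowledge graph
    # construction becomes code completion. The rationale comment mimics
    # rationale-enhanced generation: it nudges the model to emit
    # intermediate steps before the final triples.
    return (
        SCHEMA_PROMPT
        + f'\ntext = "{text}"\n'
        + "# Rationale: first list entity mentions, then relations between them.\n"
        + "extract = Extract(["
    )

print(build_prompt("Steve Jobs co-founded Apple."))
# A code LM would ideally complete with something like:
#   Triple(Person("Steve Jobs"), "founder_of", Organization("Apple"))])
```

Because the schema is expressed as class definitions, adding an entity or relation type only requires adding a class, which is how code-as-prompt carries prior structural knowledge.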
Related papers
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language (a toy data-flow version is sketched below).
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
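As a hedged illustration of what a "graphical view" of code can mean, the sketch below extracts data-flow edges from Python source with the standard-library ast module. It assumes straight-line, Python-only input; CodeGRAG itself also models control flow and covers C++.

```python
import ast

def data_flow_edges(source: str):
    """Collect (def_line, use_line, variable) 'value-comes-from' edges."""
    tree = ast.parse(source)
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    last_def = {}  # variable name -> line of its most recent assignment
    edges = []
    # Visit names in source order; this simplification mishandles
    # statements like `x = x + 1`, where the RHS is evaluated first.
    for node in sorted(names, key=lambda n: (n.lineno, n.col_offset)):
        if isinstance(node.ctx, ast.Store):
            last_def[node.id] = node.lineno
        elif isinstance(node.ctx, ast.Load) and node.id in last_def:
            edges.append((last_def[node.id], node.lineno, node.id))
    return edges

print(data_flow_edges("x = 1\ny = x + 2\nz = y * x\n"))
# [(1, 2, 'x'), (2, 3, 'y'), (1, 3, 'x')]
```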
- Entity Identifier: A Natural Text Parsing-based Framework for Entity Relation Extraction [0.0]
We use natural language processing techniques to extract structured information from requirements descriptions.
To facilitate this process, we introduce a pipeline for extracting entity and relation information (a minimal pipeline is sketched below).
We also create a dataset to evaluate the effectiveness of our approach.
arXiv Detail & Related papers (2023-07-10T20:30:27Z)
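The following is a rough sketch of such a parsing-based pipeline, assuming spaCy's pretrained English model and a simple subject-verb-object heuristic; both are stand-ins, not the paper's actual components.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def extract(text: str):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    relations = []
    for token in doc:
        if token.pos_ == "VERB":
            # Dependency-based SVO heuristic: nominal subject on the left,
            # direct object or attribute on the right of the verb.
            subj = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
            obj = [w for w in token.rights if w.dep_ in ("dobj", "attr")]
            if subj and obj:
                relations.append((subj[0].text, token.lemma_, obj[0].text))
    return entities, relations

print(extract("Apple hired Tim Cook."))
# e.g. ([('Apple', 'ORG'), ('Tim Cook', 'PERSON')], [('Apple', 'hire', 'Cook')])
```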
- Knowledge Graph Guided Semantic Evaluation of Language Models for User Trust [7.063958622970576]
This study evaluates the semantics encoded in self-attention transformers by leveraging explicit knowledge graph structures.
The opacity of language models bears directly on societal issues of trust and explainable decision outcomes.
arXiv Detail & Related papers (2023-05-08T18:53:14Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP dynamically leverages schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample (a simple retrieval sketch follows below).
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
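The gist of retrieval-augmented prompting can be sketched as follows: retrieve the most similar annotated reference and prepend it to the prompt for the new sample. TF-IDF retrieval and the tiny reference store are illustrative stand-ins for the paper's retriever and data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical annotated references: (text, gold triples).
references = [
    ("Steve Jobs founded Apple.", "(Steve Jobs, founder_of, Apple)"),
    ("Paris is the capital of France.", "(Paris, capital_of, France)"),
]
corpus = [text for text, _ in references]
vectorizer = TfidfVectorizer().fit(corpus)
ref_matrix = vectorizer.transform(corpus)

def rap_prompt(sample: str) -> str:
    sims = cosine_similarity(vectorizer.transform([sample]), ref_matrix)[0]
    text, triples = references[sims.argmax()]
    # The retrieved reference carries schema and knowledge into the prompt.
    return f"Example: {text} -> {triples}\nInput: {sample} ->"

print(rap_prompt("Berlin is the capital of Germany."))
```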
- Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics of knowledge triplets' natural language descriptions together with their structural information (a rough bi-encoder sketch follows below).
Our method embeds knowledge graphs for the completion task by fine-tuning pre-trained language models.
Experiments on a variety of knowledge graph benchmarks demonstrate the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
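A hedged sketch of scoring a triplet from description semantics plus a structural constraint, assuming a Hugging Face encoder, mean pooling, and a TransE-style head + relation ≈ tail objective; all three are illustrative choices, not the paper's exact architecture.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean pooling over tokens

def plausibility(head_desc: str, relation: str, tail_desc: str) -> float:
    # Semantics come from the text descriptions; structure is imposed by a
    # TransE-style constraint: head + relation should land near tail.
    h, r, t = embed(head_desc), embed(relation), embed(tail_desc)
    return -torch.norm(h + r - t).item()  # higher means more plausible

print(plausibility("Steve Jobs, co-founder of Apple", "founder of", "Apple Inc."))
```

In training (rather than this frozen-encoder demo), the language model would be fine-tuned so that true triplets score higher than corrupted ones.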
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, a semantic-level structure of code that encodes where-the-value-comes-from relations between variables (a toy graph-guided attention mask is sketched below).
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
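The sketch below shows one way data flow can enter a transformer's input, in the spirit of GraphCodeBERT: variable nodes are appended to the code token sequence, and a graph-guided attention mask restricts them to attend along value-flow edges. The toy token list, edge list, and mask layout are assumptions for illustration, not the released model's exact format.

```python
import numpy as np

code_tokens = ["x", "=", "a", "+", "b"]
variables = ["x", "a", "b"]            # data-flow nodes appended to the input
flow_edges = [("a", "x"), ("b", "x")]  # the value of x comes from a and b

n_code, n_var = len(code_tokens), len(variables)
size = n_code + n_var
mask = np.zeros((size, size), dtype=bool)
mask[:n_code, :n_code] = True          # full attention among code tokens

var_pos = {v: n_code + i for i, v in enumerate(variables)}
for src, dst in flow_edges:
    mask[var_pos[dst], var_pos[src]] = True   # node attends to its value source
for i, tok in enumerate(code_tokens):         # tie each variable to its token
    if tok in var_pos:
        mask[var_pos[tok], i] = mask[i, var_pos[tok]] = True

print(mask.astype(int))  # feed as the attention mask of a transformer layer
```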
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme (a toy version is sketched below).
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
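Finally, a toy version of an entity masking scheme: instead of masking random subwords, whole entity mentions known to a knowledge graph are masked so the model must recover them from context. The entity list and the mask-everything policy are simplifications; a real scheme would sample which mentions to mask.

```python
# Hypothetical set of entity surface forms taken from a knowledge graph.
KG_ENTITIES = {"Marie Curie", "Nobel Prize", "Warsaw"}

def mask_entities(text: str) -> str:
    # Longest-first replacement avoids clobbering overlapping mentions.
    for entity in sorted(KG_ENTITIES, key=len, reverse=True):
        text = text.replace(entity, "[MASK]")
    return text

print(mask_entities("Marie Curie was born in Warsaw and won the Nobel Prize."))
# [MASK] was born in [MASK] and won the [MASK].
```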