Benchmarking the Abilities of Large Language Models for RDF Knowledge
Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?
- URL: http://arxiv.org/abs/2309.17122v1
- Date: Fri, 29 Sep 2023 10:36:04 GMT
- Title: Benchmarking the Abilities of Large Language Models for RDF Knowledge
Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?
- Authors: Johannes Frey and Lars-Peter Meyer and Natanael Arndt and Felix Brei
and Kirill Bulert
- Abstract summary: Large Language Models (LLMs) are advancing at a rapid pace, with significant improvements in natural language processing and coding tasks.
To evaluate the proficiency of various LLMs, we created a set of five tasks that probe their ability to parse, understand, analyze, and create knowledge graphs serialized in Turtle syntax.
The evaluation encompassed four commercially available LLMs - GPT-3.5, GPT-4, Claude 1.3, and Claude 2.0, as well as two freely accessible offline models, GPT4All Vicuna and GPT4All Falcon 13B.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are advancing at a rapid pace, with
significant improvements in natural language processing and coding tasks. Yet, their
ability to work with formal languages representing data, specifically within
the realm of knowledge graph engineering, remains under-investigated. To
evaluate the proficiency of various LLMs, we created a set of five tasks that
probe their ability to parse, understand, analyze, and create knowledge graphs
serialized in Turtle syntax. These tasks, each embodying a distinct degree of
complexity and able to scale with the size of the problem, have been
integrated into our automated evaluation system, LLM-KG-Bench. The
evaluation encompassed four commercially available LLMs - GPT-3.5, GPT-4,
Claude 1.3, and Claude 2.0, as well as two freely accessible offline models,
GPT4All Vicuna and GPT4All Falcon 13B. This analysis offers an in-depth
understanding of the strengths and shortcomings of LLMs in relation to their
application within RDF knowledge graph engineering workflows utilizing Turtle
representation. While our findings show that the latest commercial models
outperform their forerunners in terms of proficiency with the Turtle language,
they also reveal an apparent weakness. These models fall short when it comes to
adhering strictly to the output formatting constraints, a crucial requirement
in this context.
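To make the evaluation setting concrete, here is a minimal sketch, in Python with rdflib, of the kind of check such a benchmark can apply to a model's answer: attempt to parse the response as Turtle and report failure if anything other than a syntactically valid graph comes back. The `parse_turtle` helper and the sample model response are illustrative assumptions, not the actual LLM-KG-Bench interface.

```python
# Minimal sketch of a Turtle-validity check, in the spirit of what a
# benchmark like LLM-KG-Bench might apply to an LLM's answer. The helper
# name and sample response are illustrative assumptions, not the real API.
from rdflib import Graph

def parse_turtle(candidate: str):
    """Try to parse a model response as Turtle; return (graph, error)."""
    graph = Graph()
    try:
        graph.parse(data=candidate, format="turtle")
        return graph, None
    except Exception as exc:  # rdflib raises parser-specific exceptions
        return None, str(exc)

# Hypothetical model output: the triples are valid Turtle, but the chatty
# preamble breaks the parse -- exactly the output-formatting failure the
# paper highlights.
model_response = """Sure! Here is the knowledge graph you asked for:
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
"""

graph, error = parse_turtle(model_response)
if error:
    print(f"FAIL: response is not valid Turtle ({error})")
else:
    print(f"OK: parsed {len(graph)} triples")
```

Deleting the prose line at the top of the hypothetical response makes the parse succeed, which illustrates why strict adherence to output formatting constraints is a hard requirement for automated knowledge graph pipelines.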
Related papers
- GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding [20.12647254668254]
Recent code language models have scaled to billions of parameters, but model source code solely as text tokens.
In this work, we take the best of both worlds with GALLa - Graph Aligned Large Language Model.
arXiv Detail & Related papers (2024-09-06T10:57:34Z)
- Source Code Summarization in the Era of Large Language Models [23.715005053430957]
Large language models (LLMs) have led to a great boost in the performance of code-related tasks.
In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs.
arXiv Detail & Related papers (2024-07-09T05:48:42Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking them at the sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the amazing power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis [7.099257763803159]
We evaluate the capabilities of four Large Language Models (LLMs) in addressing several analytical problems with graph data.
We employ four distinct evaluation metrics: Comprehension, Correctness, Fidelity, and Rectification.
GPT models can generate logical and coherent results, outperforming alternatives in correctness.
arXiv Detail & Related papers (2023-08-22T06:32:07Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.