Developing a Scalable Benchmark for Assessing Large Language Models in
Knowledge Graph Engineering
- URL: http://arxiv.org/abs/2308.16622v1
- Date: Thu, 31 Aug 2023 10:31:19 GMT
- Title: Developing a Scalable Benchmark for Assessing Large Language Models in
Knowledge Graph Engineering
- Authors: Lars-Peter Meyer, Johannes Frey, Kurt Junghanns, Felix Brei, Kirill
Bulert, Sabine Gr\"under-Fahrer, Michael Martin
- Abstract summary: We introduce a benchmarking framework focused on knowledge graph engineering (KGE)
We show that while being a useful tool, Large Language Models are yet unfit to assist in knowledge graph generation with zero-shot prompting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the field of Large Language Models (LLMs) evolves at an accelerated pace,
the critical need to assess and monitor their performance emerges. We introduce
a benchmarking framework focused on knowledge graph engineering (KGE)
accompanied by three challenges addressing syntax and error correction, facts
extraction and dataset generation. We show that while being a useful tool, LLMs
are yet unfit to assist in knowledge graph generation with zero-shot prompting.
Consequently, our LLM-KG-Bench framework provides automatic evaluation and
storage of LLM responses as well as statistical data and visualization tools to
support tracking of prompt engineering and model performance.
Related papers
- GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks [15.147178364098034]
We present Graph Omni, a benchmark framework for evaluating graph reasoning capabilities of LLMs.
Our findings emphasize that no single serialization or prompting strategy consistently outperforms others.
Motivated by these insights, we propose a reinforcement learning-based approach that dynamically selects the best serialization-prompt pairings.
arXiv Detail & Related papers (2025-04-17T09:01:16Z) - FamilyTool: A Multi-hop Personalized Tool Use Benchmark [94.1158032740113]
We introduce FamilyTool, a novel benchmark grounded in a family-based knowledge graph (KG)
FamilyTool challenges Large Language Models with queries spanning 1 to 3 relational hops.
Experiments reveal significant performance gaps in state-of-the-art LLMs.
arXiv Detail & Related papers (2025-04-09T10:42:36Z) - Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks [2.362412515574206]
Large Language Models (LLMs) have proven immensely beneficial in education by capturing vast amounts of literature-based information.
We propose a generative AI-powered GATE question-answering framework that leverages LLMs to explain GATE solutions and support students in their exam preparation.
arXiv Detail & Related papers (2025-03-02T08:11:07Z) - Training Large Recommendation Models via Graph-Language Token Alignment [53.3142545812349]
We propose a novel framework to train Large Recommendation models via Graph-Language Token Alignment.
By aligning item and user nodes from the interaction graph with pretrained LLM tokens, GLTA effectively leverages the reasoning abilities of LLMs.
Furthermore, we introduce Graph-Language Logits Matching (GLLM) to optimize token alignment for end-to-end item prediction.
arXiv Detail & Related papers (2025-02-26T02:19:10Z) - Are Large Language Models In-Context Graph Learners? [31.172657860606297]
Large language models (LLMs) have remarkable in-context reasoning capabilities across a wide range of tasks.
However, they struggle to handle structured data, such as graphs, due to their lack of understanding of non-Euclidean structures.
We show that learning on graph data can be conceptualized as a retrieval-augmented generation (RAG) process.
We propose a series of RAG frameworks to enhance the in-context learning capabilities of LLMs for graph learning tasks.
arXiv Detail & Related papers (2025-02-19T09:14:19Z) - GraphICL: Unlocking Graph Learning Potential in LLMs through Structured Prompt Design [13.365623514253926]
Graph In-context Learning (GraphICL) Benchmark is a comprehensive benchmark comprising novel prompt templates to capture graph structure and handle limited label knowledge.
Our systematic evaluation shows that general-purpose LLMs equipped with our GraphICL outperform state-of-the-art specialized graph LLMs and graph neural network models.
arXiv Detail & Related papers (2025-01-27T03:50:30Z) - AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant [23.366991558162695]
Large Language Models generate factually incorrect information, known as "hallucination"
To cope with these challenges, we propose Assistant-based Retrieval-Augmented Generation (AssistRAG)
This assistant manages memory and knowledge through tool usage, action execution, memory building, and plan specification.
arXiv Detail & Related papers (2024-11-11T09:03:52Z) - How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension [53.6373473053431]
This work introduces a benchmark to assess large language models' capabilities in graph pattern tasks.
We have developed a benchmark that evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions.
Our benchmark encompasses both synthetic and real datasets, and a variety of models, with a total of 11 tasks and 7 models.
arXiv Detail & Related papers (2024-10-04T04:48:33Z) - Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z) - PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z) - Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness [39.57155321515097]
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks.
It remains unclear whether LLMs exhibit robustness in learning on graphs.
arXiv Detail & Related papers (2024-07-16T09:05:31Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs)
LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Beyond Text: A Deep Dive into Large Language Models' Ability on
Understanding Graph Data [13.524529952170672]
Large language models (LLMs) have achieved impressive performance on many natural language processing tasks.
We aim to assess whether LLMs can effectively process graph data and leverage topological structures to enhance performance.
By comparing LLMs' performance with specialized graph models, we offer insights into the strengths and limitations of employing LLMs for graph analytics.
arXiv Detail & Related papers (2023-10-07T23:25:22Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z) - Exploring Large Language Model for Graph Data Understanding in Online
Job Recommendations [63.19448893196642]
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs.
By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv Detail & Related papers (2023-07-10T11:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.