Evaluating and Enhancing Large Language Models for Conversational
Reasoning on Knowledge Graphs
- URL: http://arxiv.org/abs/2312.11282v2
- Date: Sun, 4 Feb 2024 03:45:04 GMT
- Title: Evaluating and Enhancing Large Language Models for Conversational
Reasoning on Knowledge Graphs
- Authors: Yuxuan Huang, Lida Shi, Anqi Liu and Hao Xu
- Abstract summary: We evaluate the conversational reasoning capabilities of the current state-of-the-art large language model (GPT-4) on knowledge graphs (KGs).
We introduce LLM-ARK, a grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths.
LLaMA-2-7B-ARK outperforms the current state-of-the-art model by 5.28 percentage points, reaching 36.39% on the target@1 evaluation metric.
- Score: 15.480976967871632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of large language models (LLMs) has been catalyzed by
advances in pre-training techniques, and these models have demonstrated robust
reasoning capabilities through manually designed prompts. In this work, we
evaluate the conversational reasoning capabilities of the current
state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). We find that LLM
performance is constrained by a lack of awareness of the KG environment and by
the difficulty of optimizing intermediate reasoning steps. We therefore
introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver
precise and adaptable predictions of KG paths. LLM-ARK uses a Full Textual
Environment (FTE) prompt to assimilate state information at each reasoning
step, and reframes multi-hop reasoning over the KG as a sequential
decision-making task. The model is optimized with the Proximal Policy
Optimization (PPO) online policy-gradient reinforcement learning algorithm,
learning from rich reward signals. We evaluate our model and GPT-4 on the
OpenDialKG dataset. LLaMA-2-7B-ARK outperforms the previous state-of-the-art
model by 5.28 percentage points, reaching 36.39% on the target@1 metric, while
GPT-4 scores 14.91%, further demonstrating the effectiveness of our method. Our
code is available at https://github.com/Aipura/LLM-ARK.
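The abstract's framing of multi-hop KG reasoning as a sequential decision process trained with PPO can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the hash-based `TinyPolicy` standing in for an LLM, the `build_fte_prompt` helper, and the single-transition PPO update are all assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's code): multi-hop KG reasoning framed as a
# sequential decision process and trained with a PPO-style clipped objective.
import torch
import torch.nn as nn

# Toy KG: entity -> list of (relation, neighbor) edges.
KG = {
    "Inception": [("directed_by", "Christopher Nolan"), ("genre", "Sci-Fi")],
    "Christopher Nolan": [("directed", "Interstellar"), ("born_in", "London")],
}

def build_fte_prompt(dialogue, entity, path):
    """Full-Textual-Environment-style state: dialogue + current entity + path so far."""
    return f"Dialogue: {dialogue}\nCurrent entity: {entity}\nPath: {' -> '.join(path) or '(start)'}"

class TinyPolicy(nn.Module):
    """Scores candidate outgoing edges given a textual state (stand-in for an LLM policy)."""
    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.EmbeddingBag(10_000, dim)  # hashed bag-of-words encoder
        self.score = nn.Bilinear(dim, dim, 1)

    def encode(self, text):
        ids = torch.tensor([hash(w) % 10_000 for w in text.lower().split()])
        return self.embed(ids.unsqueeze(0))

    def forward(self, state_text, candidates):
        s = self.encode(state_text)
        c = torch.cat([self.encode(f"{r} {e}") for r, e in candidates])
        return self.score(s.expand_as(c), c).squeeze(-1)  # one logit per candidate edge

def ppo_loss(new_logp, old_logp, advantage, clip=0.2):
    """Standard PPO clipped surrogate objective (negated for gradient descent)."""
    ratio = torch.exp(new_logp - old_logp)
    return -torch.min(ratio * advantage, torch.clamp(ratio, 1 - clip, 1 + clip) * advantage)

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One rollout step: pick an edge from the current entity, reward 1 if we reach the target.
dialogue, entity, target, path = "Who directed Inception?", "Inception", "Christopher Nolan", []
candidates = KG[entity]
logits = policy(build_fte_prompt(dialogue, entity, path), candidates)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
old_logp = dist.log_prob(action).detach()
reward = 1.0 if candidates[action.item()][1] == target else 0.0
advantage = torch.tensor(reward)  # no value baseline in this toy example

# PPO update on the stored transition.
new_logits = policy(build_fte_prompt(dialogue, entity, path), candidates)
new_logp = torch.distributions.Categorical(logits=new_logits).log_prob(action)
loss = ppo_loss(new_logp, old_logp, advantage)
opt.zero_grad(); loss.backward(); opt.step()
```

A real agent would roll out full multi-hop episodes, use a learned value baseline, and run several PPO epochs per batch of trajectories; this toy keeps a single transition only to show the clipped objective.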
Related papers
- OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models [68.17018458283651]
This work focuses on the offline evaluation of the chain-of-thought capabilities of LLMs.
We use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chain of thoughts.
We show how to optimize LLMs based on the proposed evaluation method.
arXiv Detail & Related papers (2024-10-31T07:48:44Z)
- Paths-over-Graph: Knowledge Graph Empowered Large Language Model Reasoning [19.442426875488675]
We propose Paths-over-Graph (PoG), a novel method that enhances Large Language Models (LLMs) reasoning by integrating knowledge reasoning paths from KGs.
PoG tackles multi-hop and multi-entity questions through a three-phase dynamic multi-hop path exploration.
In experiments, PoG with GPT-3.5-Turbo surpasses ToG with GPT-4 by up to 23.9%.
arXiv Detail & Related papers (2024-10-18T06:57:19Z)
- Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering [35.2451096137883]
We introduce OKGQA, a new benchmark specifically designed to assess Large Language Models (LLMs) enhanced with Knowledge Graphs (KGs).
OKGQA is designed to closely reflect the complexities of practical applications, using questions of different types, and incorporates specific metrics to measure both the reduction in hallucinations and the enhancement in reasoning capabilities.
We also propose OKGQA-P to assess model performance when the semantics and structure of KGs are deliberately perturbed and contaminated.
arXiv Detail & Related papers (2024-10-10T16:29:21Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data (a minimal sketch of the step-level DPO objective appears after this list).
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph [134.8631016845467]
We propose an autonomous LLM-based agent framework, called KG-Agent.
In KG-Agent, we integrate the LLM, multifunctional toolbox, KG-based executor, and knowledge memory.
To guarantee effectiveness, we leverage a programming language to formulate the multi-hop reasoning process over the KG.
arXiv Detail & Related papers (2024-02-17T02:07:49Z)
- LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
- GLoRE: Evaluating Logical Reasoning of Large Language Models [29.914546407784552]
We introduce GLoRE, a benchmark comprised of 12 datasets that span three different types of tasks.
ChatGPT and GPT-4 show a strong capability of logical reasoning, with GPT-4 surpassing ChatGPT by a large margin.
We propose a self-consistency probing method to enhance the accuracy of ChatGPT and a fine-tuned method to boost the performance of an open LLM.
arXiv Detail & Related papers (2023-10-13T13:52:15Z)
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond [29.778018058541676]
GPT-Fathom is an open-source and reproducible evaluation suite for large language models (LLMs) built on top of OpenAI Evals.
We evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all under aligned settings.
arXiv Detail & Related papers (2023-09-28T16:43:35Z)
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [29.447300472617826]
Think-on-Graph (ToG) is a new approach that enables large language models (LLMs) to reason over external knowledge graphs (KGs).
ToG iteratively executes beam search on the KG, discovering the most promising reasoning paths and returning the most likely reasoning results (a minimal sketch of this beam search appears after this list).
ToG achieves overall SOTA on 6 out of 9 datasets, where most previous SOTAs rely on additional training.
arXiv Detail & Related papers (2023-07-15T03:31:38Z)
- Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks [90.11273439036455]
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
arXiv Detail & Related papers (2023-05-28T13:00:00Z)
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
This work evaluates Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning.
We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)
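As noted in the Monte Carlo Tree Search entry above, that work collects step-level preference pairs with MCTS and updates the LLM policy with DPO. The following is a minimal, hypothetical sketch of the step-level DPO objective only; the `dpo_loss` helper, the log-probability values, and the choice of `beta` are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch: the DPO objective applied to one (chosen, rejected) pair of
# reasoning steps, e.g. ranked by an MCTS look-ahead search.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed token log-probability of a candidate reasoning step
    under the current policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward)

# Toy numbers: the policy already slightly prefers the step that MCTS ranked higher.
loss = dpo_loss(torch.tensor(-12.3), torch.tensor(-15.1),
                torch.tensor(-13.0), torch.tensor(-14.2))
print(loss)  # scalar loss to backpropagate through the policy log-probabilities
```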
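Similarly, for the Think-on-Graph entry above, here is a minimal, hypothetical sketch of beam search over a KG in which an LLM would score candidate reasoning paths. The toy graph, the keyword-overlap `score_path` stub (standing in for an LLM call), and the parameter values are assumptions for illustration.

```python
# Hypothetical ToG-style beam search: keep a beam of partial KG paths and let a scoring
# function (an LLM prompt in the real method; a stub here) decide which expansions to keep.
TOY_KG = {
    "Inception": [("directed_by", "Christopher Nolan"), ("genre", "Sci-Fi")],
    "Christopher Nolan": [("born_in", "London"), ("directed", "Interstellar")],
    "Sci-Fi": [("example", "Blade Runner")],
}

def score_path(question, path):
    # Stand-in for an LLM call that rates how promising a partial path is for the question.
    return sum(1 for _, entity in path if entity.lower() in question.lower())

def beam_search(question, start_entity, kg, beam_width=2, max_hops=2):
    beams = [[]]  # each beam is a list of (relation, entity) hops from start_entity
    for _ in range(max_hops):
        candidates = []
        for path in beams:
            frontier = path[-1][1] if path else start_entity
            for rel, ent in kg.get(frontier, []):
                candidates.append(path + [(rel, ent)])
        if not candidates:
            break
        # Keep only the beam_width most promising partial paths at this hop.
        beams = sorted(candidates, key=lambda p: score_path(question, p), reverse=True)[:beam_width]
    return beams

print(beam_search("Where was the director of Inception born?", "Inception", TOY_KG))
```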