EntGPT: Linking Generative Large Language Models with Knowledge Bases
- URL: http://arxiv.org/abs/2402.06738v1
- Date: Fri, 9 Feb 2024 19:16:27 GMT
- Title: EntGPT: Linking Generative Large Language Models with Knowledge Bases
- Authors: Yifan Ding, Amrit Poudel, Qingkai Zeng, Tim Weninger, Balaji Veeramani, Sanmitra Bhattacharya
- Abstract summary: The ability of Large Language Models to generate factually correct output remains relatively unexplored.
We design a three-step hard-prompting method to probe LLMs' ED performance without supervised fine-tuning.
We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses.
- Score: 9.067856411512427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability of Large Language Models (LLMs) to generate factually correct
output remains relatively unexplored due to the lack of fact-checking and
knowledge grounding during training and inference. In this work, we aim to
address this challenge through the Entity Disambiguation (ED) task. We first
consider prompt engineering, and design a three-step hard-prompting method to
probe LLMs' ED performance without supervised fine-tuning (SFT). Overall, the
prompting method improves the micro-F_1 score of the original vanilla models by
a large margin, in some cases by 36% or more, and obtains comparable
performance across 10 datasets when compared to existing methods with SFT. We
further improve the knowledge grounding ability through instruction tuning (IT)
with similar prompts and responses. The instruction-tuned model not only
achieves a higher micro-F_1 score than several baseline
methods on supervised entity disambiguation tasks, with an average micro-F_1
improvement of 2.1% over the existing baseline models, but also obtains higher
accuracy on six Question Answering (QA) tasks in the zero-shot setting. Our
methodologies apply to both open- and closed-source LLMs.
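The abstract reports its results as micro-F_1 scores. As a minimal sketch of how such a score is usually computed for entity disambiguation, the Python function below pools all mentions and counts a prediction as correct only when it matches the gold entity. The function name `micro_f1`, the use of `None` for an abstained prediction, and the example entity names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Optional, Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[Optional[str]]) -> float:
    """Micro-averaged F1 over a pooled list of mentions.

    gold[i] is the gold entity for mention i.
    pred[i] is the predicted entity, or None if the system abstained
    (an assumed convention for this sketch).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A prediction counts as correct only if it was made and matches gold.
    correct = sum(1 for g, p in zip(gold, pred) if p is not None and p == g)
    predicted = sum(1 for p in pred if p is not None)

    precision = correct / predicted if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical mentions: one wrong link and one abstention out of four.
gold = ["Paris", "Michael_Jordan", "Apple_Inc.", "Java_(programming_language)"]
pred = ["Paris", "Michael_I._Jordan", None, "Java_(programming_language)"]
score = micro_f1(gold, pred)  # precision 2/3, recall 2/4
```

When the system links every mention, precision and recall coincide and the micro-F_1 score reduces to plain accuracy over mentions.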
Related papers
- LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models [11.453585039783901]
LEAF (Learning and Evaluation Augmented by Fact-Checking) is a novel approach designed to enhance the factual reliability of large language models (LLMs).
The first strategy, Fact-Check-Then-RAG, improves Retrieval-Augmented Generation (RAG) by incorporating fact-checking results to guide the retrieval process without updating model parameters.
The second strategy, Learning from Fact-Checks via Self-Training, involves supervised fine-tuning (SFT) on fact-checked responses or applying Simple Preference Optimization (SimPO) with fact-checking as a ranking mechanism.
arXiv Detail & Related papers (2024-10-31T00:18:05Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- Large Language Models aren't all that you need [0.0]
This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II.
We evaluate and compare two approaches: (a) a traditional Random Fields model and (b) a Large Language Model (LLM) fine-tuned with a customized head.
arXiv Detail & Related papers (2024-01-01T08:32:50Z)
- L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models [15.726224465017596]
We propose an approach that focuses on extracting meaningful representations from unseen data and constructing a structured knowledge base.
We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE.
The proposed L3 ensemble method increases the model accuracy by 4% to 36% compared to the fine-tuned FLM.
arXiv Detail & Related papers (2023-11-11T06:59:50Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.