Related papers: Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

URL: http://arxiv.org/abs/2302.12813v1
Date: Fri, 24 Feb 2023 18:48:43 GMT
Title: Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Authors: Baolin Peng and Michel Galley and Pengcheng He and Hao Cheng and Yujia Xie and Yu Hu and Qiuyuan Huang and Lars Liden and Zhou Yu and Weizhu Chen and Jianfeng Gao
Abstract summary: Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
Score: 127.75419038610455
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate hallucinations and inability to use external knowledge.This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in consolidated external knowledge, e.g., stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses using feedback generated by utility functions, e.g., the factuality score of a LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of mission-critical scenarios, task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.

Related papers

LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests [7.6818904666624395]
This paper proposes a dual-LLM system and experiments with the usage of LLMs for the generation of compiler tests.<n>It is evident that LLMs possess the promising potential to generate quality compiler tests and verify them automatically.
arXiv Detail & Related papers (2025-07-29T02:34:28Z)
LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge [0.0]
Large Language Models (LLMs) have demonstrated the capability to generate human like, natural responses across a range of tasks. This paper introduces the LLM ENHANCER system, designed to integrate multiple online sources such as Google, Wikipedia, and DuckDuckGo to enhance data accuracy.
arXiv Detail & Related papers (2025-04-29T19:27:04Z)
Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG. In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning. In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z)
RuAG: Learned-rule-augmented Generation for Large Language Models [62.64389390179651]
We propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order logic rules. We evaluate our framework on public and private industrial tasks, including natural language processing, time-series, decision-making, and industrial tasks.
arXiv Detail & Related papers (2024-11-04T00:01:34Z)
Hidden in Plain Sight: Exploring Chat History Tampering in Interactive Language Models [12.920884182101142]
Large Language Models (LLMs) have become prevalent in real-world applications, exhibiting impressive text generation performance. To behave interactively, LLM-based chat systems must integrate prior chat history as context into their inputs, following a pre-defined structure. This paper introduces a systematic methodology to inject user-supplied history into LLM conversations without any prior knowledge of the target model.
arXiv Detail & Related papers (2024-05-30T16:36:47Z)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models. It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop. Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge. We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
Investigating Answerability of LLMs for Long-Form Question Answering [35.41413072729483]
We focus on long-form question answering (LFQA) because it has several practical and impactful applications. We propose a question-generation method from abstractive summaries and show that generating follow-up questions from summaries of long documents can create a challenging setting.
arXiv Detail & Related papers (2023-09-15T07:22:56Z)
Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM. It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses. We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
Validating Large Language Models with ReLM [11.552979853457117]
Large language models (LLMs) have been touted for their ability to generate natural-sounding text. There are growing concerns around possible negative effects of LLMs such as data memorization, bias, and inappropriate language. We introduce ReLM, a system for validating and querying LLMs using standard regular expressions.
arXiv Detail & Related papers (2022-11-21T21:40:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.