Validating Large Language Models with ReLM
- URL: http://arxiv.org/abs/2211.15458v2
- Date: Mon, 8 May 2023 18:01:28 GMT
- Title: Validating Large Language Models with ReLM
- Authors: Michael Kuchnik, Virginia Smith, George Amvrosiadis
- Abstract summary: Large language models (LLMs) have been touted for their ability to generate natural-sounding text.
There are growing concerns around possible negative effects of LLMs such as data memorization, bias, and inappropriate language.
We introduce ReLM, a system for validating and querying LLMs using standard regular expressions.
- Score: 11.552979853457117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models (LLMs) have been touted for their ability to
generate natural-sounding text, there are growing concerns around possible
negative effects of LLMs such as data memorization, bias, and inappropriate
language. Unfortunately, the complexity and generation capacities of LLMs make
validating (and correcting) such concerns difficult. In this work, we introduce
ReLM, a system for validating and querying LLMs using standard regular
expressions. ReLM formalizes and enables a broad range of language model
evaluations, reducing complex evaluation rules to simple regular expression
queries. Our results exploring queries surrounding memorization, gender bias,
toxicity, and language understanding show that ReLM achieves up to 15x higher
system efficiency, 2.5x data efficiency, and increased statistical and
prompt-tuning coverage compared to state-of-the-art ad-hoc queries. ReLM offers
a competitive and general baseline for the increasingly important problem of
LLM validation.
Related papers
- Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - RAC: Efficient LLM Factuality Correction with Retrieval Augmentation [8.207682890286957]
Large Language Models (LLMs) exhibit impressive results across a wide range of natural language processing (NLP) tasks, yet they can often produce factually incorrect outputs.
This paper introduces a simple but effective low-latency post-correction method, textbfRetrieval Augmented Correction (RAC), aimed at enhancing the factual performance of LLMs without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-10-21T06:11:38Z) - Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models [59.970391602080205]
This study investigates whether such constraints on generation space impact LLMs abilities, including reasoning and domain knowledge comprehension.
We evaluate LLMs performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks.
We find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
arXiv Detail & Related papers (2024-08-05T13:08:24Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline [42.61538071832468]
Large language models (LLMs) have shown excellent mastering of human language, but still struggle in real-world applications that require mathematical problem-solving.
We tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment.
arXiv Detail & Related papers (2024-04-03T17:51:18Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z) - Large Language Model Is Not a Good Few-shot Information Extractor, but a
Good Reranker for Hard Samples! [43.51393135075126]
Large Language Models (LLMs) have made remarkable strides in various tasks.
We show that current advanced LLMs consistently exhibit inferior performance, higher latency, and increased budget requirements compared to fine-tuned SLMs.
We propose an adaptive filter-then-rerank paradigm to combine the strengths of LLMs and SLMs.
arXiv Detail & Related papers (2023-03-15T12:20:13Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.