An LLM Maturity Model for Reliable and Transparent Text-to-Query
- URL: http://arxiv.org/abs/2402.14855v1
- Date: Tue, 20 Feb 2024 06:20:09 GMT
- Title: An LLM Maturity Model for Reliable and Transparent Text-to-Query
- Authors: Lei Yu (Expression) and Abir Ray (Expression)
- Abstract summary: This work proposes an LLM maturity model tailored for text-to-query applications.
This maturity model seeks to fill the existing void in evaluating LLMs in such applications by incorporating dimensions beyond mere correctness or accuracy.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recognizing the imperative to address the reliability and transparency issues
of Large Language Models (LLMs), this work proposes an LLM maturity model
tailored for text-to-query applications. This maturity model seeks to fill the
existing void in evaluating LLMs in such applications by incorporating
dimensions beyond mere correctness or accuracy. Moreover, this work introduces
a real-world use case from the law enforcement domain and showcases QueryIQ, an
LLM-powered, domain-specific text-to-query assistant to expedite user workflows
and reveal hidden relationships in data.
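
The paper does not release QueryIQ's implementation; the sketch below, assuming a generic LLM completion endpoint, only illustrates one way a text-to-query assistant can surface the generated query and a short rationale for human review rather than executing it as an opaque artifact. All names here (llm_complete, TextToQueryResult) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TextToQueryResult:
    question: str   # the user's natural-language question
    query: str      # the generated query, surfaced for human review
    rationale: str  # the model's stated reasoning, for transparency auditing

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any LLM completion endpoint."""
    raise NotImplementedError("wire up an LLM provider here")

def text_to_query(question: str, schema: str) -> TextToQueryResult:
    # Ask for the query and a rationale in one structured response, so the
    # query is shown to the user instead of being executed as a black box.
    prompt = (
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Return the SQL query on the first line, then a one-line rationale."
    )
    raw = llm_complete(prompt)
    query, _, rationale = raw.partition("\n")
    return TextToQueryResult(question, query.strip(), rationale.strip())
```

Returning the query and rationale alongside the answer is one concrete way a system could be scored on transparency dimensions beyond raw accuracy.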
Related papers
- Claim Verification in the Age of Large Language Models: A Survey
We present a comprehensive account of recent claim verification frameworks using Large Language Models (LLMs).
We describe the different components of the claim verification pipeline used in these frameworks in detail.
arXiv Detail & Related papers (2024-08-26T14:45:03Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
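
As a rough sketch of the multi-stage mechanism the abstract describes (the student model synthesizes its own task-specific input-output pairs and is then finetuned on them), assuming a generic student object with generate and finetune methods; the filtering shown is illustrative, not the authors' exact curation pipeline.

```python
def synthesize_pairs(student, task_instruction, seed_examples, n=1000):
    pairs = []
    for _ in range(n):
        # Stage 1: the student proposes a new plausible input for the task.
        x = student.generate(
            f"{task_instruction}\nExamples: {seed_examples}\n"
            "Write one new input for this task:")
        # Stage 2: the student answers its own synthetic input.
        y = student.generate(f"{task_instruction}\nInput: {x}\nOutput:")
        pairs.append((x, y))
    # Stage 3: quality filtering; the paper curates pairs, but the exact
    # filter here is illustrative only.
    return [(x, y) for x, y in pairs if x.strip() and y.strip()]

def self_guide(student, task_instruction, seed_examples):
    # Finetune the same student on its self-synthesized data.
    data = synthesize_pairs(student, task_instruction, seed_examples)
    student.finetune(data)
    return student
```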
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering
We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of large language models (LLMs).
Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers.
We present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
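
The paper's exact procedure is not reproduced here; the fragment below only gestures at the general flavor of hidden-state attribution: score each context token by the similarity of its hidden state to a generated answer token's, with no retraining and no retrieval model.

```python
import torch
import torch.nn.functional as F

def attribute(answer_hidden: torch.Tensor,
              context_hidden: torch.Tensor,
              top_k: int = 5) -> list:
    """answer_hidden: (d,) hidden state of one generated answer token.
    context_hidden: (n_ctx, d) hidden states of the context tokens.
    Returns the indices of the context tokens whose hidden states are most
    similar to the answer token's, as a token-level attribution."""
    sims = F.cosine_similarity(context_hidden, answer_hidden.unsqueeze(0),
                               dim=-1)               # shape (n_ctx,)
    k = min(top_k, sims.numel())
    return torch.topk(sims, k=k).indices.tolist()
```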
arXiv Detail & Related papers (2024-05-28T09:12:44Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% in relative terms.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach
AuditLLM is a novel tool designed to audit the performance of various Large Language Models (LLMs) in a methodical way.
A robust, reliable, and consistent LLM is expected to generate semantically similar responses to variably phrased versions of the same question.
A certain level of inconsistency has been shown to be an indicator of potential bias, hallucinations, and other issues.
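
A simplified rendering of the multiprobe idea, assuming placeholder ask and embed functions rather than AuditLLM's actual interface: pose several paraphrases of one question and measure how semantically consistent the answers are.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def consistency_score(ask, embed, paraphrases):
    """ask: the LLM under audit; embed: any sentence-embedding function;
    paraphrases: variably phrased versions of the same question."""
    answers = [ask(p) for p in paraphrases]
    vecs = [embed(a) for a in answers]
    sims = [cosine(u, v) for u, v in combinations(vecs, 2)]
    # Mean pairwise semantic similarity of the answers; low values flag the
    # inconsistency the paper ties to bias and hallucination risk.
    return sum(sims) / len(sims) if sims else 1.0
```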
arXiv Detail & Related papers (2024-02-14T17:31:04Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs by: 1) generalizing to out-of-distribution data, 2) elucidating how LLMs benefit from discriminative models, and 3) minimizing hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Assessing the Reliability of Large Language Model Knowledge
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
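
MONITOR's precise formula is defined in the paper and not reproduced here; as a crude stand-in for the underlying question (how consistently does a model return the gold fact across prompt variants?), one could compute:

```python
def factual_reliability(ask, probes, gold):
    """probes: prompt variants probing one fact; gold: the accepted answer.
    Returns the fraction of variants answered correctly: 1.0 means the model
    is consistently right, 0.0 means it never is."""
    hits = sum(1 for p in probes if gold.lower() in ask(p).lower())
    return hits / len(probes)
```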
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
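
A schematic of the loop the abstract describes, with retrieve, ask, and utility as placeholder modules rather than the released code: ground the black-box LLM in retrieved evidence, score the draft with automated feedback, and retry until the score clears a threshold.

```python
def augmented_answer(ask, retrieve, utility, question,
                     threshold=0.8, max_rounds=3):
    evidence = retrieve(question)          # plug-in knowledge consolidation
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        prompt = f"Evidence: {evidence}\n{feedback}Question: {question}"
        draft = ask(prompt)                # the black-box LLM stays frozen
        score, critique = utility(draft, evidence)  # automated feedback
        if score >= threshold:
            return draft
        feedback = f"A previous draft was rejected: {critique}\n"
    return draft                           # best effort after max_rounds
```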
arXiv Detail & Related papers (2023-02-24T18:48:43Z)