Trusted Source Alignment in Large Language Models
- URL: http://arxiv.org/abs/2311.06697v1
- Date: Sun, 12 Nov 2023 00:25:25 GMT
- Title: Trusted Source Alignment in Large Language Models
- Authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford,
Yennie Jun, William W. Cohen, Simon Baumgartner
- Abstract summary: We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking articles.
We find that as we scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
- Score: 30.14375102262399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are trained on web-scale corpora that inevitably
include contradictory factual information from sources of varying reliability.
In this paper, we propose measuring an LLM property called trusted source
alignment (TSA): the model's propensity to align with content produced by
trusted publishers in the face of uncertainty or controversy. We present
FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking
articles. We describe a simple protocol for evaluating TSA and offer a detailed
analysis of design considerations including response extraction, claim
contextualization, and bias in prompt formulation. Applying the protocol to
PaLM-2, we find that as we scale up the model size, the model performance on
FactCheckQA improves from near-random to up to 80% balanced accuracy in
aligning with trusted sources.
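To make the protocol concrete, here is a minimal sketch of a TSA-style evaluation loop in Python. It assumes a yes/no prompt, a simple rule-based response extractor, and balanced accuracy over the fact-checkers' true/false verdicts; the dataset fields, prompt wording, and model interface are illustrative assumptions, not the exact FactCheckQA protocol.

def extract_verdict(response: str):
    """Response extraction: map a free-form model reply to True/False/None."""
    text = response.strip().lower()
    if text.startswith("yes") or "claim is true" in text:
        return True
    if text.startswith("no") or "claim is false" in text:
        return False
    return None  # no verdict could be extracted

def evaluate_tsa(model, dataset):
    """dataset: iterable of dicts with hypothetical fields 'claim', 'context',
    and the fact-checker's 'verdict' (True/False). model.generate is an
    assumed text-in, text-out interface."""
    tp = tn = fp = fn = 0
    for example in dataset:
        # Claim contextualization: supply time/place so the model is not
        # judging an under-specified statement.
        # Prompt formulation bias: phrasing such as "Is this claim true?"
        # vs. "Is this claim false?" can shift answers, so in practice one
        # would probe both phrasings.
        prompt = (
            f"Context: {example['context']}\n"
            f"Claim: {example['claim']}\n"
            "Is this claim true? Answer yes or no."
        )
        verdict = extract_verdict(model.generate(prompt))
        if verdict is None:
            continue  # extraction failed; skip or count separately
        if example["verdict"] and verdict: tp += 1
        elif example["verdict"] and not verdict: fn += 1
        elif not example["verdict"] and verdict: fp += 1
        else: tn += 1
    # Balanced accuracy: mean of recall on true claims and recall on false
    # claims, so class imbalance in fact-check verdicts does not skew the score.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (sensitivity + specificity)

A balanced accuracy of 0.5 corresponds to chance on this metric, which is why small-model performance is described as near-random.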
Related papers
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- SPOT: Text Source Prediction from Originality Score Thresholding [6.790905400046194]
Countermeasures aimed at detecting misinformation usually involve domain-specific models trained to recognize the relevance of any information.
Instead of evaluating the validity of the information, we propose to investigate LLM-generated text from the perspective of trust.
arXiv Detail & Related papers (2024-05-30T21:51:01Z)
- Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees [5.348310708453905]
In radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making.
This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet an alignment criterion.
It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution.
arXiv Detail & Related papers (2024-05-16T17:55:24Z)
- Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data [48.409306245463]
We develop Quote-Tuning, an approach that aligns models to quote verbatim statements from trusted sources in their pre-training data.
The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora.
Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models.
arXiv Detail & Related papers (2024-04-05T02:27:09Z)
- Language Models with Conformal Factuality Guarantees [44.767328168194815]
Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs.
We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
arXiv Detail & Related papers (2024-02-15T18:31:53Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions (a code sketch of this procedure appears after this list).
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this refined robustness metric, a model is judged to be robust if its performance is consistently accurate across all the cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
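The input clarification ensembling entry above describes a concrete procedure: generate clarifications of the input, query the LLM on each, and ensemble the predictions. Below is a minimal Python sketch of that idea, assuming the clarifications are produced by the same model and the predictions are combined by majority vote; the prompts, the model.generate interface, and the agreement score are illustrative assumptions rather than that paper's implementation.

from collections import Counter

def clarification_ensemble(model, question, num_clarifications=5):
    """Sketch of input clarification ensembling (hypothetical interface):
    generate several clarified restatements of the input, answer each,
    and ensemble the answers. Low agreement across clarifications signals
    uncertainty that stems from the input rather than from the model."""
    clarify_prompt = (
        "Rewrite the question below, making any ambiguous terms explicit.\n"
        f"Question: {question}\nClarified question:"
    )
    clarifications = [model.generate(clarify_prompt) for _ in range(num_clarifications)]
    answers = [model.generate(f"{c}\nAnswer briefly:").strip() for c in clarifications]
    votes = Counter(answers)
    answer, count = votes.most_common(1)[0]
    agreement = count / len(answers)  # low agreement -> input-driven uncertainty
    return answer, agreement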