Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
- URL: http://arxiv.org/abs/2406.20079v1
- Date: Fri, 28 Jun 2024 17:43:48 GMT
- Title: Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification
- Authors: Anisha Gunjal, Greg Durrett
- Abstract summary: We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality.
We present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic factuality verification of large language model (LLM) generations is becoming more and more widely used to combat hallucinations. A major point of tension in the literature is the granularity of this fact-checking: larger chunks of text are hard to fact-check, but more atomic facts like propositions may lack context to interpret correctly. In this work, we assess the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information. We compare against various methods of decontextualization and find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.
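The abstract's two criteria can be made concrete with a small, hypothetical sketch (the example facts and the token-count proxy below are illustrative assumptions, not the paper's actual metrics or method): an atomic fact like "He won the award in 2003." is ambiguous on its own, while a molecular fact adds just enough context to stand alone. One rough proxy for minimality is the amount of text added during decontextualization:

```python
# Hypothetical illustration of the decontextuality/minimality trade-off.
# The facts and the whitespace-token proxy are assumptions for this sketch,
# not the metrics defined in the paper.

def added_tokens(atomic_fact: str, molecular_fact: str) -> int:
    """Count whitespace tokens added when decontextualizing a fact.

    A smaller count means the molecular fact is more 'minimal': it achieves
    standalone interpretability with less extra text.
    """
    return len(molecular_fact.split()) - len(atomic_fact.split())

atomic = "He won the award in 2003."  # ambiguous: who is "he"? which award?
molecular = "John Smith, the physicist, won the Nobel Prize in Physics in 2003."

print(added_tokens(atomic, molecular))
```

Under this toy proxy, a decontextualization method that resolves the ambiguity with fewer added tokens would be preferred, which mirrors the balance between minimality and verification accuracy that the paper evaluates.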
Related papers
- FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs
We present FactGenius, a novel method that enhances fact-checking by combining zero-shot prompting of large language models with fuzzy text matching on knowledge graphs.
The evaluation of FactGenius on the FactKG, a benchmark dataset for fact verification, demonstrates that it significantly outperforms existing baselines.
arXiv Detail & Related papers (2024-06-03T13:24:37Z)
- Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation
We propose Atomas, a multi-modal molecular representation learning framework to jointly learn representations from SMILES string and text.
In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average.
In the generation task, Atomas achieves state-of-the-art results in both molecule captioning task and molecule generation task.
arXiv Detail & Related papers (2024-04-23T12:35:44Z)
- Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
Large Language Models (LLMs) are easily misled by untruthful contexts provided by users or knowledge augmentation tools.
We propose Truth-Aware Context Selection (TACS) to adaptively recognize and mask untruthful context from the inputs.
We show that TACS can effectively filter untruthful context and significantly improve the overall quality of LLMs' responses when presented with misleading information.
arXiv Detail & Related papers (2024-03-12T11:40:44Z)
- Linking Surface Facts to Large-Scale Knowledge Graphs
Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples.
Knowledge Graphs (KGs) contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema.
We propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level.
arXiv Detail & Related papers (2023-10-23T13:18:49Z)
- The Perils & Promises of Fact-checking with Large Language Models
Large Language Models (LLMs) are increasingly trusted to write academic papers, lawsuits, and news articles.
We evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions.
Our results show the enhanced prowess of LLMs when equipped with contextual information.
While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy.
arXiv Detail & Related papers (2023-10-20T14:49:47Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines of molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)
- Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation
Misinformation emerges in times of uncertainty when credible information is limited.
This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available.
arXiv Detail & Related papers (2022-10-25T09:40:48Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Factuality Enhanced Language Models for Open-Ended Text Generation
We design the FactualityPrompts test set and metrics to measure the factuality of LM generations.
We find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions.
We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion.
arXiv Detail & Related papers (2022-06-09T17:16:43Z)
- The Role of Context in Detecting Previously Fact-Checked Claims
We focus on claims made in a political debate, where context really matters.
We study the impact of modeling the context of the claim both on the source side, as well as on the target side, in the fact-checking explanation document.
arXiv Detail & Related papers (2021-04-15T12:39:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.