Towards Reliable Misinformation Mitigation: Generalization, Uncertainty,
and GPT-4
- URL: http://arxiv.org/abs/2305.14928v3
- Date: Tue, 31 Oct 2023 07:19:01 GMT
- Title: Towards Reliable Misinformation Mitigation: Generalization, Uncertainty,
and GPT-4
- Authors: Kellin Pelrine, Anne Imouza, Camille Thibault, Meilina Reksoprodjo,
Caleb Gupta, Joel Christoph, Jean-François Godbout, Reihaneh Rabbany
- Abstract summary: We show that GPT-4 can outperform prior methods in multiple settings and languages.
We propose techniques to handle uncertainty that can detect impossible examples and strongly improve outcomes.
This research lays the groundwork for future tools that can drive real-world progress to combat misinformation.
- Score: 5.313670352036673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misinformation poses a critical societal challenge, and current approaches
have yet to produce an effective solution. We propose focusing on
generalization, uncertainty, and how to leverage recent large language models,
in order to create more practical tools to evaluate information veracity in
contexts where perfect classification is impossible. We first demonstrate that
GPT-4 can outperform prior methods in multiple settings and languages. Next, we
explore generalization, revealing that GPT-4 and RoBERTa-large exhibit
differences in failure modes. Third, we propose techniques to handle
uncertainty that can detect impossible examples and strongly improve outcomes.
We also discuss results on other language models, temperature, prompting,
versioning, explainability, and web retrieval, each one providing practical
insights and directions for future research. Finally, we publish the LIAR-New
dataset with novel paired English and French misinformation data and
Possibility labels that indicate if there is sufficient context for veracity
evaluation. Overall, this research lays the groundwork for future tools that
can drive real-world progress to combat misinformation.
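To make the uncertainty-handling idea concrete, here is a minimal sketch of prompting GPT-4 for veracity scoring with an explicit abstention option for examples that lack verifiable context. The prompt wording, 0-100 scale, and decision threshold are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch: GPT-4 veracity scoring with an explicit "impossible to
# verify" option, loosely following the paper's idea of detecting examples
# that lack sufficient context. Prompt wording and label set are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Rate the truthfulness of the following statement on a 0-100 scale, "
    "where 0 is definitely false and 100 is definitely true. If the "
    "statement lacks enough context to be verified at all, answer exactly "
    "'IMPOSSIBLE'. Respond with the number or 'IMPOSSIBLE' only.\n\n"
    "Statement: {statement}"
)

def evaluate_veracity(statement: str, threshold: int = 50) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # near-deterministic output for classification
        messages=[{"role": "user", "content": PROMPT.format(statement=statement)}],
    )
    answer = resp.choices[0].message.content.strip()
    if answer.upper().startswith("IMPOSSIBLE"):
        return "impossible"  # analogous to a LIAR-New Possibility label
    try:
        return "true" if int(answer) >= threshold else "false"
    except ValueError:
        return "unparseable"  # a real pipeline needs more robust parsing

print(evaluate_veracity("The Eiffel Tower was sold for scrap in 2009."))
```

Routing "IMPOSSIBLE" responses to a separate outcome mirrors the Possibility labels in LIAR-New, where some examples cannot be evaluated at all.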
Related papers
- Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate ~30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
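As a rough illustration of this kind of belief-revision test, the sketch below asks a model for a default inference and then re-asks after adding defeating evidence; the example, prompts, and pass criterion are assumptions, not Belief-R's actual protocol.

```python
# Illustrative belief-revision probe: ask for a default inference, then
# re-ask after adding defeating evidence. Any LM callable can be plugged in.
from typing import Callable

def belief_revision_probe(query_model: Callable[[str], str]) -> bool:
    base = ("If the match is struck, it lights. The match is struck. "
            "Does the match light? Answer yes or no.")
    updated = base.replace(
        "Does the match light?",
        "New evidence: the match is soaking wet. Does the match light?")
    before = query_model(base).strip().lower()
    after = query_model(updated).strip().lower()
    # A model that revises appropriately flips from "yes" to "no" once the
    # defeating evidence is introduced.
    return before.startswith("yes") and after.startswith("no")
```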
arXiv Detail & Related papers (2024-06-28T09:09:36Z)
- UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing [19.2764682793582]
We show an inherent misalignment between the pre-training and downstream tuning objectives of language models used for probing knowledge.
We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction.
arXiv Detail & Related papers (2024-04-01T17:22:07Z)
- Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Detection [10.301985230669684]
This paper presents a comprehensive analysis of GPT-4, GPT-3.5 Turbo, and FLAN-T5 models in detecting framing in news headlines.
We evaluated these models in various scenarios: zero-shot, few-shot with in-domain examples, cross-domain examples, and settings where models explain their predictions.
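For illustration, here is how zero-shot and few-shot prompts for framing detection might be constructed; the frame inventory and prompt wording are assumptions, not the paper's exact setup.

```python
# Sketch of zero-shot vs. few-shot prompt construction for framing detection.
# The frame inventory and prompt wording are illustrative assumptions.
FRAMES = ["economic", "morality", "security", "health", "political"]

def zero_shot_prompt(headline: str) -> str:
    return (f"Which frame best describes this news headline? "
            f"Options: {', '.join(FRAMES)}.\nHeadline: {headline}\nFrame:")

def few_shot_prompt(headline: str, examples: list[tuple[str, str]]) -> str:
    # In-domain or cross-domain examples are prepended as demonstrations.
    shots = "\n".join(f"Headline: {h}\nFrame: {f}" for h, f in examples)
    return (f"Which frame best describes each news headline? "
            f"Options: {', '.join(FRAMES)}.\n{shots}\n"
            f"Headline: {headline}\nFrame:")
```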
arXiv Detail & Related papers (2024-02-18T15:27:48Z)
- Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation [6.929834518749884]
GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions.
We show that Zephyr-7b presents a consistently viable alternative, overcoming key limitations of commonly used approaches.
We then highlight that GPT-3.5 exhibits unstable performance, so this widely used model could provide misleading results in misinformation detection.
arXiv Detail & Related papers (2024-01-12T22:27:25Z)
- CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation [87.44350003888646]
Eval-Instruct can acquire pointwise grading critiques with pseudo references and revise these critiques via multi-path prompting.
CritiqueLLM is empirically shown to outperform ChatGPT and all the open-source baselines.
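A loose sketch of the grade-then-revise pattern described here: produce a pointwise critique against a pseudo reference, revise it along multiple prompt paths, and merge. All prompt wording and the `query_model` callable are assumptions, not the paper's actual pipeline.

```python
# Loose sketch of pointwise grading with multi-path critique revision.
from typing import Callable

def grade_with_revision(query_model: Callable[[str], str],
                        question: str, answer: str,
                        pseudo_reference: str) -> str:
    draft = query_model(
        f"Question: {question}\nReference answer: {pseudo_reference}\n"
        f"Candidate answer: {answer}\n"
        "Write a critique of the candidate and give a 1-10 score.")
    paths = [
        "Revise this critique assuming the reference may be imperfect:",
        "Revise this critique focusing only on factual correctness:",
    ]
    revisions = [query_model(f"{p}\n{draft}") for p in paths]
    return query_model(
        "Merge these critiques into one final critique with a 1-10 score:\n"
        + "\n---\n".join(revisions))
```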
arXiv Detail & Related papers (2023-11-30T16:52:42Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
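Of the four facets, calibration has a standard quantitative check; below is a minimal sketch of expected calibration error (ECE), a common metric for it. The binning scheme and toy inputs are illustrative, not the paper's setup.

```python
# Minimal expected calibration error (ECE): a standard way to measure the
# "calibration" facet. Equal-width bins and toy data are illustrative.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between average accuracy and average confidence in the bin,
            # weighted by the fraction of samples falling in the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```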
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
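One classic family of methods in this space flags items where a cross-validated classifier confidently disagrees with the gold label. A minimal scikit-learn sketch of that idea follows; the toy data and 0.9 confidence threshold are assumptions, not a method from the paper.

```python
# Sketch of a classic annotation-error-detection baseline: flag items where a
# cross-validated classifier confidently disagrees with the gold label.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts = ["great movie", "terrible film", "loved it",
         "awful acting", "fantastic ending", "boring mess"]
labels = np.array([1, 0, 1, 1, 1, 0])  # index 3 is a plausible mislabel

X = TfidfVectorizer().fit_transform(texts)
probs = cross_val_predict(LogisticRegression(), X, labels,
                          cv=2, method="predict_proba")
pred = probs.argmax(axis=1)
# High-confidence disagreement with the gold label suggests a possible error.
suspect = (pred != labels) & (probs.max(axis=1) > 0.9)
print(np.nonzero(suspect)[0])  # candidate indices for manual review
```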
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
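InfoBERT's information-theoretic regularizers are beyond a short sketch, but the embedding-space adversarial training loop that such robust fine-tuning methods build on can be illustrated. Below is a hedged PyTorch-style sketch of single-step (FGSM-style) adversarial fine-tuning, with `model.embed` and `model.loss_from_embeddings` as assumed hooks, not InfoBERT's actual API or regularizers.

```python
# Generic adversarial fine-tuning step in embedding space (the substrate that
# methods like InfoBERT build on; InfoBERT itself adds information-theoretic
# regularizers not shown here). Hooks on `model` are assumptions.
import torch

def robust_step(model, batch, optimizer, eps: float = 1e-2):
    # Clean forward pass from the (assumed) embedding hook; embeddings are
    # assumed to require grad because the embedding weights are trainable.
    emb = model.embed(batch.input_ids)
    clean_loss = model.loss_from_embeddings(emb, batch.labels)
    # Single-step, FGSM-style perturbation of the embeddings.
    grad, = torch.autograd.grad(clean_loss, emb, retain_graph=True)
    adv_emb = (emb + eps * grad.sign()).detach()
    adv_loss = model.loss_from_embeddings(adv_emb, batch.labels)
    optimizer.zero_grad()
    (clean_loss + adv_loss).backward()  # train on clean + adversarial views
    optimizer.step()
```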
arXiv Detail & Related papers (2020-10-05T20:49:26Z)