Related papers: Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines

Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines

URL: http://arxiv.org/abs/2405.03153v1
Date: Mon, 6 May 2024 04:06:45 GMT
Title: Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines
Authors: Md Main Uddin Rony, Md Mahfuzul Haque, Mohammad Ali, Ahmed Shatil Alam, Naeemul Hassan,
Abstract summary: This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy.
Score: 2.0330684186105805
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the digital age, the prevalence of misleading news headlines poses a significant challenge to information integrity, necessitating robust detection mechanisms. This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines. Utilizing a dataset of 60 articles, sourced from both reputable and questionable outlets across health, science & tech, and business domains, we employ three LLMs- ChatGPT-3.5, ChatGPT-4, and Gemini-for classification. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy, especially in cases with unanimous annotator agreement on misleading headlines. The study emphasizes the importance of human-centered evaluation in developing LLMs that can navigate the complexities of misinformation detection, aligning technical proficiency with nuanced human judgment. Our findings contribute to the discourse on AI ethics, emphasizing the need for models that are not only technically advanced but also ethically aligned and sensitive to the subtleties of human interpretation.

Related papers

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.03333569013148]
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories. These types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. Most models score below 30% accuracy-only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv Detail & Related papers (2025-04-21T17:59:53Z)
Analyzing Feedback Mechanisms in AI-Generated MCQs: Insights into Readability, Lexical Properties, and Levels of Challenge [0.0]
This study delves into the linguistic and structural attributes of feedback generated by Google's Gemini 1.5-flash text model for computer science multiple-choice questions (MCQs)<n>Key linguistic metrics, such as length, readability scores (Flesch-Kincaid Grade Level), vocabulary richness, and lexical density, were computed and examined.<n>The findings reveal significant interaction effects between feedback tone and question difficulty, demonstrating the dynamic adaptation of AI-generated feedback within diverse educational contexts.
arXiv Detail & Related papers (2025-04-19T09:20:52Z)
Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content [0.24999074238880484]
We introduce a Large Language Model (LLM)-informed feature discovery framework to evaluate content credibility and explain its reasoning. We extract and quantify interpretable features using targeted prompts and integrate them into machine learning models to improve credibility predictions. Our method outperformed zero-shot GPT-based predictions by 13 percent in R2, and revealed key features like information concreteness and image format.
arXiv Detail & Related papers (2025-04-15T05:11:40Z)
Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information [0.0]
The purpose of this study is to assess how large language models (LLMs) can be used for fact-checking. We use AI auditing methodology that systematically evaluates performance of five LLMs. The results indicate that models are better at identifying false statements, especially on sensitive topics.
arXiv Detail & Related papers (2025-03-11T13:06:40Z)
The Superalignment of Superhuman Intelligence with Large Language Models [63.96120398355404]
We discuss the concept of superalignment from the learning perspective to answer this question. We highlight some key research problems in superalignment, namely, weak-to-strong generalization, scalable oversight, and evaluation. We present a conceptual framework for superalignment, which consists of three modules: an attacker which generates adversary queries trying to expose the weaknesses of a learner model; a learner which will refine itself by learning from scalable feedbacks generated by a critic model along with minimal human experts; and a critic which generates critics or explanations for a given query-response pair, with a target of improving the learner by criticizing.
arXiv Detail & Related papers (2024-12-15T10:34:06Z)
A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models [6.538395325419292]
Credibility assessment is fundamentally based on aggregating credibility signals. Credibility signals provide a more granular, more easily explainable and widely utilizable information. A growing body of research on automatic credibility assessment and detection of credibility signals can be characterized as highly fragmented and lacking mutual interconnections.
arXiv Detail & Related papers (2024-10-28T17:51:08Z)
Belief in the Machine: Investigating Epistemological Blind Spots of Language Models [51.63547465454027]
Language models (LMs) are essential for reliable decision-making in fields like healthcare, law, and journalism. This study systematically evaluates the capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE. Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios. Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data.
arXiv Detail & Related papers (2024-10-28T16:38:20Z)
Learning to Generate and Evaluate Fact-checking Explanations with Transformers [10.970249299147866]
Research contributes to the field of Explainable Artificial Antelligence (XAI) We develop transformer-based fact-checking models that contextualise and justify their decisions by generating human-accessible explanations. We emphasise the need for aligning Artificial Intelligence (AI)-generated explanations with human judgements.
arXiv Detail & Related papers (2024-10-21T06:22:51Z)
Investigating Annotator Bias in Large Language Models for Hate Speech Detection [5.589665886212444]
This paper delves into the biases present in Large Language Models (LLMs) when annotating hate speech data. Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases. We introduce our custom hate speech detection dataset, HateBiasNet, to conduct this research.
arXiv Detail & Related papers (2024-06-17T00:18:31Z)
CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models [4.962252439662465]
We introduce the Curious About Uncertain Scene dataset to enable Large Language Models to emulate human cognitive processes for resolving uncertainties. Our approach involves providing scene descriptions embedded with uncertainties to stimulate the generation of reasoning and queries. Our results demonstrate that GPT-4 can effectively generate pertinent questions and grasp their nuances, particularly when given appropriate context and instructions.
arXiv Detail & Related papers (2024-04-18T01:31:19Z)
SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL) SOUL aims to evaluate sentiment understanding through two subtasks: Review (RC) and Justification Generation (JG)
arXiv Detail & Related papers (2023-10-27T06:48:48Z)
Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs) We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing. We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically. In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs. Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z)
A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI [0.0]
This paper examines the comparative effectiveness of a specialized compiled language model and a general-purpose model like OpenAI's GPT-3.5 in detecting SDGs within text data. The study concludes by encouraging further research to find a balance between the capabilities of LLMs and the need for domain-specific expertise and interpretability.
arXiv Detail & Related papers (2023-07-28T09:20:22Z)
Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality. We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z)
Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA) We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.