Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Detection
- URL: http://arxiv.org/abs/2402.11621v3
- Date: Sat, 15 Jun 2024 23:20:15 GMT
- Title: Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Detection
- Authors: Valeria Pastorino, Jasivan A. Sivakumar, Nafise Sadat Moosavi,
- Abstract summary: This paper presents a comprehensive analysis of GPT-4, GPT-3.5 Turbo, and FLAN-T5 models in detecting framing in news headlines.
We evaluated these models in various scenarios: zero-shot, few-shot with in-domain examples, cross-domain examples, and settings where models explain their predictions.
- Score: 10.301985230669684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies on framing have relied on manual analysis or fine-tuning models with limited annotated datasets. However, pre-trained models, with their diverse training backgrounds, offer a promising alternative. This paper presents a comprehensive analysis of GPT-4, GPT-3.5 Turbo, and FLAN-T5 models in detecting framing in news headlines. We evaluated these models in various scenarios: zero-shot, few-shot with in-domain examples, cross-domain examples, and settings where models explain their predictions. Our results show that explainable predictions lead to more reliable outcomes. GPT-4 performed exceptionally well in few-shot settings but often misinterpreted emotional language as framing, highlighting a significant challenge. Additionally, the results suggest that consistent predictions across multiple models could help identify potential annotation inaccuracies in datasets. Finally, we propose a new small dataset for real-world evaluation on headlines from a diverse set of topics.
Related papers
- Explaining Pre-Trained Language Models with Attribution Scores: An
Analysis in Low-Resource Settings [32.03184402316848]
We analyze attribution scores extracted from prompt-based models w.r.t. plausibility and faithfulness.
We find that using the prompting paradigm yields more plausible explanations than fine-tuning the models in low-resource settings.
arXiv Detail & Related papers (2024-03-08T14:14:37Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Black-Box Analysis: GPTs Across Time in Legal Textual Entailment Task [17.25356594832692]
We present an analysis of GPT-3.5 (ChatGPT) and GPT-4 performances on COLIEE Task 4 dataset.
Our preliminary experimental results unveil intriguing insights into the models' strengths and weaknesses in handling legal textual entailment tasks.
arXiv Detail & Related papers (2023-09-11T14:43:54Z) - A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that using phonetic and graphic information reasonably is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of models.
The commonly used benchmark, SIGHAN, can not reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z) - Few-shot Text Classification with Dual Contrastive Consistency [31.141350717029358]
In this paper, we explore how to utilize pre-trained language model to perform few-shot text classification.
We adopt supervised contrastive learning on few labeled data and consistency-regularization on vast unlabeled data.
arXiv Detail & Related papers (2022-09-29T19:26:23Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples show strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations alert that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect term, category, and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margins in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with on average 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z) - Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.