ThatiAR: Subjectivity Detection in Arabic News Sentences
- URL: http://arxiv.org/abs/2406.05559v1
- Date: Sat, 8 Jun 2024 19:24:17 GMT
- Title: ThatiAR: Subjectivity Detection in Arabic News Sentences
- Authors: Reem Suwaileh, Maram Hasanain, Fatema Hubail, Wajdi Zaghouani, Firoj Alam,
- Abstract summary: This study presents the first large dataset for subjectivity detection in Arabic.
It consists of 3.6K manually annotated sentences, and GPT-4o based explanation.
We provide an in-depth analysis of the dataset, annotation process, and extensive benchmark results.
- Score: 10.334164786614696
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and other high-resourced languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences, and GPT-4o based explanation. In addition, we included instructions (both in English and Arabic) to facilitate LLM based fine-tuning. We provide an in-depth analysis of the dataset, annotation process, and extensive benchmark results, including PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We aim to release the dataset and resources for the community.
Related papers
- Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs [3.55026004901472]
We introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations.
Our method is both language and LLM-agnostic, making it a general-purpose tool.
arXiv Detail & Related papers (2024-12-29T11:33:51Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - VilBias: A Study of Bias Detection through Linguistic and Visual Cues , presenting Annotation Strategies, Evaluation, and Key Challenges [2.2751168722976587]
VLBias is a framework that leverages state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) to detect linguistic and visual biases in news content.
We present a multimodal dataset comprising textual content and corresponding images from diverse news sources.
arXiv Detail & Related papers (2024-12-22T15:05:30Z) - Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Investigating Annotator Bias in Large Language Models for Hate Speech Detection [5.589665886212444]
This paper delves into the biases present in Large Language Models (LLMs) when annotating hate speech data.
Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases.
We introduce our custom hate speech detection dataset, HateBiasNet, to conduct this research.
arXiv Detail & Related papers (2024-06-17T00:18:31Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection [16.343223974292908]
We build a dataset with Chinese news reports about COVID-19 which is annotated by our newly designed system.
In Data-Augmented Context Enrichment (DACE), we enlarge the training data; while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information.
Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
arXiv Detail & Related papers (2023-11-02T16:29:49Z) - Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning [10.897468059705238]
Supervised paraphrasers rely heavily on large quantities of labelled data to help preserve meaning and intent.
In this paper we aim to assist practitioners in developing usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs)
Our study focuses on key factors such as - number and order of demonstrations, exclusion of prompt instruction, and reduction in measured toxicity.
arXiv Detail & Related papers (2023-10-16T16:18:55Z) - Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
It is still unknown the effect of source language editing on a different target language.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z) - Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.