Inference of Media Bias and Content Quality Using Natural-Language Processing
- URL: http://arxiv.org/abs/2212.00237v1
- Date: Thu, 1 Dec 2022 03:04:55 GMT
- Title: Inference of Media Bias and Content Quality Using Natural-Language Processing
- Authors: Zehan Chao, Denali Molitor, Deanna Needell, and Mason A. Porter
- Abstract summary: We present a framework to infer both political bias and content quality of media outlets from text.
We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets.
Our results illustrate the importance of incorporating word order into machine-learning methods for text analysis.
- Score: 6.092956184948962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Media bias can significantly impact the formation and development of opinions
and sentiments in a population. It is thus important to study the emergence and
development of partisan media and political polarization. However, it is
challenging to quantitatively infer the ideological positions of media outlets.
In this paper, we present a quantitative framework to infer both political bias
and content quality of media outlets from text, and we illustrate this
framework with empirical experiments with real-world data. We apply a
bidirectional long short-term memory (LSTM) neural network to a data set of
more than 1 million tweets to generate a two-dimensional ideological-bias and
content-quality measurement for each tweet. We then infer a "media-bias
chart" of (bias, quality) coordinates for the media outlets by integrating the
(bias, quality) measurements of the tweets of the media outlets. We also apply
a variety of baseline machine-learning methods, such as a naive-Bayes method
and a support-vector machine (SVM), to infer the bias and quality values for
each tweet. All of these baselines use a bag-of-words representation. We find
that the LSTM-network approach performs best of the examined methods. Our
results illustrate the importance of incorporating word order into
machine-learning methods for text analysis.
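As a concrete illustration of the kind of model the abstract describes (a minimal sketch under assumed settings, not the authors' released code), the snippet below shows a bidirectional LSTM that maps a tokenized tweet to a two-dimensional (bias, quality) prediction and then averages tweet-level predictions per outlet to place each outlet on a media-bias chart. The vocabulary size, embedding and hidden dimensions, and the mean-pooling aggregation are assumptions, not the paper's reported choices.

```python
# Hedged sketch (not the authors' implementation): a bidirectional LSTM that
# maps a tokenized tweet to a 2-D (bias, quality) prediction, plus a simple
# per-outlet average of tweet-level predictions to build a "media-bias chart".
# Vocabulary size, dimensions, and mean pooling are illustrative assumptions.
import torch
import torch.nn as nn


class BiasQualityLSTM(nn.Module):
    def __init__(self, vocab_size=50_000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 2)  # -> (bias, quality)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)              # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=1)    # concat forward/backward states
        return self.head(h)                   # (batch, 2)


def outlet_chart(model, tweets_by_outlet):
    """Average tweet-level (bias, quality) predictions for each outlet."""
    model.eval()
    chart = {}
    with torch.no_grad():
        for outlet, batches in tweets_by_outlet.items():
            preds = torch.cat([model(b) for b in batches])
            chart[outlet] = preds.mean(dim=0).tolist()  # [bias, quality]
    return chart
```

The bag-of-words baselines mentioned in the abstract could be sketched analogously with scikit-learn, e.g., MultinomialNB or LinearSVC fit on a CountVectorizer representation of the same tweets.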
Related papers
- Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions [0.7249731529275342]
We propose an extension to a recently presented news media reliability estimation method.
We assess the classification performance of four reinforcement learning strategies on a large news media hyperlink graph.
Our experiments, targeting two challenging bias descriptors, factual reporting and political bias, showed a significant performance improvement at the source media level.
arXiv Detail & Related papers (2024-10-23T08:18:26Z) - Modeling Political Orientation of Social Media Posts: An Extended
Analysis [0.0]
Developing machine learning models to characterize political polarization on online social media presents significant challenges.
These challenges mainly stem from various factors such as the lack of annotated data, presence of noise in social media datasets, and the sheer volume of data.
We introduce two methods that leverage news media bias and post content to label social media posts.
We demonstrate that current machine learning models can achieve improved performance in predicting the political orientation of social media posts.
arXiv Detail & Related papers (2023-11-21T03:34:20Z) - Introducing MBIB -- the first Media Bias Identification Benchmark Task
and Dataset Collection [24.35462897801079]
We introduce the Media Bias Identification Benchmark (MBIB) to group different types of media bias under a common framework.
After reviewing 115 datasets, we select nine tasks and carefully propose 22 associated datasets for evaluating media bias detection techniques.
Our results suggest that while hate speech, racial bias, and gender bias are easier to detect, models struggle to handle certain bias types, e.g., cognitive and political bias.
arXiv Detail & Related papers (2023-04-25T20:49:55Z) - Towards Corpus-Scale Discovery of Selection Biases in News Coverage:
Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z) - Bias or Diversity? Unraveling Fine-Grained Thematic Discrepancy in U.S.
News Headlines [63.52264764099532]
We use a large dataset of 1.8 million news headlines from major U.S. media outlets spanning from 2014 to 2022.
We quantify the fine-grained thematic discrepancy related to four prominent topics - domestic politics, economic issues, social issues, and foreign affairs.
Our findings indicate that on domestic politics and social issues, the discrepancy can be attributed to a certain degree of media bias.
arXiv Detail & Related papers (2023-03-28T03:31:37Z) - Computational Assessment of Hyperpartisanship in News Titles [55.92100606666497]
We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection.
Overall, the Right media tends to use proportionally more hyperpartisan titles.
We identify three major topics including foreign issues, political systems, and societal issues that are suggestive of hyperpartisanship in news titles.
arXiv Detail & Related papers (2023-01-16T05:56:58Z) - GREENER: Graph Neural Networks for News Media Profiling [24.675574340841163]
We study the problem of profiling news media on the Web with respect to their factuality of reporting and bias.
Our main focus is on modeling the similarity between media outlets based on the overlap of their audience.
Prediction accuracy is found to improve by 2.5-27 macro-F1 points for the two tasks.
arXiv Detail & Related papers (2022-11-10T12:46:29Z) - Cross-Domain Learning for Classifying Propaganda in Online Contents [67.10699378370752]
We present an approach to leverage cross-domain learning, based on labeled documents and sentences from news and tweets, as well as political speeches with a clear difference in their degrees of being propagandistic.
Our experiments demonstrate the usefulness of this approach, and identify difficulties and limitations in various configurations of sources and targets for the transfer step.
arXiv Detail & Related papers (2020-11-13T10:19:13Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Deep Learning Techniques for Future Intelligent Cross-Media Retrieval [58.20547387332133]
Cross-media retrieval plays a significant role in big data applications.
We provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches.
We present some well-known cross-media datasets used for retrieval.
arXiv Detail & Related papers (2020-07-21T09:49:33Z) - A multi-layer approach to disinformation detection on Twitter [4.663548775064491]
We employ a multi-layer representation of Twitter diffusion networks, and we compute for each layer a set of global network features.
Experimental results with two large-scale datasets, corresponding to diffusion cascades of news shared respectively in the United States and Italy, show that a simple Logistic Regression model is able to classify disinformation vs mainstream networks with high accuracy.
We believe that our network-based approach provides useful insights which pave the way to the future development of a system to detect misleading and harmful information spreading on social media.
arXiv Detail & Related papers (2020-02-28T09:25:53Z)