A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers
- URL: http://arxiv.org/abs/2402.02572v1
- Date: Sun, 4 Feb 2024 17:32:52 GMT
- Title: A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers
- Authors: Jaihyun Park, Ryan Cordell
- Abstract summary: We present a computational text analysis of how Asian workers are represented in historical newspapers in the United States.
We found that the word "coolie" carried different meanings in some states, accompanied by different discourses around the term.
We also found, by measuring over-represented words, that then-Confederate and then-Union newspapers formed distinctive discourses.
- Score: 4.8002841809407695
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Warning: This paper contains examples of offensive language targeting
marginalized populations. The digitization of historical texts invites
researchers to explore large-scale corpora of historical texts with
computational methods. In this study, we present a computational text analysis of
a relatively understudied topic: how Asian workers are represented in
historical newspapers in the United States. We found that the word "coolie"
carried different meanings in some states (e.g., Massachusetts, Rhode Island,
Wyoming, Oklahoma, and Arkansas), accompanied by different discourses around the term.
We also found, by measuring over-represented words, that then-Confederate and
then-Union newspapers formed distinctive discourses. Newspapers from
then-Confederate states associated "coolie" with slavery-related words. In
addition, we found that Asians were perceived as inferior to European immigrants
and were targets of racism. This study supplements qualitative analyses of
racism in the United States with quantitative discourse analysis.
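The abstract refers to "measuring over-represented words" without naming the statistic. One widely used way to compare two groups of texts (for example, then-Confederate versus then-Union newspapers) is the weighted log-odds ratio with an informative Dirichlet prior (Monroe, Colaresi, and Quinn, 2008). The sketch below implements that statistic under stated assumptions: it is not necessarily the measure the authors used, the function name `weighted_log_odds` is hypothetical, the pooled counts of both corpora serve as the prior, and the toy token lists are invented for illustration rather than drawn from the study.

```python
import math
from collections import Counter


def weighted_log_odds(counts_a, counts_b, prior):
    """Z-scored log-odds ratio of each word in corpus A vs. corpus B.

    Assumption: this follows the informative-Dirichlet-prior formulation of
    Monroe, Colaresi & Quinn (2008); it is an illustrative choice, not
    necessarily the statistic used in the paper.

    counts_a, counts_b: word -> raw count in each corpus
    prior: word -> pseudo-count alpha_w (here, pooled counts of both corpora)
    Positive scores mean the word is over-represented in A; negative, in B.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = sum(prior.values())
    scores = {}
    for word, alpha_w in prior.items():
        if alpha_w <= 0:
            continue  # words without prior mass are skipped
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Prior-smoothed log-odds of the word within each corpus.
        log_odds_a = math.log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
        log_odds_b = math.log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of delta; dividing by its square root gives a z-score.
        variance = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        scores[word] = delta / math.sqrt(variance)
    return scores


if __name__ == "__main__":
    # Invented toy tokens standing in for two groups of newspapers (NOT study data).
    tokens_a = "ship ship ship wages labor".split()
    tokens_b = "wages wages wages ship labor".split()
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    scores = weighted_log_odds(counts_a, counts_b, counts_a + counts_b)
    # Rank words from most over-represented in A to most over-represented in B.
    for word in sorted(scores, key=scores.get, reverse=True):
        print(f"{word}\t{scores[word]:+.3f}")
```

Dividing the log-odds difference by its estimated standard deviation keeps rare words with large raw ratios from dominating the ranking, which is the main reason this statistic is often preferred over plain frequency ratios.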
Related papers
- A Data-driven Investigation of Euphemistic Language: Comparing the usage of "slave" and "servant" in 19th century US newspapers [4.063328359314906]
This study investigates the usage of "slave" and "servant" in the 19th century US newspapers using computational methods.
We found that "slave" is associated with socio-economic, legal, and administrative words.
"servant" is linked to religious words in the Northern newspapers while Southern newspapers associated "servant" with domestic and familial words.
Date: 2025-03-19T09:49:22Z
- A Longitudinal Analysis of Racial and Gender Bias in New York Times and Fox News Images and Articles [2.482116411483087]
We use a dataset of 123,337 images and 441,321 online news articles from New York Times (NYT) and Fox News (Fox).
We examine how frequently and how prominently racial and gender groups appear in images embedded in news articles.
We find that, overall, NYT features more images of racial minority groups than Fox does.
Date: 2024-10-29T09:42:54Z
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals across 2.7K biography pages on English, Russian, and French Wikipedias.
Date: 2024-10-05T20:40:49Z
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
Date: 2024-08-14T16:55:06Z
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by three recent Large Language Models (LLMs).
Date: 2024-04-16T12:27:54Z
- Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis [44.17106903728264]
Most hate speech datasets neglect the cultural diversity within a single language.
To address this, we introduce CREHate, a CRoss-cultural English Hate speech dataset.
Only 56.2% of the posts in CREHate achieve consensus among all countries, with the highest pairwise label difference rate of 26%.
Date: 2023-08-31T13:14:47Z
- Regional Negative Bias in Word Embeddings Predicts Racial Animus--but only via Name Frequency [2.247786323899963]
We show that anti-black WEAT estimates from geo-tagged social media data strongly correlate with several measures of racial animus.
We also show that every one of these correlations is explained by the frequency of Black names in the underlying corpora relative to White names.
Date: 2022-01-20T20:52:12Z
- "Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic [2.5227595609842206]
The COVID-19 pandemic has fueled a surge in anti-Asian xenophobia and prejudice.
We create and annotate a corpus of tweets using two experimental approaches to explore anti-Asian abusive and hate speech.
Date: 2021-12-04T06:55:19Z
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
Date: 2021-11-15T18:58:20Z
- The 'Letter' Distribution in the Chinese Language [24.507787098011907]
Studies have found that letters in some languages with alphabetic writing systems have strikingly similar statistical usage frequency distributions.
This study provides new evidence of the consistency of human languages.
Date: 2020-05-26T05:18:56Z
- Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition [46.57105755981092]
We publish a multilingual Twitter corpus for the task of hate speech detection.
The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish.
We evaluate the inferred demographic labels with a crowdsourcing platform.
Date: 2020-02-24T16:45:59Z