A "Perspectival" Mirror of the Elephant: Investigating Language Bias on
Google, ChatGPT, YouTube, and Wikipedia
- URL: http://arxiv.org/abs/2303.16281v3
- Date: Fri, 8 Mar 2024 00:15:02 GMT
- Title: A "Perspectival" Mirror of the Elephant: Investigating Language Bias on
Google, ChatGPT, YouTube, and Wikipedia
- Authors: Queenie Luo, Michael J. Puett, Michael D. Smith
- Abstract summary: This paper presents evidence and analysis of language bias and discusses its larger social implications.
We find that Google and its most prominent returned results simply reflect a narrow set of culturally dominant views tied to the search language for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and "America."
Language bias sets a strong yet invisible cultural barrier online, where each language group thinks they can see other groups through searches, but in fact, what they see is their own reflection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrary to Google Search's mission of delivering information from "many
angles so you can form your own understanding of the world," we find that
Google and its most prominent returned results - Wikipedia and YouTube - simply
reflect a narrow set of culturally dominant views tied to the search language
for complex topics like "Buddhism," "Liberalism," "colonization," "Iran" and
"America." Simply stated, they present, to varying degrees, distinct
information across the same search in different languages, a phenomenon we call
language bias. This paper presents evidence and analysis of language bias and
discusses its larger social implications. We find that our online searches and
emerging tools like ChatGPT turn us into the proverbial blind person touching a
small portion of an elephant, ignorant of the existence of other cultural
perspectives. Language bias sets a strong yet invisible cultural barrier
online, where each language group thinks they can see other groups through
searches, but in fact, what they see is their own reflection.
Related papers
- Comparing diversity, negativity, and stereotypes in Chinese-language AI technologies: a case study on Baidu, Ernie and Qwen [1.3354439722832292]
We study Chinese-language tools by investigating social biases embedded in the major Chinese search engine Baidu and in the language models Ernie and Qwen.
We collect over 30k views encoded in these tools by prompting them for candidate words describing social groups.
We find that language models exhibit a larger variety of embedded views compared to the search engine, although Baidu and Qwen generate negative content more often than Ernie.
arXiv Detail & Related papers (2024-08-28T10:51:18Z)
- See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding [78.88461026069862]
Vision-language models (VLMs) can respond to queries about images in many languages.
We present a novel investigation that demonstrates and localizes Western bias in image understanding.
arXiv Detail & Related papers (2024-06-17T15:49:51Z)
- A comparison of online search engine autocompletion in Google and Baidu [3.5016560416031886]
We study the characteristics of search auto-completions in two different linguistic and cultural contexts: Baidu and Google.
We find differences between the two search engines in the way they suppress or modify original queries.
Our study highlights the need for more refined, culturally sensitive moderation strategies in current language technologies.
arXiv Detail & Related papers (2024-05-03T08:17:04Z)
- Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies.
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
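The WEAT statistic mentioned above measures how strongly two sets of target words (e.g., flowers vs. insects) associate with two sets of attribute words (e.g., pleasant vs. unpleasant) in an embedding space. The following is a minimal NumPy sketch of the standard WEAT effect-size formula, not the paper's 24-language pipeline; the function name and all inputs here are illustrative.

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """WEAT effect size.

    X, Y: target-word embedding matrices, shape (n_x, d) and (n_y, d).
    A, B: attribute-word embedding matrices, shape (n_a, d) and (n_b, d).
    Returns the normalized difference in mean association of X vs. Y
    with attribute sets A vs. B.
    """
    def cos(u, M):
        # Cosine similarity of vector u against each row of M.
        return (M @ u) / (np.linalg.norm(M, axis=1) * np.linalg.norm(u))

    def s(w):
        # Association of a single word with A relative to B.
        return cos(w, A).mean() - cos(w, B).mean()

    s_X = np.array([s(x) for x in X])
    s_Y = np.array([s(y) for y in Y])
    pooled = np.concatenate([s_X, s_Y])
    return (s_X.mean() - s_Y.mean()) / pooled.std(ddof=1)
```

A positive effect size indicates that the X targets lean toward attribute set A; scaling this test to a new language only requires embeddings and translated word lists for that language.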
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
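Template-based bias probing of the kind described above works by expanding sentence templates over group terms and then comparing a sentiment model's scores across groups. A minimal sketch of the expansion step, with hypothetical templates and group lists (not the paper's actual templates):

```python
# Hypothetical templates and group terms for illustration only.
TEMPLATES = [
    "The {group} neighbors were {adj}.",
    "I met a {group} colleague who was {adj}.",
]
GROUPS = {"nationality": ["Italian", "Chinese", "Spanish"]}
ADJECTIVES = ["friendly", "rude"]

def fill_templates(attribute):
    """Expand every template for each (group, adjective) pair of an attribute.

    The resulting sentences would then be scored by a sentiment classifier,
    and score distributions compared across groups to surface favoritism.
    """
    return [
        t.format(group=g, adj=a)
        for t in TEMPLATES
        for g in GROUPS[attribute]
        for a in ADJECTIVES
    ]
```

Adapting such a probe to Italian, Chinese, Hebrew, or Spanish amounts to translating the templates and group terms while keeping the comparison logic unchanged.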
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Deception detection in text and its relation to the cultural dimension of individualism/collectivism [6.17866386107486]
We investigate whether differences in the usage of specific linguistic features of deception across cultures can be confirmed and attributed to norms with respect to the individualism/collectivism divide.
We create culture/language-aware classifiers by experimenting with a wide range of n-gram features based on phonology, morphology and syntax.
We conducted our experiments over 11 datasets in five languages (English, Dutch, Russian, Spanish, and Romanian) from six countries (US, Belgium, India, Russia, Mexico, and Romania).
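Character n-grams are one of the simplest feature families in the phonology/morphology range that such culture-aware classifiers draw on. A minimal sketch of the extraction step, standing in for whatever exact feature set the paper uses:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams in a text.

    Character n-grams capture sub-word regularities (affixes, phoneme-like
    sequences) without language-specific tooling, which makes them a common
    baseline feature for cross-lingual text classification.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

The resulting counts would typically be assembled into sparse vectors and fed to a standard classifier, trained separately per language or culture group.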
arXiv Detail & Related papers (2021-05-26T13:09:47Z)
- Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia [34.183132688084534]
Specific lexical choices in narrative text both reflect the writer's attitudes towards people in the narrative and influence the audience's reactions.
We show how word connotations differ across languages and cultures, highlighting the difficulty of generalizing existing English datasets.
We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages.
arXiv Detail & Related papers (2020-10-21T08:27:36Z)
- Visual Grounding in Video for Unsupervised Word Translation [91.47607488740647]
We use visual grounding to improve unsupervised word mapping between languages.
We learn embeddings from unpaired instructional videos narrated in the native language.
We apply these methods to translate words from English to French, Korean, and Japanese.
arXiv Detail & Related papers (2020-03-11T02:03:37Z)
- A Framework for the Computational Linguistic Analysis of Dehumanization [52.735780962665814]
We analyze discussions of LGBTQ people in the New York Times from 1986 to 2015.
We find increasingly humanizing descriptions of LGBTQ people over time.
The ability to analyze dehumanizing language at a large scale has implications for automatically detecting and understanding media bias as well as abusive language online.
arXiv Detail & Related papers (2020-03-06T03:02:12Z)
- Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos [98.87558262467257]
This study explores culture preferences among countries using the thumbnails of YouTube trending videos.
Experimental results indicate that users from similar cultures share interests in watching similar videos on YouTube.
arXiv Detail & Related papers (2020-01-27T20:15:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.