Red AI? Inconsistent Responses from GPT3.5 Models on Political Issues in
the US and China
- URL: http://arxiv.org/abs/2312.09917v1
- Date: Fri, 15 Dec 2023 16:25:56 GMT
- Authors: Di Zhou, Yinxian Zhang
- Abstract summary: This study investigates political biases in GPT's multilingual models.
We posed the same question about political issues in the U.S. and China to GPT in both English and simplified Chinese.
Our analysis of the bilingual responses revealed that the political "knowledge" (content) and "attitude" (sentiment) of GPT's bilingual models are significantly more inconsistent on political issues in China.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rising popularity of ChatGPT and other AI-powered large language models
(LLMs) has led to increasing studies highlighting their susceptibility to
mistakes and biases. However, most of these studies focus on models trained on
English texts. Taking an innovative approach, this study investigates political
biases in GPT's multilingual models. We posed the same question about
high-profile political issues in the United States and China to GPT in both
English and simplified Chinese, and our analysis of the bilingual responses
revealed that the political "knowledge" (content) and "attitude"
(sentiment) of GPT's bilingual models are significantly more inconsistent on
political issues in China. The simplified Chinese GPT models not only tended to
provide pro-China information but also presented the least negative sentiment
towards China's problems, whereas the English GPT was significantly more
negative towards China. This disparity may stem from Chinese state censorship
and US-China geopolitical tensions, which influence the training corpora of GPT
bilingual models. Moreover, both Chinese and English models tended to be less
critical towards the issues of "their own" represented by the language used,
than the issues of "the other." This suggests that GPT multilingual models
could potentially develop a "political identity" and an associated sentiment
bias based on their training language. We discussed the implications of our
findings for information transmission and communication in an increasingly
divided world.
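The comparison protocol described above — pose the same question in both languages, score the sentiment of each response, and measure the gap — can be sketched as follows. The model calls are replaced with hypothetical canned responses, and the lexicon-based scorer is a toy stand-in; the study does not specify this implementation.

```python
# Toy sketch of the bilingual comparison protocol. In the real study, the two
# responses would come from querying GPT in English and in simplified Chinese;
# here they are hypothetical canned strings, and the sentiment scorer is an
# illustrative mini-lexicon, not the instrument used by the authors.

POSITIVE = {"effective", "successful", "improved", "strong"}
NEGATIVE = {"criticized", "controversial", "failed", "repressive"}

def sentiment_score(text: str) -> float:
    """Crude polarity in [-1, 1]: (pos - neg) / matched words; 0 if no match."""
    words = [w.strip(".,").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def sentiment_gap(response_en: str, response_zh_translated: str) -> float:
    """Absolute sentiment difference between paired bilingual responses.

    Assumes the Chinese response has been machine-translated to English
    first, so one lexicon applies to both sides of the pair.
    """
    return abs(sentiment_score(response_en) - sentiment_score(response_zh_translated))

# Hypothetical paired responses to the same question about a political issue.
en = "The policy was widely criticized as repressive and failed to help."
zh = "The policy was effective and brought improved outcomes."
print(round(sentiment_gap(en, zh), 2))  # prints 2.0: maximally inconsistent "attitude"
```

A large gap between paired responses is what the paper reports for questions about political issues in China, while questions about U.S. issues produced more consistent cross-lingual sentiment.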
Related papers
- Bilingual Bias in Large Language Models: A Taiwan Sovereignty Benchmark Study
Large Language Models (LLMs) are increasingly deployed in multilingual contexts, yet their consistency across languages on politically sensitive topics remains understudied. This paper presents a systematic benchmark study examining how 17 LLMs respond to questions concerning the sovereignty of the Republic of China (Taiwan) when queried in Chinese versus English. We discover significant language bias -- the phenomenon where the same model produces substantively different political stances depending on the query language.
arXiv Detail & Related papers (2026-02-06T03:57:21Z) - Are LLMs Good Safety Agents or a Propaganda Engine?
PSP is a dataset built specifically to probe refusal behaviors in Large Language Models in an explicitly political context. PSP is built by formatting existing censored content from two data sources openly available on the internet: sensitive prompts in China generalized to multiple countries, and tweets that have been censored in various countries. We study: 1) the impact of political sensitivity in seven LLMs through data-driven (making PSP implicit) and representation-level approaches (erasing the concept of politics); and 2) the vulnerability of models on PSP through prompt injection attacks (PIAs).
arXiv Detail & Related papers (2025-11-28T13:36:00Z) - Cross-Platform Short-Video Diplomacy: Topic and Sentiment Analysis of China-US Relations on Douyin and TikTok
We examine discussions surrounding China-U.S. relations on the Chinese and American social media platforms Douyin and TikTok. This study analyzed 4,040 videos and 338,209 user comments to assess public discussions and sentiments on social media regarding China-U.S. relations.
arXiv Detail & Related papers (2025-10-25T19:28:58Z) - Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models
We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring specific national narratives. Simple debiasing prompts had a limited effect on reducing these biases.
arXiv Detail & Related papers (2025-06-07T10:45:17Z) - Analysis of LLM Bias (Chinese Propaganda & Anti-US Sentiment) in DeepSeek-R1 vs. ChatGPT o3-mini-high
DeepSeek-R1 consistently exhibited substantially higher proportions of both propaganda and anti-U.S. sentiment. These biases were not confined to overtly political topics but also permeated cultural and lifestyle content.
arXiv Detail & Related papers (2025-06-02T15:54:06Z) - Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
We highlight the multimodal nature of the Chinese language as a key challenge for deploying language models in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches for perturbing toxic Chinese content. Then, we curate a dataset based on this taxonomy and benchmark 9 SOTA LLMs (from both the US and China) to assess whether they can detect perturbed toxic Chinese text.
arXiv Detail & Related papers (2025-05-30T08:32:45Z) - Language-Dependent Political Bias in AI: A Study of ChatGPT and Gemini
This study investigates the political tendency of large language models and the existence of differentiation according to the query language.
ChatGPT and Gemini were subjected to a political axis test using 14 different languages.
A comparative analysis revealed that Gemini exhibited a more pronounced liberal and left-wing tendency compared to ChatGPT.
arXiv Detail & Related papers (2025-04-08T21:13:01Z) - Do Chinese models speak Chinese languages?
Language ability provides insights into pre-training data curation.
China has a long history of explicit language policy, varying between inclusivity of minority languages and a Mandarin-first policy.
We test performance of Chinese and Western open-source LLMs on Asian regional and Chinese minority languages.
arXiv Detail & Related papers (2025-03-31T23:19:08Z) - Mapping Geopolitical Bias in 11 Large Language Models: A Bilingual, Dual-Framing Analysis of U.S.-China Tensions
This study systematically analyzes geopolitical bias across 11 prominent Large Language Models (LLMs).
We generated 19,712 prompts designed to detect ideological leanings in model outputs.
U.S.-based models predominantly favored Pro-U.S. stances, while Chinese-origin models exhibited pronounced Pro-China biases.
arXiv Detail & Related papers (2025-03-31T03:38:17Z) - Echoes of Power: Investigating Geopolitical Bias in US and China Large Language Models
We investigate the geopolitical biases in US and Chinese Large Language Models (LLMs).
Our findings show notable biases in both models, reflecting distinct ideological perspectives and cultural influences.
This study highlights the potential of LLMs to shape public discourse and underscores the importance of critically assessing AI-generated content.
arXiv Detail & Related papers (2025-03-20T19:53:10Z) - Large Language Models Reflect the Ideology of their Creators
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Identifying the sources of ideological bias in GPT models through linguistic variation in output
We use linguistic variation in countries with contrasting political attitudes to evaluate bias in GPT responses to sensitive political topics.
We find GPT output is more conservative in languages that map well onto conservative societies.
Differences across languages observed in GPT-3.5 persist in GPT-4, even though GPT-4 is significantly more liberal due to OpenAI's filtering policy.
arXiv Detail & Related papers (2024-09-09T20:11:08Z) - Representation Bias in Political Sample Simulations with Large Language Models
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z) - How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs
We evaluate six open-source multilingual LLMs pre-trained by Chinese companies on 18 languages.
Our experiments show that Chinese LLMs' performance on diverse languages is indistinguishable from that of international LLMs.
We find no sign of any consistent policy, either for or against, language diversity in China's LLM development.
arXiv Detail & Related papers (2024-07-12T19:21:40Z) - Language Model Alignment in Multilingual Trolley Problems
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z) - Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
This paper strategically identifies the need for linguistic equity by examining several knowledge editing techniques in multilingual contexts.
We evaluate the performance of models such as Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama across languages including English, German, French, Italian, Spanish, Hindi, Tamil, and Kannada.
arXiv Detail & Related papers (2024-06-17T01:54:27Z) - Multilingual large language models leak human stereotypes across language boundaries
We study how training a model multilingually may lead to stereotypes expressed in one language showing up in the models' behaviour in another.
We propose a measurement framework for stereotype leakage and investigate its effect across English, Russian, Chinese, and Hindi.
We find that GPT-3.5 exhibits the most stereotype leakage, and Hindi is the most susceptible to leakage effects.
arXiv Detail & Related papers (2023-12-12T10:24:17Z) - Expanding Scope: Adapting English Adversarial Attacks to Chinese
This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language.
Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese.
In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency.
arXiv Detail & Related papers (2023-06-08T02:07:49Z) - COLD: A Benchmark for Chinese Offensive Language Detection
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z) - Discovering Representation Sprachbund For Multilingual Pre-Training
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Country Image in COVID-19 Pandemic: A Case Study of China
Country image has a profound influence on international relations and economic development.
In the worldwide outbreak of COVID-19, countries and their people display different reactions.
In this study, we take China as a specific and typical case and investigate its image with aspect-based sentiment analysis on a large-scale Twitter dataset.
arXiv Detail & Related papers (2020-09-12T15:54:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.