Bridging Nations: Quantifying the Role of Multilinguals in Communication
on Social Media
- URL: http://arxiv.org/abs/2304.03797v1
- Date: Fri, 7 Apr 2023 18:01:25 GMT
- Authors: Julia Mendelsohn, Sayan Ghosh, David Jurgens, Ceren Budak
- Abstract summary: We quantify multilingual users' structural role and communication influence in cross-lingual information exchange.
Having a multilingual network neighbor increases monolinguals' odds of sharing domains and hashtags from another language 16-fold and 4-fold, respectively.
By highlighting information exchange across borders, this work sheds light on a crucial component of how information and ideas spread around the world.
- Score: 14.646734380673648
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media enables the rapid spread of many kinds of information, from
memes to social movements. However, little is known about how information
crosses linguistic boundaries. We apply causal inference techniques on the
European Twitter network to quantify multilingual users' structural role and
communication influence in cross-lingual information exchange. Overall,
multilinguals play an essential role; posting in multiple languages increases
betweenness centrality by 13%, and having a multilingual network neighbor
increases monolinguals' odds of sharing domains and hashtags from another
language 16-fold and 4-fold, respectively. We further show that multilinguals
have a greater impact on diffusing information less accessible to their
monolingual compatriots, such as information from far-away countries and
content about regional politics, nascent social movements, and job
opportunities. By highlighting information exchange across borders, this work
sheds light on a crucial component of how information and ideas spread around
the world.
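The 16-fold and 4-fold figures above are stated as changes in odds, not in probability. A minimal sketch of how an odds multiplier translates into a probability is given below. The 1% baseline sharing probability and the function name `apply_odds_ratio` are assumptions made for illustration only; neither comes from the paper.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained after multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    # Convert probability to odds, scale the odds, then convert back.
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical baseline: a monolingual user shares cross-lingual content
    # with probability 0.01 (an assumed value, not a result from the paper).
    assumed_baseline = 0.01
    for label, ratio in (("domains", 16.0), ("hashtags", 4.0)):
        prob = apply_odds_ratio(assumed_baseline, ratio)
        print(f"{label}: odds x{ratio:g} -> probability {prob:.4f}")
```

Because odds and probability diverge as the baseline grows, a 16-fold increase in odds corresponds to a smaller than 16-fold increase in probability for any baseline above zero.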
Related papers
- Socially Responsible Data for Large Multilingual Language Models [12.338723881042926]
Large Language Models (LLMs) have rapidly increased in size and apparent capabilities in the last three years.
Various efforts are striving for models to accommodate languages of communities outside of the Global North.
arXiv Detail & Related papers (2024-09-08T23:51:04Z)
- The Geography of Information Diffusion in Online Discourse on Europe and Migration [4.590533239391236]
We analyse the information circulating online about Europe and migration after retrieving a large amount of data from social media (Twitter).
We combine retweets and hashtags network analysis with geolocation of users.
Results show that the majority of online discussions occur at a national level, especially when discussing migration.
arXiv Detail & Related papers (2024-02-21T13:30:34Z)
- Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z)
- Lost in Translation -- Multilingual Misinformation and its Evolution [52.07628580627591]
This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
arXiv Detail & Related papers (2023-10-27T12:21:55Z)
- Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Human Languages with Greater Information Density Increase Communication Speed, but Decrease Conversation Breadth [0.0]
We show that there is broad variation in how densely languages encode information into their words.
Second, we show that this language information density is associated with a denser configuration of semantic information.
Finally, we trace the relationship between language information density and patterns of communication.
arXiv Detail & Related papers (2021-12-15T21:35:56Z)
- Challenges and Considerations with Code-Mixed NLP for Multilingual Societies [1.6675267471157407]
This paper discusses the current state of NLP research, its limitations, and foreseeable pitfalls in addressing five real-world applications for social good.
We also propose futuristic datasets, models, and tools that can significantly advance the current research in multilingual NLP applications for the societal good.
arXiv Detail & Related papers (2021-06-15T00:53:55Z)
- Capturing the diversity of multilingual societies [0.0]
We consider the processes at work in language shift through a conjunction of theoretical and data-driven perspectives.
A large-scale empirical study of spatial patterns of languages in multilingual societies using Twitter and census data yields a wide diversity.
We propose a model in which coexistence of languages may be reached when learning the other language is facilitated and when bilinguals favor the use of the endangered language.
arXiv Detail & Related papers (2021-05-06T10:27:43Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)