When Dialects Collide: How Socioeconomic Mixing Affects Language Use
- URL: http://arxiv.org/abs/2307.10016v1
- Date: Wed, 19 Jul 2023 14:55:50 GMT
- Title: When Dialects Collide: How Socioeconomic Mixing Affects Language Use
- Authors: Thomas Louf, José J. Ramasco, David Sánchez, Márton Karsai
- Abstract summary: We find that the more different socioeconomic classes mix, the less interdependent the frequency of their departures from standard grammar and their income become.
We propose an agent-based model of linguistic variety adoption that sheds light on the mechanisms that produce the observations seen in the data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The socioeconomic background of people and how they use standard forms of
language are not independent, as demonstrated in various sociolinguistic
studies. However, the extent to which these correlations may be influenced by
the mixing of people from different socioeconomic classes remains relatively
unexplored from a quantitative perspective. In this work we leverage geotagged
tweets and transferable computational methods to map deviations from standard
English on a large scale, in seven thousand administrative areas of England and
Wales. We combine these data with high-resolution income maps to assign a proxy
socioeconomic indicator to home-located users. Strikingly, across eight
metropolitan areas we find a consistent pattern suggesting that the more
different socioeconomic classes mix, the less interdependent the frequency of
their departures from standard grammar and their income become. Further, we
propose an agent-based model of linguistic variety adoption that sheds light on
the mechanisms that produce the observations seen in the data.
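The abstract does not spell out the model's mechanics, but its qualitative behaviour can be illustrated with a minimal sketch. In the hypothetical Python model below, two income classes start with different baseline rates of non-standard usage, and a mixing parameter MIXING controls how often interactions cross class lines; the two-class setup, the update rule, and all parameter values are illustrative assumptions, not the authors' formulation.

```python
import random

# Minimal, hypothetical sketch of an agent-based model of linguistic variety
# adoption. The two-class setup, the mixing parameter MIXING, and the linear
# update rule are illustrative assumptions, not the authors' exact model.

N_PER_CLASS = 500
N_STEPS = 20_000
MIXING = 0.3      # probability that an interaction crosses class lines
INFLUENCE = 0.05  # how strongly a listener shifts toward the speaker

class Agent:
    def __init__(self, ses_class, p_nonstandard):
        self.ses_class = ses_class  # 0 = lower income, 1 = higher income
        self.p = p_nonstandard      # probability of using the non-standard variety

# Two classes with different baseline rates of non-standard usage.
agents = [Agent(0, random.uniform(0.5, 0.9)) for _ in range(N_PER_CLASS)]
agents += [Agent(1, random.uniform(0.1, 0.5)) for _ in range(N_PER_CLASS)]

for _ in range(N_STEPS):
    speaker = random.choice(agents)
    if random.random() < MIXING:  # cross-class interaction
        pool = [a for a in agents if a.ses_class != speaker.ses_class]
    else:                         # within-class interaction
        pool = [a for a in agents if a.ses_class == speaker.ses_class and a is not speaker]
    listener = random.choice(pool)
    # The listener drifts toward the speaker's usage rate.
    listener.p += INFLUENCE * (speaker.p - listener.p)

for c in (0, 1):
    mean_p = sum(a.p for a in agents if a.ses_class == c) / N_PER_CLASS
    print(f"class {c}: mean non-standard usage = {mean_p:.3f}")
```

Raising MIXING makes the two class means converge faster, one mechanism consistent with the reported weakening of the link between income and departures from standard grammar under stronger mixing.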
Related papers
- ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models [75.05436691700572]
We introduce ExpliCa, a new dataset for evaluating Large Language Models (LLMs) in explicit causal reasoning.
We tested seven commercial and open-source LLMs on ExpliCa through prompting and perplexity-based metrics.
Surprisingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events.
arXiv Detail & Related papers (2025-02-21T14:23:14Z)
- On the Proper Treatment of Tokenization in Psycholinguistics [53.960910019072436]
The paper argues that token-level language models should be marginalized into character-level language models before they are used in psycholinguistic studies.
We find various focal areas whose surprisal is a better psychometric predictor than the surprisal of the region of interest itself.
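The marginalization argued for above can be illustrated on a toy unigram token model: the probability a token-level model assigns to a character string is the sum over every tokenization that spells it out. The vocabulary and probabilities below are invented, and this dynamic-programming sketch is an illustration rather than the paper's algorithm.

```python
import math

# Toy illustration of marginalising a token-level LM into a character-level
# one: sum the probability of every tokenisation that spells out the string.
# The unigram vocabulary and its probabilities are invented for illustration.
VOCAB = {"a": 0.3, "b": 0.2, "ab": 0.4, "ba": 0.1}

def char_string_prob(s):
    # prob[i] = total probability mass of token sequences spelling s[:i]
    prob = [0.0] * (len(s) + 1)
    prob[0] = 1.0
    for i in range(1, len(s) + 1):
        for tok, p in VOCAB.items():
            if i >= len(tok) and s[i - len(tok):i] == tok:
                prob[i] += prob[i - len(tok)] * p
    return prob[len(s)]

# "ab" can be tokenised as ["ab"] or ["a", "b"]: 0.4 + 0.3 * 0.2 = 0.46
p = char_string_prob("ab")
print(f"p('ab') = {p:.2f}, surprisal = {-math.log2(p):.2f} bits")
```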
arXiv Detail & Related papers (2024-10-03T17:18:03Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Understanding Intrinsic Socioeconomic Biases in Large Language Models [4.276697874428501]
We introduce a novel dataset of one million English sentences to quantify socioeconomic biases.
Our findings reveal pervasive socioeconomic biases in both established models like GPT-2 and state-of-the-art models like Llama 2 and Falcon.
arXiv Detail & Related papers (2024-05-28T23:54:44Z)
- Detecting Bias in Large Language Models: Fine-tuned KcBERT [0.0]
We define the harm caused by biased model outputs as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned on Korean comments.
Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.
arXiv Detail & Related papers (2024-03-16T02:27:19Z)
- Born With a Silver Spoon? Investigating Socioeconomic Bias in Large Language Models [1.4436965372953483]
We evaluate the degree of socioeconomic bias expressed in large language models and the variation of this degree as a function of model size.
Our analysis reveals that while humans disagree on which situations require empathy toward the underprivileged, most large language models are unable to empathize with the socioeconomically underprivileged regardless of the situation.
arXiv Detail & Related papers (2024-02-16T23:18:19Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
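As a hypothetical sketch of the technique, sociodemographic prompting amounts to prepending a persona description to the task instruction; the template and fields below are illustrative, not the paper's.

```python
# Hypothetical sketch of sociodemographic prompting: prepend a persona
# description so the model answers as someone with that profile would.
# The template and field names are illustrative, not the paper's.
def sociodemographic_prompt(text: str, age: int, gender: str, occupation: str) -> str:
    persona = f"You are a {age}-year-old {gender} working as {occupation}."
    task = f'Is the following statement offensive? Answer "yes" or "no".\n"{text}"'
    return f"{persona}\n{task}"

print(sociodemographic_prompt("That take is ridiculous.", 45, "woman", "a teacher"))
```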
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work exploits the tendency of language models to absorb bias from their training data to explore the biases of six different online communities.
The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
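A minimal sketch of this comparison, with placeholder stand-ins for the community-tuned model and the sentiment scorer (the demographic prompts and toy lexicon are illustrative):

```python
from statistics import mean

# Minimal sketch of the comparison described above. The generator and scorer
# are stand-ins: swap in a model fine-tuned on one community's posts and a
# real sentiment/toxicity classifier. Prompts and lexicon are illustrative.
DEMOGRAPHIC_PROMPTS = ["Women are", "Men are", "Immigrants are"]

def generate(prompt, n=3):
    # Placeholder for sampling n completions from the fine-tuned model.
    canned = {"Women are": ["kind", "smart", "terrible"],
              "Men are": ["strong", "awful", "awful"],
              "Immigrants are": ["hardworking", "kind", "smart"]}
    return [f"{prompt} {w}." for w in canned[prompt]]

def sentiment(text):
    # Toy lexicon scorer standing in for a real sentiment model.
    pos, neg = {"kind", "smart", "strong", "hardworking"}, {"terrible", "awful"}
    words = {w.strip(".").lower() for w in text.split()}
    return len(words & pos) - len(words & neg)

for p in DEMOGRAPHIC_PROMPTS:
    print(p, mean(sentiment(g) for g in generate(p)))
```

Replacing the canned generator with a model fine-tuned on each community's posts yields one bias profile per community, which can then be compared across datasets.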
arXiv Detail & Related papers (2023-06-04T08:09:26Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level to a grid of 300 m spatial resolution.
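A minimal sketch of this pipeline, assuming scikit-learn and synthetic arrays in place of the real Census units and gridded covariates:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Minimal sketch of the downscaling step: train a Random Forest on coarse
# source units (features: covariates aggregated per unit; target: the
# socioeconomic variable), then predict on each fine grid cell.
# The synthetic arrays stand in for real Census and covariate data.
rng = np.random.default_rng(0)

n_units, n_cells, n_covariates = 2000, 100_000, 12
X_units = rng.normal(size=(n_units, n_covariates))  # covariates per Census unit
y_units = X_units @ rng.normal(size=n_covariates) + rng.normal(size=n_units)
X_grid = rng.normal(size=(n_cells, n_covariates))   # covariates per fine grid cell

model = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X_units, y_units)

# Fine-scale gridded prediction of the socioeconomic variable.
y_grid = model.predict(X_grid)
print(y_grid[:5])
```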
arXiv Detail & Related papers (2020-06-23T16:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.