Diversidade linguística e inclusão digital: desafios para uma ia brasileira
- URL: http://arxiv.org/abs/2411.01259v1
- Date: Sat, 02 Nov 2024 14:17:33 GMT
- Title: Diversidade linguística e inclusão digital: desafios para uma ia brasileira
- Authors: Raquel Meister Ko Freitag,
- Abstract summary: Linguistic diversity is a human attribute which, with the advance of generative AIs, is coming under threat.
This paper examines the consequences of the variety selection bias imposed by technological applications and the vicious circle of preserving a variety that becomes dominant and standardized.
- Score: 0.0
- License:
- Abstract: Linguistic diversity is a human attribute which, with the advance of generative AIs, is coming under threat. This paper, based on the contributions of sociolinguistics, examines the consequences of the variety selection bias imposed by technological applications and the vicious circle of preserving a variety that becomes dominant and standardized because it has linguistic documentation to feed the large language models for machine learning.
Related papers
- LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models [62.47865866398233]
This white paper proposes a framework to generate linguistic tools for low-resource languages.
By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity.
arXiv Detail & Related papers (2024-11-20T16:59:41Z) - Standard Language Ideology in AI-Generated Language [1.2815904071470705]
We explore standard language ideology in language generated by large language models (LLMs)
We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language.
arXiv Detail & Related papers (2024-06-13T01:08:40Z) - What is "Typological Diversity" in NLP? [7.58293347591642]
We introduce metrics to approximate the diversity of language selection along several axes.
We show that skewed language selection can lead to overestimated multilingual performance.
arXiv Detail & Related papers (2024-02-06T18:29:39Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - Diversity and Language Technology: How Techno-Linguistic Bias Can Cause
Epistemic Injustice [4.234367850767171]
We show that many attempts produce flawed solutions that adhere to a hard-wired representational preference for certain languages.
As we show through the paper, techno-linguistic bias can result in systems that can only express concepts that are part of the language and culture of dominant powers.
We argue that at the root of this problem lies a systematic tendency of technology developer communities to apply a simplistic understanding of diversity.
arXiv Detail & Related papers (2023-07-25T16:08:27Z) - Towards Bridging the Digital Language Divide [4.234367850767171]
multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z) - Democratizing Ethical Assessment of Natural Language Generation Models [0.0]
Natural language generation models are computer systems that generate coherent language when prompted with a sequence of words as context.
Despite their ubiquity and many beneficial applications, language generation models also have the potential to inflict social harms.
Ethical assessment of these models is therefore critical.
This article introduces a new tool to democratize and standardize ethical assessment of natural language generation models.
arXiv Detail & Related papers (2022-06-30T12:20:31Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.