Towards Bridging the Digital Language Divide
- URL: http://arxiv.org/abs/2307.13405v1
- Date: Tue, 25 Jul 2023 10:53:20 GMT
- Title: Towards Bridging the Digital Language Divide
- Authors: Gábor Bella, Paula Helm, Gertraud Koch, Fausto Giunchiglia
- Abstract summary: Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
- Score: 4.234367850767171
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is a well-known fact that current AI-based language technology -- language
models, machine translation systems, multilingual dictionaries and corpora --
focuses on the world's 2-3% most widely spoken languages. Recent research
efforts have attempted to expand the coverage of AI technology to
'under-resourced languages'. The goal of our paper is to bring attention to a
phenomenon that we call linguistic bias: multilingual language processing
systems often exhibit a hardwired, yet usually involuntary and hidden
representational preference towards certain languages. Linguistic bias is
manifested in uneven per-language performance even in the case of similar test
conditions. We show that biased technology is often the result of research and
development methodologies that do not do justice to the complexity of the
languages being represented, and that can even become ethically problematic as
they disregard valuable aspects of diversity as well as the needs of the
language communities themselves. As our attempt at building diversity-aware
language resources, we present a new initiative that aims at reducing
linguistic bias through both technological design and methodology, based on an
eye-level collaboration with local communities.
Related papers
- A Capabilities Approach to Studying Bias and Harm in Language Technologies [4.135516576952934]
We consider fairness, bias, and inclusion in Language Technologies through the lens of the Capabilities Approach.
The Capabilities Approach centers on what people are capable of achieving, given their intersectional social, political, and economic contexts.
We detail the Capabilities Approach, its relationship to multilingual and multicultural evaluation, and how the framework affords meaningful collaboration with community members in defining and measuring the harms of Language Technologies.
arXiv Detail & Related papers (2024-11-06T22:46:13Z)
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice [4.234367850767171]
We show that many attempts produce flawed solutions that adhere to a hard-wired representational preference for certain languages.
As we show through the paper, techno-linguistic bias can result in systems that can only express concepts that are part of the language and culture of dominant powers.
We argue that at the root of this problem lies a systematic tendency of technology developer communities to apply a simplistic understanding of diversity.
arXiv Detail & Related papers (2023-07-25T16:08:27Z)
- On the cross-lingual transferability of multilingual prototypical models across NLU tasks [2.44288434255221]
Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications.
In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages.
This article investigates cross-lingual transferability by synergistically combining few-shot learning with prototypical neural networks and multilingual Transformer-based models.
arXiv Detail & Related papers (2022-07-19T09:55:04Z)
- Overcoming Language Disparity in Online Content Classification with Multimodal Learning [22.73281502531998]
Large language models are now the standard to develop state-of-the-art solutions for text detection and classification tasks.
The development of advanced computational techniques and resources is disproportionately focused on the English language.
We explore the promise of incorporating the information contained in images via multimodal machine learning.
arXiv Detail & Related papers (2022-05-19T17:56:02Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art task-oriented dialogue (ToD) models based on large pretrained neural language models are data-hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- The State and Fate of Linguistic Diversity and Inclusion in the NLP World [12.936270946393483]
Language technologies contribute to promoting multilingualism and linguistic diversity around the world.
Only a very small number of the world's over 7,000 languages are represented in the rapidly evolving language technologies and applications.
arXiv Detail & Related papers (2020-04-20T07:19:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.