Model and Evaluation: Towards Fairness in Multilingual Text
Classification
- URL: http://arxiv.org/abs/2303.15697v1
- Date: Tue, 28 Mar 2023 03:00:01 GMT
- Title: Model and Evaluation: Towards Fairness in Multilingual Text
Classification
- Authors: Nankai Lin, Junheng He, Zhenghang Tang, Dong Zhou, Aimin Yang
- Abstract summary: We propose a debiasing framework for multilingual text classification based on contrastive learning.
The model contains four modules: a multilingual text representation module, a language fusion module, a text debiasing module, and a text classification module.
We propose a multi-dimensional fairness evaluation framework for multilingual text classification, which evaluates the model's monolingual equality difference, multilingual equality difference, multilingual equality performance difference, and destructiveness of the fairness strategy.
- Score: 6.697876965452054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, more and more research has focused on addressing bias in text
classification models. However, existing research mainly focuses on the
fairness of monolingual text classification models, and research on fairness
for multilingual text classification is still very limited. In this paper, we
focus on the task of multilingual text classification and propose a debiasing
framework for multilingual text classification based on contrastive learning.
Our proposed method does not rely on any external language resources and can be
extended to any other language. The model contains four modules: a multilingual
text representation module, a language fusion module, a text debiasing module,
and a text classification module. The multilingual text representation module
uses a multilingual pre-trained language model to represent the text, the
language fusion module uses contrastive learning to bring the semantic spaces of
different languages closer together, and the text debiasing module uses
contrastive learning to prevent the model from identifying sensitive-attribute
information. The text classification module performs the basic multilingual text
classification task. In addition, existing research on the fairness of
multilingual text classification uses a relatively simple evaluation scheme:
fairness is measured with the same method as the monolingual equality
difference, that is, the evaluation is
performed on a single language. We propose a multi-dimensional fairness
evaluation framework for multilingual text classification, which evaluates the
model's monolingual equality difference, multilingual equality difference,
multilingual equality performance difference, and destructiveness of the
fairness strategy. We hope that our work can provide a more general debiasing
method and a more comprehensive evaluation framework for multilingual text
fairness tasks.
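The abstract does not state the loss or the metric formulas, so the following is only a minimal sketch under assumptions. An InfoNCE-style loss stands in for the contrastive objective used by the language fusion and text debiasing modules, and the equality difference is assumed to be the commonly used false positive equality difference (the sum over groups of the absolute gap between the overall and the per-group false positive rate). All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    anchors[i] and positives[i] form a positive pair (assumed here to be,
    for example, same-class texts in two languages); every other
    positives[j] in the batch acts as a negative for anchors[i].
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive
    return total / len(a)


def false_positive_rate(y_true, y_pred):
    """Share of true negatives (label 0) that were predicted positive (1)."""
    negatives = [pred for true, pred in zip(y_true, y_pred) if true == 0]
    return sum(negatives) / len(negatives) if negatives else 0.0


def fp_equality_difference(y_true, y_pred, groups):
    """Sum over groups of |overall FPR - group FPR| (assumed definition)."""
    overall = false_positive_rate(y_true, y_pred)
    total = 0.0
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        group_fpr = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        total += abs(overall - group_fpr)
    return total


# Toy usage: aligned pairs should give a lower loss than misaligned ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < misaligned)

# Toy usage: groups "a" and "b" are hypothetical sensitive-attribute values.
fped = fp_equality_difference([0, 0, 0, 0], [1, 0, 0, 0], ["a", "a", "b", "b"])
print(fped)
```

A value of zero for the equality difference would mean every group has the same false positive rate as the data overall. Under the same assumption, a multilingual variant could compute this quantity per language and then compare the results across languages.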
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Multilingual Few-Shot Learning via Language Model Retrieval [18.465566186549072]
Transformer-based language models have achieved remarkable success in few-shot in-context learning.
We conduct a study of retrieving semantically similar few-shot samples and using them as the context.
We evaluate the proposed method on five natural language understanding datasets related to intent detection, question classification, sentiment analysis, and topic classification.
arXiv Detail & Related papers (2023-06-19T14:27:21Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan [0.05277024349608833]
This work focuses on Catalan with the aim of exploring to what extent a medium-sized monolingual language model is competitive with state-of-the-art large multilingual models.
We build a clean, high-quality textual Catalan corpus (CaText), train a Transformer-based language model for Catalan (BERTa), and devise a thorough evaluation in a diversity of settings.
The result is a new benchmark, the Catalan Language Understanding Benchmark (CLUB), which we publish as an open resource.
arXiv Detail & Related papers (2021-07-16T13:52:01Z)
- Cross-lingual Text Classification with Heterogeneous Graph Neural Network [2.6936806968297913]
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages.
Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks.
We propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification.
arXiv Detail & Related papers (2021-05-24T12:45:42Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.