Toxicity in Multilingual Machine Translation at Scale
- URL: http://arxiv.org/abs/2210.03070v2
- Date: Wed, 5 Apr 2023 20:06:42 GMT
- Title: Toxicity in Multilingual Machine Translation at Scale
- Authors: Marta R. Costa-jussà, Eric Smith, Christophe Ropers, Daniel Licht,
Jean Maillard, Javier Ferrando, Carlos Escolano
- Abstract summary: We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demographic axes) from English into 164 languages.
An automatic toxicity evaluation shows that added toxicity across languages varies from 0% to 5%.
The output languages with the most added toxicity tend to be low-resource ones, and the demographic axes with the most added toxicity include sexual orientation, gender and sex, and ability.
- Score: 3.4620477930009472
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine Translation systems can produce different types of errors, some of
which are characterized as critical or catastrophic due to the specific
negative impact that they can have on users. In this paper we focus on one type
of critical error: added toxicity. We evaluate and analyze added toxicity when
translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences,
covering 13 demographic axes) from English into 164 languages. An automatic
toxicity evaluation shows that added toxicity across languages varies from 0%
to 5%. The output languages with the most added toxicity tend to be
low-resource ones, and the demographic axes with the most added toxicity
include sexual orientation, gender and sex, and ability. We also perform human
evaluation on a subset of 8 translation directions, confirming the prevalence
of true added toxicity. We use a measurement of the amount of source
contribution to the translation, where a low source contribution implies
hallucination, to interpret what causes toxicity. Making use of the input
attributions allows us to explain toxicity, because the source contributions
correlate significantly with toxicity for 84% of the languages studied. Given our
findings, we recommend reducing added toxicity by curating training data to avoid
mistranslations, mitigating hallucination, and checking unstable translations.
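As a rough illustration of the evaluation described above, the sketch below flags "added toxicity" via wordlist matching (the translation contains a term from a target-language toxicity list while the English source matches none) and correlates the resulting flags with per-sentence source-contribution scores from the model's input attributions. The file format, tokenization, and statistical test are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (not the paper's exact pipeline): wordlist-based detection of
# added toxicity, plus a simple correlation check against per-sentence source
# contribution scores from an input-attribution method.
import re
from scipy.stats import pearsonr


def load_wordlist(path):
    """Load one toxic term per line, lowercased."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def has_toxic_term(text, wordlist):
    """True if any whitespace/punctuation-delimited token is in the wordlist."""
    return any(tok in wordlist for tok in re.findall(r"\w+", text.lower()))


def added_toxicity(source, translation, src_list, tgt_list):
    """Toxicity appears in the translation but not in the source."""
    return has_toxic_term(translation, tgt_list) and not has_toxic_term(source, src_list)


def added_toxicity_rate(pairs, src_list, tgt_list):
    """Return (rate, per-pair flags) over (source, translation) pairs."""
    flags = [added_toxicity(s, t, src_list, tgt_list) for s, t in pairs]
    return sum(flags) / len(flags), flags


def correlate_with_source_contribution(flags, source_contributions):
    """Point-biserial (Pearson) correlation between the binary added-toxicity
    flag and the per-sentence source-contribution score; a significant negative
    value is consistent with hallucination driving added toxicity."""
    return pearsonr([float(f) for f in flags], source_contributions)
```

For a given translation direction, one would load the English and target-language wordlists, compute the added-toxicity rate over the translated HOLISTICBIAS sentences, and test the correlation against source-contribution scores; in the paper, source contributions correlate significantly with toxicity for 84% of the languages studied.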
Related papers
- FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts [13.470734853274587]
Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language.
We create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts.
We evaluate 14 different models from four prevalent open-sourced families of LLMs against our dataset to assess their potential toxicity.
arXiv Detail & Related papers (2024-06-25T14:02:11Z)
- PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models [27.996123856250065]
Existing toxicity benchmarks are overwhelmingly focused on English.
We introduce PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages.
arXiv Detail & Related papers (2024-05-15T14:22:33Z) - Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs), combined with their impressive capabilities, may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicitly toxic outputs that are exceptionally difficult to detect via zero-shot prompting alone.
We propose a reinforcement learning (RL) based attack method to further induce implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z) - Facilitating Fine-grained Detection of Chinese Toxic Language:
Hierarchical Taxonomy, Resources, and Benchmarks [18.44630180661091]
Existing datasets lack fine-grained annotation of toxic types and expressions.
It is crucial to introduce lexical knowledge to detect the toxicity of posts.
In this paper, we facilitate the fine-grained detection of Chinese toxic language.
arXiv Detail & Related papers (2023-05-08T03:50:38Z) - ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and
Implicit Hate Speech Detection [33.715318646717385]
ToxiGen is a large-scale dataset of 274k toxic and benign statements about 13 minority groups.
Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale.
We find that 94.5% of toxic examples are labeled as hate speech by human annotators.
arXiv Detail & Related papers (2022-03-17T17:57:56Z) - Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z) - Annotators with Attitudes: How Annotator Beliefs And Identities Bias
Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z) - Mitigating Biases in Toxic Language Detection through Invariant
Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate for both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z) - RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)