Text Style Transfer for Bias Mitigation using Masked Language Modeling
- URL: http://arxiv.org/abs/2201.08643v1
- Date: Fri, 21 Jan 2022 11:06:33 GMT
- Title: Text Style Transfer for Bias Mitigation using Masked Language Modeling
- Authors: Ewoenam Kwaku Tokpo, Toon Calders
- Abstract summary: We present a text style transfer model that can be used to automatically debias textual data.
Our model addresses limitations of existing style transfer techniques, such as loss of content information, by combining latent content encoding with explicit keyword replacement.
- Score: 9.350763916068026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that textual data on the internet and other digital
platforms contain significant levels of bias and stereotypes. Although many
such texts contain stereotypes and biases that inherently exist in natural
language for reasons that are not necessarily malicious, there are crucial
reasons to mitigate these biases. For one, these texts are used as training
corpora for language models in salient applications like CV screening, search
engines, and chatbots, and such applications have been shown to produce
discriminatory results. Moreover, several studies have concluded that biased
texts have significant effects on the target demographic groups. For instance,
masculine-worded job advertisements tend to be less appealing to female
applicants.
In this paper, we present a text style transfer model that can be used to
automatically debias textual data. Our model addresses the limitations of many
existing style transfer techniques, such as loss of content information, by
combining latent content encoding with explicit keyword replacement. We show
that this technique achieves better content preservation while maintaining
good style transfer accuracy.
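To make the explicit keyword replacement component concrete, here is a minimal sketch of how a masked language model can propose neutral substitutes for attribute-laden words. This is an illustration under our own assumptions, not the authors' implementation: the attribute word list, the `bert-base-uncased` model, and the `debias_keywords` helper are all hypothetical, and the paper's latent content encoding component is not reproduced here. It assumes the `transformers` and `torch` packages are installed.

```python
# Minimal sketch of masked-LM keyword replacement for debiasing.
# NOT the authors' implementation: the attribute list, model choice,
# and helper below are illustrative assumptions.
from transformers import pipeline

# Hypothetical list of bias-laden keywords to neutralize.
ATTRIBUTE_WORDS = {"aggressive", "dominant", "ninja", "rockstar"}

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def debias_keywords(sentence: str) -> str:
    """Mask each attribute word and take the MLM's top-scoring replacement."""
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        if token.lower().strip(".,!?") in ATTRIBUTE_WORDS:
            masked = tokens[:i] + [fill_mask.tokenizer.mask_token] + tokens[i + 1:]
            best = fill_mask(" ".join(masked))[0]  # top prediction dict
            tokens[i] = best["token_str"]
    return " ".join(tokens)

print(debias_keywords("We want an aggressive and dominant salesperson."))
```

Because the masked LM conditions on the full sentence, the proposed replacement tends to preserve fluency; the paper's contribution is pairing this kind of replacement with a latent content encoding so that non-keyword content is preserved as well.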
Related papers
- Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias [8.168722337906148]
This study investigates the presence of bias in harmful speech classification of gender-queer dialect online.
We introduce a novel dataset, QueerLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs.
We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
arXiv Detail & Related papers (2024-05-23T18:07:28Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Don't lose the message while paraphrasing: A study on content preserving style transfer [61.38460184163704]
Content preservation is critical for real-world applications of style transfer studies.
We conduct a precise comparative study of several state-of-the-art style transfer techniques, using the formality transfer domain as an example.
arXiv Detail & Related papers (2023-08-17T15:41:08Z)
- StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing [73.81778485157234]
Long texts usually involve more complicated author linguistic preferences, such as discourse structures, than individual sentences do.
We formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style.
We use an additional training objective to disentangle stylistic features from the learned discourse representation to prevent the model from degenerating to an auto-encoder.
arXiv Detail & Related papers (2022-08-29T08:47:49Z)
- Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection [83.3580786484122]
We find that newspapers from larger schools, located in wealthier, educated, and urban ZIP codes are more likely to be classified as high quality.
We argue that privileging any corpus as high quality entails a language ideology.
arXiv Detail & Related papers (2022-01-25T17:20:04Z)
- The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
- Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP [10.936043362876651]
We propose a decoding algorithm that reduces the probability of a model producing problematic text (a hedged sketch of this idea appears after this list).
While our approach by no means eliminates the issue of language models generating biased text, we believe it to be an important step in this direction.
arXiv Detail & Related papers (2021-02-28T11:07:37Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
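The self-debiasing decoding idea referenced above (Self-Diagnosis and Self-Debiasing) can be pictured as logit arithmetic: score the next token with and without a self-diagnosis prefix, and penalize tokens that the prefixed (biased) context makes more likely. The sketch below is a loose, hypothetical rendering under our own assumptions (the prefix wording, the penalty strength ALPHA, and greedy single-token decoding), not the paper's exact algorithm.

```python
# Loose sketch of self-debiasing decoding: penalize next tokens that a
# "self-diagnosis" prefix makes MORE likely. The prefix text and ALPHA
# are illustrative assumptions, not values from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

DEBIAS_PREFIX = "The following text contains a gender stereotype:\n"  # assumed
ALPHA = 50.0  # assumed penalty strength

@torch.no_grad()
def next_token_logits(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids).logits[0, -1]  # logits for the next token

@torch.no_grad()
def debiased_next_token(prompt: str) -> str:
    plain = next_token_logits(prompt)
    biased = next_token_logits(DEBIAS_PREFIX + prompt)
    # How much more likely each token becomes under the biased prefix.
    delta = torch.log_softmax(biased, -1) - torch.log_softmax(plain, -1)
    adjusted = plain - ALPHA * torch.clamp(delta, min=0.0)
    return tok.decode(int(adjusted.argmax()))

print(debiased_next_token("The new nurse said that"))
```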
This list is automatically generated from the titles and abstracts of the papers on this site.