Preventing Author Profiling through Zero-Shot Multilingual
Back-Translation
- URL: http://arxiv.org/abs/2109.09133v1
- Date: Sun, 19 Sep 2021 14:36:22 GMT
- Title: Preventing Author Profiling through Zero-Shot Multilingual
Back-Translation
- Authors: David Ifeoluwa Adelani, Miaoran Zhang, Xiaoyu Shen, Ali Davody, Thomas
Kleinbauer, and Dietrich Klakow
- Abstract summary: We propose a simple, zero-shot way to effectively lower the risk of author profiling through multilingual back-translation.
Results from both an automatic and a human evaluation show that our approach achieves the best overall performance.
We are able to lower the adversarial prediction of gender and race by up to $22\%$ while retaining $95\%$ of the original utility on downstream tasks.
- Score: 15.871735427038386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Documents as short as a single sentence may inadvertently reveal sensitive
information about their authors, including e.g. their gender or ethnicity.
Style transfer is an effective way of transforming texts in order to remove any
information that enables author profiling. However, for a number of current
state-of-the-art approaches the improved privacy is accompanied by an
undesirable drop in the down-stream utility of the transformed data. In this
paper, we propose a simple, zero-shot way to effectively lower the risk of
author profiling through multilingual back-translation using off-the-shelf
translation models. We compare our models with five representative text style
transfer models on three datasets across different domains. Results from both
an automatic and a human evaluation show that our approach achieves the best
overall performance while requiring no training data. We are able to lower the
adversarial prediction of gender and race by up to $22\%$ while retaining
$95\%$ of the original utility on downstream tasks.
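The method itself is simple enough to sketch. Below is a minimal illustration of zero-shot back-translation with off-the-shelf translation models, assuming the Hugging Face transformers library and the public Helsinki-NLP MarianMT checkpoints with German as a pivot language; the specific translation models and pivot languages used in the paper may differ.

```python
# Minimal sketch of zero-shot back-translation for author-style obfuscation.
# Assumes: Hugging Face `transformers` with the public Helsinki-NLP MarianMT
# checkpoints and German as the pivot language (illustrative choices only).
from transformers import MarianMTModel, MarianTokenizer


def load_pair(model_name):
    """Load an off-the-shelf translation model and its tokenizer."""
    return MarianTokenizer.from_pretrained(model_name), MarianMTModel.from_pretrained(model_name)


def translate(texts, tokenizer, model):
    """Translate a batch of sentences using the model's default decoding."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# English -> German -> English round trip: the pivot language acts as a
# bottleneck that tends to strip author-specific stylistic cues while
# largely preserving the content needed for downstream tasks.
en_de = load_pair("Helsinki-NLP/opus-mt-en-de")
de_en = load_pair("Helsinki-NLP/opus-mt-de-en")


def back_translate(texts):
    pivot = translate(texts, *en_de)
    return translate(pivot, *de_en)


if __name__ == "__main__":
    print(back_translate(["I'm totally gonna ace this exam, no doubt about it!"]))
```

No training data is involved at any point: both translation models are used exactly as released, which is what makes the approach zero-shot.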
Related papers
- Prefix-Tuning Based Unsupervised Text Style Transfer [29.86587278794342]
Unsupervised text style transfer aims at training a generative model that can alter the style of the input sentence while preserving its content.
In this paper, we employ powerful pre-trained large language models and present a new prefix-tuning-based method for unsupervised text style transfer.
arXiv Detail & Related papers (2023-10-23T06:13:08Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Prompt-Based Editing for Text Style Transfer [25.863546922455498]
We present a prompt-based editing approach for text style transfer.
We transform the prompt-based generation problem into a classification one, which makes the process training-free.
Our approach largely outperforms the state-of-the-art systems that have 20 times more parameters.
arXiv Detail & Related papers (2023-01-27T21:31:14Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Few-shot Controllable Style Transfer for Low-Resource Settings: A Study in Indian Languages [13.980482277351523]
Style transfer is the task of rewriting an input sentence into a target style while preserving its content.
We push the state-of-the-art for few-shot style transfer with a new method modeling the stylistic difference between paraphrases.
Our model achieves 2-3x better performance and output diversity in formality transfer and code-mixing addition across five Indian languages.
arXiv Detail & Related papers (2021-10-14T14:16:39Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Fine-tuning GPT-3 for Russian Text Summarization [77.34726150561087]
This paper showcases ruGPT3's ability to summarize texts, fine-tuning it on a corpus of Russian news articles paired with their corresponding human-generated summaries.
We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function.
arXiv Detail & Related papers (2021-08-07T19:01:40Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z)