Building cross-language corpora for human understanding of privacy
policies
- URL: http://arxiv.org/abs/2302.05355v1
- Date: Fri, 10 Feb 2023 16:16:55 GMT
- Title: Building cross-language corpora for human understanding of privacy
policies
- Authors: Francesco Ciclosi, Silvia Vidor, and Fabio Massacci
- Abstract summary: This work provides a methodology for building comparable cross-language corpora in a national language and a reference study language.
We provide an application example of our methodology comparing English and Italian, extending the corpus of one of the first studies about users' understanding of technical terms in privacy policies.
- Score: 7.1707060082291925
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Making sure that users understand privacy policies that impact them is a key
challenge for a real GDPR deployment. Research studies are mostly carried out in
English, but in Europe and elsewhere, users speak a language that is not
English. Replicating studies in different languages requires the availability
of comparable cross-language privacy policy corpora. This work provides a
methodology for building comparable cross-language corpora in a national language and a
reference study language. We provide an application example of our methodology
comparing English and Italian, extending the corpus of one of the first studies
about users' understanding of technical terms in privacy policies. We also
investigate other open issues that can make replication harder.
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- A Federated Learning Approach to Privacy Preserving Offensive Language Identification [14.487531876937247]
We propose a privacy preserving architecture for identifying offensive language online by introducing Federated Learning (FL).
FL is a decentralized architecture that allows multiple models to be trained locally without the need for data sharing.
We trained multiple deep learning models on four publicly available English benchmark datasets.
arXiv Detail & Related papers (2024-04-17T15:23:12Z)
- Multilingual Evaluation of Semantic Textual Relatedness [0.0]
Semantic Textual Relatedness (STR) goes beyond superficial word overlap, considering linguistic elements and non-linguistic factors like topic, sentiment, and perspective.
Prior NLP research has predominantly focused on English, limiting its applicability across languages.
We explore STR in Marathi, Hindi, Spanish, and English, unlocking the potential for information retrieval, machine translation, and more.
arXiv Detail & Related papers (2024-04-13T17:16:03Z)
- MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models [65.10456412127405]
MLaKE is a benchmark for the adaptability of knowledge editing methods across five languages.
MLaKE aggregates fact chains from Wikipedia across languages and generates questions in both free-form and multiple-choice.
We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE.
arXiv Detail & Related papers (2024-04-07T15:23:28Z)
- A Study on Scaling Up Multilingual News Framing Analysis [23.80807884935475]
This study explores the possibility of dataset creation through crowdsourcing.
We first extend framing analysis beyond English news to a multilingual context.
We also present a novel benchmark in Bengali and Portuguese on the immigration and same-sex marriage domains.
arXiv Detail & Related papers (2024-04-01T21:02:18Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus [2.418273287232718]
We describe the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments.
We also present the first cross-lingual experiments, aimed at assessing whether multilingual transformer-based approaches can benefit from using sentences in two languages during fine-tuning.
arXiv Detail & Related papers (2021-09-24T16:18:53Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)