Cross-lingual Offensive Language Detection: A Systematic Review of
Datasets, Transfer Approaches and Challenges
- URL: http://arxiv.org/abs/2401.09244v1
- Date: Wed, 17 Jan 2024 14:44:27 GMT
- Title: Cross-lingual Offensive Language Detection: A Systematic Review of
Datasets, Transfer Approaches and Challenges
- Authors: Aiqi Jiang, Arkaitz Zubiaga
- Abstract summary: This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning techniques in offensive language detection in social media.
Our study stands as the first holistic overview to focus exclusively on the cross-lingual scenario in this domain.
- Score: 10.079109184645478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing prevalence and rapid evolution of offensive language in social
media amplify the complexities of detection, particularly highlighting the
challenges in identifying such content across diverse languages. This survey
presents a systematic and comprehensive exploration of Cross-Lingual Transfer
Learning (CLTL) techniques in offensive language detection in social media. Our
study stands as the first holistic overview to focus exclusively on the
cross-lingual scenario in this domain. We analyse 67 relevant papers and
categorise these studies across various dimensions, including the
characteristics of multilingual datasets used, the cross-lingual resources
employed, and the specific CLTL strategies implemented. According to "what to
transfer", we also summarise three main CLTL transfer approaches: instance,
feature, and parameter transfer. Additionally, we shed light on the current
challenges and future research opportunities in this field. Furthermore, we
have made our survey resources available online, including two comprehensive
tables that provide accessible references to the multilingual datasets and CLTL
methods used in the reviewed literature.
Related papers
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [48.314619377988436]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches [11.508759658889382]
There is a tremendous lack of Ukrainian corpora for typical text classification tasks.
We explore cross-lingual knowledge transfer methods avoiding manual data curation.
We test the approaches on three text classification tasks.
arXiv Detail & Related papers (2024-04-02T15:37:09Z) - X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs [55.80189506270598]
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
arXiv Detail & Related papers (2023-09-16T04:34:55Z) - Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies [4.118037156777793]
Cross-lingual transfer is a promising technique to solve tasks in less-resourced languages.
We compare two fine-tuning approaches combined with zero-shot and full-shot learning approaches for large language models.
arXiv Detail & Related papers (2023-09-12T09:37:08Z) - Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (MS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT)
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z) - Transfer Learning for Multi-lingual Tasks -- a Survey [11.596820548674266]
Cross languages content and multilingualism in natural language processing (NLP) are hot topics.
We provide a comprehensive overview of the existing literature with a focus on transfer learning techniques in multilingual tasks.
arXiv Detail & Related papers (2021-08-28T20:29:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.