A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection
- URL: http://arxiv.org/abs/2405.03920v1
- Date: Tue, 7 May 2024 00:38:34 GMT
- Title: A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection
- Authors: Dainis Boumber, Rakesh M. Verma, Fatima Zahra Qachfar
- Abstract summary: Deception, a prevalent aspect of human communication, has undergone a significant transformation in the digital age.
Recent studies have shown the possibility of the existence of universal linguistic cues to deception across domains within the English language.
The practical task of deception detection in low-resource languages is not a well-studied problem due to the lack of labeled data.
- Score: 2.1506382989223782
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deception, a prevalent aspect of human communication, has undergone a significant transformation in the digital age. With the globalization of online interactions, individuals are communicating in multiple languages and mixing languages on social media, with varied data becoming available in each language and dialect. At the same time, the techniques for detecting deception are similar across the board. Recent studies have shown the possibility of the existence of universal linguistic cues to deception across domains within the English language; however, the existence of such cues in other languages remains unknown. Furthermore, the practical task of deception detection in low-resource languages is not a well-studied problem due to the lack of labeled data. Another dimension of deception is multimodality. For example, a picture with an altered caption in fake news or disinformation may exist. This paper calls for a comprehensive investigation into the complexities of deceptive language across linguistic boundaries and modalities within the realm of computer security and natural language processing and the possibility of using multilingual transformer models and labeled data in various languages to universally address the task of deception detection.
Related papers
- Variationist: Exploring Multifaceted Variation and Bias in Written Language Data [3.666781404469562]
Exploring and understanding language data is a fundamental stage in all areas dealing with human language.
Yet, there is currently a lack of a unified, customizable tool to seamlessly inspect and visualize language variation and bias.
In this paper, we introduce Variationist, a highly-modular, descriptive, and task-agnostic tool that fills this gap.
arXiv Detail & Related papers (2024-06-25T15:41:07Z)
- Towards Bridging the Digital Language Divide [4.234367850767171]
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Multi-lingual agents through multi-headed neural networks [0.0]
This paper focuses on cooperative Multi-Agent Reinforcement Learning.
In this context, multiple distinct and incompatible languages can emerge.
We take inspiration from the Continual Learning literature and equip our agents with multi-headed neural networks which enable our agents to be multi-lingual.
arXiv Detail & Related papers (2021-11-22T11:39:42Z)
- Capturing the diversity of multilingual societies [0.0]
We consider the processes at work in language shift through a conjunction of theoretical and data-driven perspectives.
A large-scale empirical study of spatial patterns of languages in multilingual societies using Twitter and census data yields a wide diversity.
We propose a model in which coexistence of languages may be reached when learning the other language is facilitated and when bilinguals favor the use of the endangered language.
arXiv Detail & Related papers (2021-05-06T10:27:43Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)