Deception detection in text and its relation to the cultural dimension
of individualism/collectivism
- URL: http://arxiv.org/abs/2105.12530v1
- Date: Wed, 26 May 2021 13:09:47 GMT
- Title: Deception detection in text and its relation to the cultural dimension
of individualism/collectivism
- Authors: Katerina Papantoniou, Panagiotis Papadakos, Theodore Patkos, Giorgos
Flouris, Ion Androutsopoulos, Dimitris Plexousakis
- Abstract summary: We investigate whether differences in the usage of specific linguistic features of deception across cultures can be confirmed and attributed to norms with respect to the individualism/collectivism divide.
We create culture/language-aware classifiers by experimenting with a wide range of n-gram features based on phonology, morphology and syntax.
We conducted our experiments over 11 datasets covering five languages (English, Dutch, Russian, Spanish, and Romanian) from six countries (US, Belgium, India, Russia, Mexico, and Romania).
- Score: 6.17866386107486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deception detection is a task with many applications both in direct physical
and in computer-mediated communication. Our focus is on automatic deception
detection in text across cultures. We view culture through the prism of the
individualism/collectivism dimension and we approximate culture by using
country as a proxy. Taking as a starting point recent conclusions drawn from
social psychology, we explore whether differences in the usage of specific
linguistic features of deception across cultures can be confirmed and
attributed to norms with respect to the individualism/collectivism divide. We
also investigate whether a universal feature set for cross-cultural text deception
detection tasks exists. We evaluate the predictive power of different feature
sets and approaches. We create culture/language-aware classifiers by
experimenting with a wide range of n-gram features based on phonology,
morphology, and syntax, other linguistic cues such as word and phoneme counts,
pronoun use, etc., and token embeddings. We conducted our experiments over 11
datasets covering five languages (English, Dutch, Russian, Spanish, and
Romanian) from six countries (US, Belgium, India, Russia, Mexico, and Romania),
and we applied two classification methods, namely logistic regression and fine-tuned BERT
models. The results showed that our task is fairly complex and demanding. There
are indications that some linguistic cues of deception have cultural origins,
and are consistent in the context of diverse domains and dataset settings for
the same language. This is more evident for the use of pronouns and the
expression of sentiment in deceptive language. The results of this work show
that automatic deception detection across cultures and languages cannot be
handled in a unified manner, and that such approaches should be augmented with
knowledge about cultural differences and the domains of interest.
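To make the setup concrete, here is a minimal sketch of a classifier of the kind the abstract describes, assuming scikit-learn and toy data. The specific feature choice (character n-grams as a rough stand-in for the paper's phonology/morphology n-gram features) and the label encoding are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch (not the authors' code): a language-aware deception
# classifier built from character n-gram features and logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the labels are hypothetical (1 = deceptive, 0 = truthful).
texts = [
    "I absolutely never touched the money, I swear.",
    "We met at the station and walked to the office.",
    "Nobody can prove that I was anywhere near there.",
    "The report was submitted on Tuesday morning.",
]
labels = [1, 0, 1, 0]

# Character n-grams within word boundaries loosely approximate
# phonology/morphology cues; word n-grams would capture lexical
# cues such as pronoun use.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["I never saw that document in my life."]))
```

For the fine-tuned BERT variant, the sketch below only loads a pretrained multilingual encoder with a two-class head as the starting point; the checkpoint name is an illustrative assumption, and actual fine-tuning on the deception datasets would then update these weights.

```python
# Hedged sketch: loading a pretrained multilingual BERT with a
# two-class head, the starting point for fine-tuning on deception data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-multilingual-cased"  # illustrative choice of checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tokenizer(["I never saw that document in my life."],
                  return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**batch).logits
# The classification head is randomly initialized before fine-tuning,
# so these probabilities are not yet meaningful.
print(logits.softmax(dim=-1))
```

Either pipeline would be trained and evaluated per dataset, in line with the per-language/country experiments described above.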
Related papers
- Extrinsic Evaluation of Cultural Competence in Large Language Models [53.626808086522985]
We focus on extrinsic evaluation of cultural competence in two text generation tasks.
We evaluate model outputs when an explicit cue of culture, specifically nationality, is perturbed in the prompts.
We find weak correlations between text similarity of outputs for different countries and the cultural values of these countries.
arXiv Detail & Related papers (2024-06-17T14:03:27Z)
- CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
CVQA is a culturally-diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z)
- CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
"CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs)
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
- The Echoes of Multilinguality: Tracing Cultural Value Shifts during LM Fine-tuning [23.418656688405605]
We study how languages can exert influence on the cultural values encoded for different test languages, by examining how such values are revised during fine-tuning.
Lastly, we use a training data attribution method to find patterns in the fine-tuning examples, and the languages that they come from, that tend to instigate value shifts.
arXiv Detail & Related papers (2024-05-21T12:55:15Z)
- Investigating Cultural Alignment of Large Language Models [10.738300803676655]
We show that Large Language Models (LLMs) genuinely encapsulate the diverse knowledge adopted by different cultures.
We quantify cultural alignment by simulating sociological surveys, comparing model responses to those of actual survey participants as references.
We introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment.
arXiv Detail & Related papers (2024-02-20T18:47:28Z)
- Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
- Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception [28.716435050743957]
We study how people from different cultural backgrounds observe vastly different concepts even when viewing the same visual stimuli.
By comparing textual descriptions generated across 7 languages for the same images, we find significant differences in the semantic content and linguistic expression.
Our work points towards the need to account for and embrace the diversity of human perception in the computer vision community.
arXiv Detail & Related papers (2023-10-22T16:51:42Z)
- Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP.
We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba.
Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region.
All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
- Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study [9.919972416590124]
ChatGPT has garnered widespread recognition for its exceptional ability to generate human-like responses in dialogue.
We investigate the underlying cultural background of ChatGPT by analyzing its responses to questions designed to quantify human cultural differences.
arXiv Detail & Related papers (2023-03-30T15:43:39Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.