UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception
Detection
- URL: http://arxiv.org/abs/2306.02827v2
- Date: Wed, 7 Jun 2023 23:07:26 GMT
- Title: UNIDECOR: A Unified Deception Corpus for Cross-Corpus Deception
Detection
- Authors: Aswathy Velutharambath and Roman Klinger
- Abstract summary: We conduct a correlation analysis of linguistic cues of deception across datasets to understand the differences.
We perform cross-corpus modeling experiments which show that cross-domain generalization is challenging to achieve.
The unified deception corpus (UNIDECOR) can be obtained from https://www.ims.uni-stuttgart.de/data/unidecor.
- Score: 17.016156702855604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Verbal deception has been studied in psychology, forensics, and computational
linguistics for a variety of reasons, like understanding behaviour patterns,
identifying false testimonies, and detecting deception in online communication.
Varying motivations across research fields lead to differences in the domain
choices to study and in the conceptualization of deception, making it hard to
compare models and build robust deception detection systems for a given
language. With this paper, we improve this situation by surveying available
English deception datasets which include domains like social media reviews,
court testimonials, opinion statements on specific topics, and deceptive
dialogues from online strategy games. We consolidate these datasets into a
single unified corpus. Based on this resource, we conduct a correlation
analysis of linguistic cues of deception across datasets to understand the
differences and perform cross-corpus modeling experiments which show that
cross-domain generalization is challenging to achieve. The unified deception
corpus (UNIDECOR) can be obtained from
https://www.ims.uni-stuttgart.de/data/unidecor.
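To make the cross-corpus setup concrete, the sketch below trains a simple lexical classifier on one source corpus of the unified resource and evaluates it on another. The file name, column names (text, label, corpus), and corpus identifiers are assumptions made for illustration, not the schema documented with UNIDECOR.

# Minimal sketch of a cross-corpus experiment on a unified deception corpus.
# Assumed (hypothetical) layout: a CSV "unidecor.csv" with columns "text",
# "label" (truthful/deceptive), and "corpus" (source dataset identifier).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("unidecor.csv")

def cross_corpus_f1(train_corpus: str, test_corpus: str) -> float:
    # Train on one source corpus, evaluate on another (cross-corpus transfer).
    train = df[df["corpus"] == train_corpus]
    test = df[df["corpus"] == test_corpus]
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(train["text"], train["label"])
    return f1_score(test["label"], clf.predict(test["text"]), average="macro")

# Hypothetical corpus identifiers, e.g. review data vs. court testimonials.
for src, tgt in [("reviews", "court"), ("court", "reviews")]:
    print(src, "->", tgt, round(cross_corpus_f1(src, tgt), 3))

Comparing such transfer scores with in-corpus cross-validation scores gives a rough view of the generalization gap described in the abstract.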
Related papers
- What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text [10.912953196817554]
We introduce a belief-based deception framework, which defines deception as a misalignment between an author's claims and true beliefs.
We construct three corpora, collectively referred to as DeFaBel, including a German-language corpus of deceptive and non-deceptive arguments.
Using these corpora, we evaluate commonly reported linguistic cues of deception.
arXiv Detail & Related papers (2025-05-19T14:12:05Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - Learning Disentangled Speech Representations [0.412484724941528]
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations.
We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics.
We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Guiding Computational Stance Detection with Expanded Stance Triangle
Framework [25.2980607215715]
Stance detection determines whether the author of a piece of text is in favor of, against, or neutral towards a specified target.
We decompose the stance detection task from a linguistic perspective, and investigate key components and inference paths in this task.
arXiv Detail & Related papers (2023-05-31T13:33:29Z) - Contextual information integration for stance detection via
cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z) - Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in
Low-Resource English Varieties [3.3536302616846734]
We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits.
We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
arXiv Detail & Related papers (2022-09-15T21:19:31Z) - A combined approach to the analysis of speech conversations in a contact
center domain [2.575030923243061]
We describe an experiment with a speech analytics process for an Italian contact center that deals with call recordings extracted from inbound or outbound flows.
First, we illustrate in detail the development of an in-house speech-to-text solution, based on the Kaldi framework.
Then, we evaluate and compare different approaches to the semantic tagging of call transcripts.
Finally, a decision tree inducer, called J48S, is applied to the problem of tagging.
arXiv Detail & Related papers (2022-03-12T10:03:20Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Dialog speech sentiment classification for imbalanced datasets [7.84604505907019]
In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection.
We propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.
arXiv Detail & Related papers (2021-09-15T11:43:04Z) - Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z) - Unsupervised Cross-Modal Audio Representation Learning from Unstructured
Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
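The last entry above relies on a triplet objective for learning audio representations; the following minimal sketch shows how such an objective can be wired up in PyTorch. The encoder, input feature size, margin, and the random anchor/positive/negative batches are placeholders for illustration, not the architecture from that paper.

# Minimal sketch of a triplet objective for track-relatedness embeddings.
# All dimensions and the encoder itself are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    # Tiny feed-forward encoder mapping audio features to a unit-norm embedding.
    def __init__(self, in_dim: int = 128, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)

encoder = AudioEncoder()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# Anchor and positive stand in for semantically related tracks, negative for an unrelated one.
anchor, positive, negative = (torch.randn(8, 128) for _ in range(3))
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()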