HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
- URL: http://arxiv.org/abs/2310.16746v1
- Date: Wed, 25 Oct 2023 16:23:17 GMT
- Title: HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
- Authors: Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca
Giannotti, Dongwon Lee
- Abstract summary: We introduce the largest benchmark for spoken texts - HANSEN (Human ANd ai Spoken tExt beNchmark).
HANSEN encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets.
To evaluate and demonstrate the utility of HANSEN, we perform Authorship Attribution (AA) & Author Verification (AV) on human-spoken datasets and conduct Human vs. AI spoken text detection using state-of-the-art (SOTA) models.
- Score: 14.467821652366574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Authorship Analysis, also known as stylometry, has been an essential aspect
of Natural Language Processing (NLP) for a long time. Likewise, the recent
advancement of Large Language Models (LLMs) has made authorship analysis
increasingly crucial for distinguishing between human-written and AI-generated
texts. However, these authorship analysis tasks have primarily been focused on
written texts, not considering spoken texts. Thus, we introduce the largest
benchmark for spoken texts - HANSEN (Human ANd ai Spoken tExt beNchmark).
HANSEN encompasses meticulous curation of existing speech datasets accompanied
by transcripts, alongside the creation of novel AI-generated spoken text
datasets. Together, it comprises 17 human datasets, and AI-generated spoken
texts created using 3 prominent LLMs: ChatGPT, PaLM2, and Vicuna13B. To
evaluate and demonstrate the utility of HANSEN, we perform Authorship
Attribution (AA) & Author Verification (AV) on human-spoken datasets and
conduct Human vs. AI spoken text detection using state-of-the-art (SOTA)
models. While SOTA methods, such as character n-gram or Transformer-based
models, exhibit similar AA & AV performance on human-spoken datasets compared to
written ones, there is much room for improvement in AI-generated spoken text
detection. The HANSEN benchmark is available at:
https://huggingface.co/datasets/HANSEN-REPO/HANSEN.
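As a rough illustration of the character n-gram approach to Authorship Attribution mentioned in the abstract, the sketch below builds a trigram count profile per speaker and assigns a new utterance to the most similar profile by cosine similarity. This is a minimal toy under stated assumptions, not the benchmark's actual pipeline: the function names (char_ngrams, cosine, attribute), the choice of n=3, and the two example transcripts are all hypothetical and are not taken from HANSEN.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text, profiles, n=3):
    """Return the speaker whose n-gram profile is most similar to text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda speaker: cosine(query, profiles[speaker]))


# Made-up transcripts for illustration only (assumption: not HANSEN data).
train = {
    "speaker_a": "well you know i think that you know it is kind of what we wanted you know",
    "speaker_b": "the committee shall therefore proceed to consider the proposed amendment forthwith",
}
profiles = {speaker: char_ngrams(text) for speaker, text in train.items()}

predicted = attribute("you know i kind of think it is what we wanted", profiles)
print(predicted)
```

Real systems typically feed such n-gram features into a trained classifier rather than comparing raw profiles; the nearest-profile rule here is only chosen to keep the example dependency-free.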
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis [0.0]
This research explores the nuanced differences in texts produced by AI and those written by humans.
The study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts.
arXiv Detail & Related papers (2024-07-15T18:09:03Z)
- Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool [0.0]
This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing.
Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features.
arXiv Detail & Related papers (2024-07-04T05:37:09Z)
- RFBES at SemEval-2024 Task 8: Investigating Syntactic and Semantic Features for Distinguishing AI-Generated and Human-Written Texts [0.8437187555622164]
This article investigates the problem of AI-generated text detection from two different aspects: semantics and syntax.
We present an AI model that can distinguish AI-generated texts from human-written ones with high accuracy on both multilingual and monolingual tasks.
arXiv Detail & Related papers (2024-02-19T00:40:17Z)
- Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text [0.0]
My research investigates the use of cutting-edge hybrid deep learning models to accurately differentiate between AI-generated text and human writing.
I applied a robust methodology, utilising a carefully selected dataset comprising AI and human texts from various sources, each tagged with instructions.
arXiv Detail & Related papers (2023-11-27T06:26:53Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experimental evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z)