A comparison of several AI techniques for authorship attribution on
Romanian texts
- URL: http://arxiv.org/abs/2211.05180v1
- Date: Wed, 9 Nov 2022 20:24:48 GMT
- Title: A comparison of several AI techniques for authorship attribution on
Romanian texts
- Authors: Sanda Maria Avram and Mihai Oltean
- Abstract summary: We compare AI techniques for classifying literary texts written by multiple authors.
We also introduce a new dataset composed of texts written in the Romanian language on which we have run the algorithms.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determining the author of a text is a difficult task. Here we compare
multiple AI techniques for classifying literary texts written by multiple
authors by taking into account a limited number of speech parts (prepositions,
adverbs, and conjunctions). We also introduce a new dataset composed of texts
written in the Romanian language on which we have run the algorithms. The
compared methods are Artificial Neural Networks, Support Vector Machines, Multi
Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour.
Numerical experiments show that the problem is difficult, but some
algorithms achieve reasonably low error rates on the test set.
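The approach described in the abstract (representing each text by how often it uses a limited set of function words, then classifying by author) can be illustrated with a minimal sketch. This is not the authors' code or dataset: the word list `FUNCTION_WORDS`, the helper names `features` and `knn_predict`, and the toy texts are all hypothetical, and only k-Nearest Neighbour, one of the compared methods, is shown.

```python
from collections import Counter
import math
import re

# Hypothetical, very small sample of Romanian prepositions, adverbs,
# and conjunctions; the paper's actual word list is not reproduced here.
FUNCTION_WORDS = ["și", "de", "la", "în", "cu", "pe", "dar", "că", "nu", "mai"]


def features(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, text, k=1):
    """Predict an author label by majority vote among the k nearest texts.

    `train` is a list of (feature_vector, author_label) pairs.
    """
    query = features(text)
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training data, invented purely for illustration.
train = [
    (features("și și și de"), "author_A"),
    (features("pe pe cu la"), "author_B"),
]
predicted = knn_predict(train, "și de și")
```

With real data, each training pair would come from a full literary text, and the error rate would be measured on a held-out test set, as the abstract describes.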
Related papers
- DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning [24.99797253885887]
We argue that the key to accomplishing this task lies in distinguishing writing styles of different authors.
We propose DeTeCtive, a multi-task auxiliary, multi-level contrastive learning framework.
Our method is compatible with a range of text encoders.
arXiv Detail & Related papers (2024-10-28T12:34:49Z)
- Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text [4.902089836908786]
WhosAI is a triplet-network contrastive learning framework designed to predict whether a given input text has been generated by humans or AI.
We show that our proposed framework achieves outstanding results in both the Turing Test and Authorship tasks.
arXiv Detail & Related papers (2024-07-12T15:44:56Z)
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- Spot the bot: Coarse-Grained Partition of Semantic Paths for Bots and Humans [55.2480439325792]
This paper focuses on comparing structures of the coarse-grained partitions of semantic paths for human-written and bot-generated texts.
As the semantic structure may be different for different languages, we investigate Russian, English, German, and Vietnamese languages.
arXiv Detail & Related papers (2024-02-27T10:38:37Z)
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level [4.250876580245865]
Existing AI-generated text classifiers have limited accuracy and often produce false positives.
We propose a novel approach using natural language processing (NLP) techniques.
We generate multiple paraphrased versions of a given question and input them into the large language model to generate answers.
By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response.
arXiv Detail & Related papers (2023-06-13T20:34:55Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
- Evaluating Various Tokenizers for Arabic Text Classification [4.110108749051656]
We introduce three new tokenization algorithms for Arabic and compare them to three other baselines using unsupervised evaluations.
Our experiments show that the performance of such tokenization algorithms depends on the size of the dataset, type of the task, and the amount of morphology that exists in the dataset.
arXiv Detail & Related papers (2021-06-14T16:05:58Z)
- ARTH: Algorithm For Reading Text Handily -- An AI Aid for People having Word Processing Issues [0.0]
"ARTH" is a self-learning set of algorithms that offers an intelligent way of fulfilling the need for "reading and understanding the text effortlessly".
"ARTH" focuses on reviving the joy of reading among people who have a poor vocabulary or word processing issues.
arXiv Detail & Related papers (2021-01-23T09:39:45Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.