Generative AI Text Classification using Ensemble LLM Approaches
- URL: http://arxiv.org/abs/2309.07755v1
- Date: Thu, 14 Sep 2023 14:41:46 GMT
- Title: Generative AI Text Classification using Ensemble LLM Approaches
- Authors: Harika Abburi, Michael Suesserman, Nirmala Pudota, Balaji Veeramani,
Edward Bowen, Sanmitra Bhattacharya
- Abstract summary: Large Language Models (LLMs) have shown impressive performance across a variety of AI and natural language processing tasks.
We propose an ensemble neural model that generates probabilities from different pre-trained LLMs.
For the first task of distinguishing between AI- and human-generated text, our model ranked fifth and thirteenth for English and Spanish texts, respectively.
- Score: 0.12483023446237698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have shown impressive performance across a
variety of Artificial Intelligence (AI) and natural language processing tasks,
such as content creation, report generation, etc. However, unregulated malign
application of these models can create undesirable consequences such as
generation of fake news, plagiarism, etc. As a result, accurate detection of
AI-generated language can be crucial in responsible usage of LLMs. In this
work, we explore 1) whether a certain body of text is AI generated or written
by human, and 2) attribution of a specific language model in generating a body
of text. Texts in both English and Spanish are considered. The datasets used in
this study are provided as part of the Automated Text Identification
(AuTexTification) shared task. For each of the research objectives stated
above, we propose an ensemble neural model that generates probabilities from
different pre-trained LLMs which are used as features to a Traditional Machine
Learning (TML) classifier following it. For the first task of distinguishing
between AI and human generated text, our model ranked in fifth and thirteenth
place (with macro $F1$ scores of 0.733 and 0.649) for English and Spanish
texts, respectively. For the second task on model attribution, our model ranked
in first place with macro $F1$ scores of 0.625 and 0.653 for English and
Spanish texts, respectively.
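The stacking idea in the abstract (class probabilities from several pre-trained LLMs used as features for a downstream traditional classifier, scored with macro F1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the probability values are made-up toy numbers standing in for the outputs of two hypothetical fine-tuned models, and a nearest-centroid classifier is used as a simple stand-in for whichever traditional classifier the paper actually uses.

```python
from collections import defaultdict


def stack_features(per_model_probs):
    """Concatenate each model's class probabilities into one vector per text."""
    return [
        [p for probs in text_probs for p in probs]
        for text_probs in zip(*per_model_probs)
    ]


def fit_centroids(features, labels):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(features, centroids):
    """Assign each vector to the class whose centroid is closest."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(centroids, key=lambda label: sq_dist(vec, centroids[label]))
        for vec in features
    ]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy stand-ins for [P(human), P(AI)] from two hypothetical base models.
model_a_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4],
                 [0.3, 0.7], [0.2, 0.8], [0.55, 0.45]]
model_b_probs = [[0.7, 0.3], [0.4, 0.6], [0.8, 0.2],
                 [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]
labels = [0, 0, 0, 1, 1, 1]  # 0 = human-written, 1 = AI-generated

features = stack_features([model_a_probs, model_b_probs])
centroids = fit_centroids(features, labels)
preds = predict(features, centroids)
score = macro_f1(labels, preds)
```

In this toy data each base model is wrong on one text when thresholded at 0.5 on its own, which is the situation where combining their probabilities is meant to help. A real evaluation would score predictions on held-out data rather than on the training texts, as done here only for brevity.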
Related papers
- CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts [10.027843402296678]
This paper constructs a comprehensive benchmark in both Chinese and English to evaluate mainstream AI-generated text detectors.
We categorize text generation into five distinct operations: Create, Update, Delete, Rewrite, and Translate.
For each CUDRT category, we have developed extensive datasets to thoroughly assess detector performance.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
- Exploration of Masked and Causal Language Modelling for Text Generation [6.26998839917804]
This paper conducts an extensive comparison of Masked Language Modelling (MLM) and Causal Language Modelling (CLM) approaches for text generation tasks.
We first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness.
The results show that MLM consistently outperforms CLM in text generation across all datasets.
arXiv Detail & Related papers (2024-05-21T09:33:31Z)
- A Simple yet Efficient Ensemble Approach for AI-generated Text Detection [0.5840089113969194]
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing.
It is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text.
We propose a simple yet efficient solution by ensembling predictions from multiple constituent LLMs.
arXiv Detail & Related papers (2023-11-06T13:11:02Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- MEGA: Multilingual Evaluation of Generative AI [23.109803506475174]
Generative AI models have shown impressive performance on many Natural Language Processing tasks.
Most studies on generative LLMs have been restricted to English.
It is unclear how capable these models are at understanding and generating text in other languages.
arXiv Detail & Related papers (2023-03-22T13:03:10Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.