Detection of Machine-Generated Text: Literature Survey
- URL: http://arxiv.org/abs/2402.01642v1
- Date: Tue, 2 Jan 2024 01:44:15 GMT
- Title: Detection of Machine-Generated Text: Literature Survey
- Authors: Dmytro Valiaiev
- Abstract summary: This literature survey aims to compile and synthesize accomplishments and developments in the field of machine-generated text.
It also gives an overview of machine-generated text trends and explores the larger societal implications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since language models produce fake text quickly and easily, there is an
oversupply of such content in the public domain. The degree of sophistication
and writing style has reached a point where differentiating between human
authored and machine-generated content is nearly impossible. As a result, works
generated by language models rather than human authors have gained significant
media attention and stirred controversy. Concerns regarding the possible
influence of advanced language models on society have also arisen, needing a
fuller knowledge of these processes. Natural language generation (NLG) and
generative pre-trained transformer (GPT) models have revolutionized a variety
of sectors: the scope not only permeated throughout journalism and customer
service but also reached academia. To mitigate the hazardous implications that
may arise from the use of these models, preventative measures must be
implemented, such as providing human agents with the capacity to distinguish
between artificially made and human composed texts utilizing automated systems
and possibly reverse-engineered language models. Furthermore, to ensure a
balanced and responsible approach, it is critical to have a full grasp of the
socio-technological ramifications of these breakthroughs. This literature
survey aims to compile and synthesize accomplishments and developments in the
aforementioned work, while also identifying future prospects. It also gives an
overview of machine-generated text trends and explores the larger societal
implications. Ultimately, this survey intends to contribute to the development
of robust and effective approaches for resolving the issues connected with the
usage and detection of machine-generated text by exploring the interplay
between the capabilities of language models and their possible implications.
Related papers
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review visits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z)
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Generative Artificial Intelligence: A Systematic Review and Applications [7.729155237285151]
This paper documents the systematic review and analysis of recent advancements and techniques in Generative AI.
The major impact that generative AI has made to date has been in language generation with the development of large language models.
The paper ends with a discussion of Responsible AI principles, and the necessary ethical considerations for the sustainability and growth of these generative models.
arXiv Detail & Related papers (2024-05-17T18:03:59Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
- Beyond Turing: A Comparative Analysis of Approaches for Detecting Machine-Generated Text [1.919654267936118]
Traditional shallow learning, Language Model (LM) fine-tuning, and Multilingual Model fine-tuning are evaluated.
Results reveal considerable differences in performance across methods.
This study paves the way for future research aimed at creating robust and highly discriminative models.
arXiv Detail & Related papers (2023-11-21T06:23:38Z)
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z)
- Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining [0.5735035463793008]
This paper studies the behaviour of the cutting-edge Transformer-based language models on opinion mining.
Our comparative study provides leads and paves the way for production engineers regarding which approach to focus on.
arXiv Detail & Related papers (2023-08-07T01:10:50Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- An Overview on Controllable Text Generation via Variational Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications about the controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.