Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- URL: http://arxiv.org/abs/2008.13533v1
- Date: Mon, 17 Aug 2020 07:13:24 GMT
- Title: Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- Authors: Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew
Tomkins
- Abstract summary: Large generative language models such as GPT-2 are well-known for their ability to generate text.
We show that unsupervised predictors of "page quality" emerge, able to detect low quality content without any training.
We conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
- Score: 86.62171568318716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large generative language models such as GPT-2 are well-known for their
ability to generate text as well as their utility in supervised downstream
tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human
evaluation that classifiers trained to discriminate between human and
machine-generated text emerge as unsupervised predictors of "page quality",
able to detect low quality content without any training. This enables fast
bootstrapping of quality indicators in a low-resource setting. Secondly,
curious to understand the prevalence and nature of low quality pages in the
wild, we conduct extensive qualitative and quantitative analysis over 500
million web articles, making this the largest-scale study ever conducted on the
topic.
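As a concrete illustration of the zero-shot setup described in the abstract, the sketch below applies an off-the-shelf human-vs-machine text discriminator to a page's text and reads its "human-written" probability as a quality signal. This is a minimal sketch, not the authors' implementation: the checkpoint name and its "Real"/"Fake" labels are assumptions based on a publicly available GPT-2 output detector, and any comparable classifier could be substituted.

```python
# Minimal sketch (not the paper's code): treat an off-the-shelf
# human-vs-machine text discriminator as a zero-shot page-quality scorer.
# The checkpoint and its "Real"/"Fake" label names are assumptions taken
# from a publicly available GPT-2 output detector.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",  # assumed checkpoint
)

def page_quality_score(text: str) -> float:
    """Higher = more human-like, used here as a proxy for higher page quality."""
    # Truncate for the model's context window; long pages could instead be
    # split into chunks and the chunk scores averaged.
    out = detector(text[:2000], truncation=True)[0]
    p_real = out["score"] if out["label"] == "Real" else 1.0 - out["score"]
    return p_real

print(page_quality_score("Buy cheap meds online best price click here now"))
```

Per the abstract, the premise is that text scored as machine-like correlates with low page quality, so a detector trained without any quality labels can bootstrap a quality indicator in a low-resource setting.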
Related papers
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts [0.0]
A huge number of detectors and collections containing AI-generated fragments have emerged, and several detection methods have reported recognition quality of up to 99.9%.
Are these detectors actually highly trustworthy, or do their high benchmark scores come from the poor quality of evaluation datasets?
We present a systematic review of datasets from competitions dedicated to AI-generated content detection and propose methods for evaluating the quality of datasets containing AI-generated fragments.
arXiv Detail & Related papers (2024-10-18T17:59:57Z) - Balancing Label Quantity and Quality for Scalable Elicitation [2.2143065226946423]
We study the microeconomics of the quantity-quality tradeoff on binary NLP classification tasks.
We observe three regimes of eliciting classification knowledge from pretrained models using supervised finetuning.
We find that the accuracy of supervised fine-tuning can be improved by up to 5 percentage points at a fixed labeling budget.
arXiv Detail & Related papers (2024-10-17T04:39:58Z) - Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild [66.40314964321557]
We propose a novel IQA method named RichIQA to explore the rich subjective rating information beyond MOS to predict image quality in the wild.
Among RichIQA's key designs is a three-stage image quality prediction network which exploits the powerful feature representation capability of the Convolutional vision Transformer (CvT) and mimics the short-term and long-term memory mechanisms of the human brain.
RichIQA outperforms state-of-the-art competitors on multiple large-scale in-the-wild IQA databases with rich subjective rating labels.
arXiv Detail & Related papers (2024-09-09T12:00:17Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B-token training corpus with quality ratings for each of the four criteria (see the pairwise-to-scalar sketch after this list).
arXiv Detail & Related papers (2024-02-15T06:36:07Z) - Exploring the Use of Large Language Models for Reference-Free Text
Quality Evaluation: An Empirical Study [63.27346930921658]
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference texts.
The Explicit Score, which utilizes ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable method among the three exploited approaches.
arXiv Detail & Related papers (2023-04-03T05:29:58Z) - NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experiment evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS (comparative mean opinion score) relative to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z) - Sentence Level Human Translation Quality Estimation with Attention-based
Neural Networks [0.30458514384586394]
This paper explores the use of deep learning methods for automatic estimation of the quality of human translations.
Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly.
arXiv Detail & Related papers (2020-03-13T16:57:55Z)
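As noted in the QuRating entry above, learning scalar ratings from pairwise judgments is commonly done with a Bradley-Terry-style objective. The sketch below is an assumed, generic version of that setup (not the QuRating authors' code): a small head maps document embeddings to scalars and is trained so that the preferred text in each judged pair scores higher.

```python
# Minimal sketch (assumptions, not the QuRating implementation) of learning
# scalar quality ratings from pairwise judgments: given that text A was
# preferred over text B, minimize -log sigmoid(score(A) - score(B)).
import torch
import torch.nn as nn

class Rater(nn.Module):
    """Maps a fixed-size text embedding to a single scalar quality rating."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)

def pairwise_loss(score_win: torch.Tensor, score_lose: torch.Tensor) -> torch.Tensor:
    # Small when the preferred text scores higher than the dispreferred one.
    return -torch.nn.functional.logsigmoid(score_win - score_lose).mean()

# Toy usage with random embeddings standing in for encoded documents.
rater = Rater()
opt = torch.optim.Adam(rater.parameters(), lr=1e-3)
emb_win, emb_lose = torch.randn(32, 768), torch.randn(32, 768)
opt.zero_grad()
loss = pairwise_loss(rater(emb_win), rater(emb_lose))
loss.backward()
opt.step()
```

In practice the random embeddings would be replaced by encoder outputs for the two texts in each judged pair, and the trained rater would then score the full corpus to produce per-document quality ratings.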