Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- URL: http://arxiv.org/abs/2008.13533v1
- Date: Mon, 17 Aug 2020 07:13:24 GMT
- Title: Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- Authors: Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew
Tomkins
- Abstract summary: Large generative language models such as GPT-2 are well-known for their ability to generate text.
We show that classifiers trained to discriminate between human- and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low-quality content without any training.
We conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large generative language models such as GPT-2 are well-known for their
ability to generate text as well as their utility in supervised downstream
tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human
evaluation that classifiers trained to discriminate between human and
machine-generated text emerge as unsupervised predictors of "page quality",
able to detect low quality content without any training. This enables fast
bootstrapping of quality indicators in a low-resource setting. Secondly,
curious to understand the prevalence and nature of low quality pages in the
wild, we conduct extensive qualitative and quantitative analysis over 500
million web articles, making this the largest-scale study ever conducted on the
topic.
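The core recipe in the abstract can be sketched as follows: take a discriminator that scores how likely a document is to be machine-generated, and reuse that score, with no further training, as a low-quality signal for ranking pages. This is a minimal illustrative sketch, not the paper's actual classifier: `toy_detector` is a hypothetical stand-in based on trigram repetitiveness, where a real system would plug in a trained human-vs-machine discriminator.

```python
from collections import Counter

def toy_detector(text: str) -> float:
    """Stand-in for a real human-vs-machine discriminator.

    A production system would return P(machine-generated) from a
    trained classifier; here we use trigram repetitiveness as a
    crude proxy in [0, 1], since degenerate generated text often
    repeats itself.
    """
    words = text.lower().split()
    if len(words) < 3:
        return 0.0
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    counts = Counter(trigrams)
    repeated = sum(c - 1 for c in counts.values())  # duplicate trigram occurrences
    return repeated / len(trigrams)

def rank_by_low_quality(pages):
    """Rank pages from most to least suspect: the detector's
    'machine-generated' score doubles as an unsupervised
    low-quality indicator, exactly the bootstrapping idea above."""
    return sorted(pages, key=toy_detector, reverse=True)
```

With a real detector swapped in, `rank_by_low_quality` would surface candidate low-quality pages for review without any quality-labeled training data.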
Related papers
- Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification
We employ the chi-square test to identify high-quality samples and compare the results with those obtained using low-quality samples.
Our findings demonstrate that utilizing high-quality samples leads to improved performance with respect to all evaluated metrics.
  (arXiv, 2024-05-28)
- Exploring Precision and Recall to assess the quality and diversity of LLMs
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
  (arXiv, 2024-02-16)
- QuRating: Selecting High-Quality Data for Training Language Models
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B-token training corpus with quality ratings for each of the four criteria.
  (arXiv, 2024-02-15)
- Language Model as an Annotator: Unsupervised Context-aware Quality Phrase Generation
We propose LMPhrase, a novel unsupervised quality phrase mining framework built upon large pre-trained language models (LMs).
Specifically, we first mine quality phrases as silver labels by employing a parameter-free probing technique called Perturbed Masking on the pre-trained language model BERT.
In contrast to typical statistic-based or distantly-supervised methods, our silver labels, derived from large pre-trained language models, take into account rich contextual information contained in the LMs.
  (arXiv, 2023-12-28)
- Calibrating LLM-Based Evaluator
We propose AutoCalibrate, a multi-stage, gradient-free approach to calibrate and align an LLM-based evaluator toward human preference.
Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels.
Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration.
  (arXiv, 2023-09-23)
- Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference.
The Explicit Score, which uses ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable of the three approaches explored.
  (arXiv, 2023-04-03)
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experimental evaluations on the popular LJSpeech dataset show that NaturalSpeech achieves -0.01 CMOS relative to human recordings at the sentence level.
  (arXiv, 2022-05-09)
- Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations.
Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly.
  (arXiv, 2020-03-13)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.