Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- URL: http://arxiv.org/abs/2008.13533v1
- Date: Mon, 17 Aug 2020 07:13:24 GMT
- Title: Generative Models are Unsupervised Predictors of Page Quality: A
Colossal-Scale Study
- Authors: Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew
Tomkins
- Abstract summary: Large generative language models such as GPT-2 are well-known for their ability to generate text.
We show that classifiers trained to discriminate between human- and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low-quality content without any training.
We conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large generative language models such as GPT-2 are well-known for their
ability to generate text as well as their utility in supervised downstream
tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human
evaluation that classifiers trained to discriminate between human and
machine-generated text emerge as unsupervised predictors of "page quality",
able to detect low quality content without any training. This enables fast
bootstrapping of quality indicators in a low-resource setting. Secondly,
curious to understand the prevalence and nature of low quality pages in the
wild, we conduct extensive qualitative and quantitative analysis over 500
million web articles, making this the largest-scale study ever conducted on the
topic.
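The core recipe in the abstract can be sketched as follows: take a discriminator that scores how likely a document is to be machine-generated, and reuse that score, with no further training, as a low-quality signal for ranking pages. This is a minimal illustrative sketch, not the paper's actual classifier: `toy_detector` is a hypothetical stand-in based on trigram repetitiveness, where a real system would plug in a trained human-vs-machine discriminator.

```python
from collections import Counter

def toy_detector(text: str) -> float:
    """Stand-in for a real human-vs-machine discriminator.

    A production system would return P(machine-generated) from a
    trained classifier; here we use trigram repetitiveness as a
    crude proxy in [0, 1], since degenerate generated text often
    repeats itself.
    """
    words = text.lower().split()
    if len(words) < 3:
        return 0.0
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    counts = Counter(trigrams)
    repeated = sum(c - 1 for c in counts.values())  # duplicate trigram occurrences
    return repeated / len(trigrams)

def rank_by_low_quality(pages):
    """Rank pages from most to least suspect: the detector's
    'machine-generated' score doubles as an unsupervised
    low-quality indicator, exactly the bootstrapping idea above."""
    return sorted(pages, key=toy_detector, reverse=True)
```

With a real detector swapped in, `rank_by_low_quality` would surface candidate low-quality pages for review without any quality-labeled training data.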
Related papers
- Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification
We employ the chi-square test to identify high-quality samples and compare the results with those obtained using low-quality samples.
Our findings demonstrate that utilizing high-quality samples leads to improved performance with respect to all evaluated metrics.
  (arXiv, 2024-05-28)
- Exploring Precision and Recall to assess the quality and diversity of LLMs
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
  (arXiv, 2024-02-16)
- QuRating: Selecting High-Quality Data for Training Language Models
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments, and use it to annotate a 260B-token training corpus with quality ratings for each of the four criteria.
  (arXiv, 2024-02-15)
- Language Model as an Annotator: Unsupervised Context-aware Quality Phrase Generation
We propose LMPhrase, a novel unsupervised quality phrase mining framework built upon large pre-trained language models (LMs).
Specifically, we first mine quality phrases as silver labels by employing a parameter-free probing technique called Perturbed Masking on the pre-trained language model BERT.
In contrast to typical statistic-based or distantly-supervised methods, our silver labels, derived from large pre-trained language models, take into account rich contextual information contained in the LMs.
  (arXiv, 2023-12-28)
- Calibrating LLM-Based Evaluator
We propose AutoCalibrate, a multi-stage, gradient-free approach to calibrate and align an LLM-based evaluator toward human preference.
Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels.
Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration.
  (arXiv, 2023-09-23)
- Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference.
The Explicit Score, which uses ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable of the three approaches explored.
  (arXiv, 2023-04-03)
- NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experimental evaluations on the popular LJSpeech dataset show that NaturalSpeech achieves -0.01 CMOS relative to human recordings at the sentence level.
  (arXiv, 2022-05-09)
- Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations.
Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly.
  (arXiv, 2020-03-13)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.