ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding
- URL: http://arxiv.org/abs/2305.14196v3
- Date: Sun, 17 Dec 2023 17:05:09 GMT
- Title: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding
- Authors: Uri Shaham and Maor Ivgi and Avia Efrat and Jonathan Berant and Omer
Levy
- Abstract summary: We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts.
We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks.
We find that Claude outperforms ChatGPT, and GPT-4 achieves the highest average score.
- Score: 86.08738156304224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
Related papers
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning [64.55339431760727]
We introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M.
Our model, TextSquare, considerably surpasses open-source previous state-of-the-art Text-centric MLLMs.
It even outperforms top-tier models like GPT4V and Gemini in 6 of 10 text-centric benchmarks.
arXiv Detail & Related papers (2024-04-19T11:38:08Z) - Low-resource classification of mobility functioning information in
clinical sentences using large language models [0.0]
This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes.
We collect a balanced binary classification dataset of 1000 sentences from the Mobility NER dataset, which was curated from n2c2 clinical notes.
arXiv Detail & Related papers (2023-12-15T20:59:17Z) - Pretraining on the Test Set Is All You Need [6.322449198012633]
We pretrain a 1 million parameter transformer-based LLM textbfphi-CTNL that achieves perfect results across diverse academic benchmarks.
textbfphi-CTNL also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.
arXiv Detail & Related papers (2023-09-13T19:47:33Z) - Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations [97.41375480696972]
We introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input.
evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin.
arXiv Detail & Related papers (2022-12-19T21:34:26Z) - ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient
Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to purely use PLM to generate data and train a tiny model without relying on task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
arXiv Detail & Related papers (2022-05-25T11:38:48Z) - Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
arXiv Detail & Related papers (2022-04-29T19:18:37Z) - ZeroGen: Efficient Zero-shot Learning via Dataset Generation [28.454620513642034]
We study a flexible and efficient zero-short learning method, ZeroGen.
Given a zero-shot task, we first generate a dataset from scratch using PLMs in an unsupervised manner.
Experiments and analysis on different NLP tasks, namely, text classification, question answering, and natural language inference, show the effectiveness of ZeroGen.
arXiv Detail & Related papers (2022-02-16T08:18:02Z) - ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in the F1 score in the FolhaUOL dataset.
arXiv Detail & Related papers (2022-01-04T20:08:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.