Metric-Based In-context Learning: A Case Study in Text Simplification
- URL: http://arxiv.org/abs/2307.14632v1
- Date: Thu, 27 Jul 2023 05:45:35 GMT
- Title: Metric-Based In-context Learning: A Case Study in Text Simplification
- Authors: Subha Vadlamannati, Gözde Gül Şahin
- Abstract summary: In-context learning (ICL) for large language models has proven to be a powerful approach for many natural language processing tasks.
However, determining the best method to select examples for ICL is nontrivial, as the results can vary greatly depending on the quality, quantity, and order of the examples used.
We propose the Metric-Based In-context Learning (MBL) method, which utilizes commonly used TS metrics such as SARI, compression ratio, and BERT-Precision for example selection.
- Score: 5.33024001730262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) for large language models has proven to be a
powerful approach for many natural language processing tasks. However,
determining the best method to select examples for ICL is nontrivial as the
results can vary greatly depending on the quality, quantity, and order of
examples used. In this paper, we conduct a case study on text simplification
(TS) to investigate how to select the best and most robust examples for ICL. We
propose a Metric-Based In-context Learning (MBL) method that utilizes commonly
used TS metrics such as SARI, compression ratio, and BERT-Precision for
selection. Through an extensive set of experiments with various-sized GPT
models on standard TS benchmarks such as TurkCorpus and ASSET, we show that
examples selected by the top SARI scores perform the best on larger models such
as GPT-175B, while the compression ratio generally performs better on smaller
models such as GPT-13B and GPT-6.7B. Furthermore, we demonstrate that MBL is
generally robust to example orderings and out-of-domain test sets, and
outperforms strong baselines and state-of-the-art finetuned language models.
Finally, we show that the behaviour of large GPT models can be implicitly
controlled by the chosen metric. Our research provides a new framework for
selecting examples in ICL, and demonstrates its effectiveness in text
simplification tasks, breaking new ground for more accurate and efficient NLG
systems.
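
The selection step described in the abstract, ranking candidate demonstrations by a TS metric and prompting with the top-scoring ones, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' released implementation: compression_ratio, select_examples, and build_prompt are hypothetical names, and SARI or BERT-Precision would plug into the same scoring hook.

```python
# Minimal sketch of metric-based in-context example selection (MBL-style).
# Names are illustrative; compression ratio is used as the ranking metric,
# but SARI or BERT-Precision could be substituted via `score_fn`.

def compression_ratio(complex_sent: str, simple_sent: str) -> float:
    """Ratio of simplified length to original length, in whitespace tokens."""
    return len(simple_sent.split()) / max(len(complex_sent.split()), 1)

def select_examples(candidates, score_fn, k=4, lower_is_better=True):
    """Rank candidate (complex, simple) pairs by a TS metric and keep the top k."""
    ranked = sorted(candidates, key=lambda pair: score_fn(*pair),
                    reverse=not lower_is_better)
    return ranked[:k]

def build_prompt(examples, test_sentence):
    """Assemble a few-shot prompt from the selected demonstrations."""
    lines = [f"Complex: {c}\nSimple: {s}" for c, s in examples]
    lines.append(f"Complex: {test_sentence}\nSimple:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    # Toy candidate pool; in the paper the pool would come from TS training
    # data such as TurkCorpus/ASSET-style sentence pairs.
    pool = [
        ("The committee has postponed the referendum indefinitely.",
         "The committee delayed the vote."),
        ("He perambulated along the thoroughfare at dusk.",
         "He walked down the street in the evening."),
    ]
    demos = select_examples(pool, compression_ratio, k=2)
    print(build_prompt(demos, "The legislation was subsequently ratified."))
```

Swapping the scoring function is the knob the abstract refers to when it says the behaviour of large GPT models can be implicitly controlled by the chosen metric.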
Related papers
- Selecting Between BERT and GPT for Text Classification in Political Science Research [4.487884986288122]
We evaluate the effectiveness of BERT-based versus GPT-based models in low-data scenarios.
We conclude by comparing these approaches in terms of performance, ease of use, and cost.
arXiv Detail & Related papers (2024-11-07T07:29:39Z)
- Preference Alignment Improves Language Model-Based TTS [76.70693823683091]
Preference alignment algorithms adjust LMs to align with the preferences of reward models, enhancing the desirability of the generated content.
With a 1.15B parameter LM-based TTS model, we demonstrate that preference alignment consistently improves intelligibility, speaker similarity, and proxy subjective evaluation scores.
arXiv Detail & Related papers (2024-09-19T01:58:19Z)
- Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process [45.632012199451275]
In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs.
Existing works are highly dependent on large-scale labeled support sets, which are not always feasible in practical scenarios.
We introduce the Language Model-based Determinantal Point Process (LM-DPP), which simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection.
arXiv Detail & Related papers (2024-08-04T18:08:15Z)
- ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
- Designing Informative Metrics for Few-Shot Example Selection [14.961505860372492]
We propose a complexity-based prompt selection approach for sequence tagging tasks.
This approach avoids training a dedicated model for example selection.
We use both sentence- and word-level metrics to match the complexity of examples to the (test) sentence being considered.
arXiv Detail & Related papers (2024-03-06T17:11:38Z)
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
- Coverage-based Example Selection for In-Context Learning [27.215972147196805]
We show that BERTScore-Recall (BSR) selects better examples that demonstrate more of the salient aspects of the test input.
On 15 datasets spanning 6 tasks and with 7 diverse LLMs, we show that (1) BSR is the superior metric for in-context example selection across the board, and (2) for compositional tasks, Set-BSR outperforms independent ranking by up to 17 points on average.
arXiv Detail & Related papers (2023-05-24T08:58:28Z)
- Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z)
- True Few-Shot Learning with Language Models [78.42578316883271]
We evaluate the few-shot ability of LMs when held-out examples are unavailable.
Our findings suggest that prior work significantly overestimated the true few-shot ability of LMs.
arXiv Detail & Related papers (2021-05-24T17:55:51Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.