Related papers: Evaluating Large Language Models on Controlled Generation Tasks

Evaluating Large Language Models on Controlled Generation Tasks

URL: http://arxiv.org/abs/2310.14542v1
Date: Mon, 23 Oct 2023 03:48:24 GMT
Title: Evaluating Large Language Models on Controlled Generation Tasks
Authors: Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma
Abstract summary: We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-start finetuned smaller models, we present a spectrum showing large language models falling behind, are comparable, or exceed the ability of smaller models.
Score: 92.64781370921486
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, multilingual and etc, there have been few studies looking into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-start finetuned smaller models, we present a spectrum showing large language models falling behind, are comparable, or exceed the ability of smaller models. We conclude that **large language models struggle at meeting fine-grained hard constraints**.

Related papers

Evaluating Large Language Models on Multiword Expressions in Multilingual and Code-Switched Contexts [2.519319150166215]
This study evaluates how state-of-the-art language models process the ambiguity of potentially idiomatic multiword expressions.<n>We find that large language models, despite their strengths, struggle with nuanced language.
arXiv Detail & Related papers (2025-04-10T16:39:28Z)
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining [4.38070902806635]
We set up a benchmark for languages Croatian, Serbian, Bosnian and Montenegrin. We show that comparable performance to dedicated from-scratch models can be obtained by additionally pretraining available multilingual models. We also show that neighboring languages, in our case Slovenian, can be included in the additional pretraining with little to no loss in the performance of the final model.
arXiv Detail & Related papers (2024-04-08T11:55:44Z)
Perturbed examples reveal invariances shared by language models [8.04604449335578]
We introduce a novel framework to compare two NLP models. Via experiments on models from the same and different architecture families, this framework offers insights about how changes in models affect linguistic capabilities.
arXiv Detail & Related papers (2023-11-07T17:48:35Z)
Black-box language model explanation by context length probing [7.526153863886609]
We present context length probing, a novel explanation technique for causal language models. The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities. We apply context length probing to large pre-trained language models and offer some initial analyses and insights.
arXiv Detail & Related papers (2022-12-30T16:24:10Z)
Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks. BCLMs have been shown to be efficient in controllable language generation. We propose a "Gemini Discriminator" for controllable language generation which alleviates the mismatch problem with a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)
Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
On the Multilingual Capabilities of Very Large-Scale English Language Models [0.0]
Generative Pre-trained Transformers (GPTs) have recently been scaled to unprecedented sizes in the history of machine learning. In this work, we investigate the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan. We find that the model shows an outstanding performance, particularly in generative tasks, with predictable limitations mostly in language understanding tasks but still with remarkable results given the zero-shot scenario.
arXiv Detail & Related papers (2021-08-30T16:18:50Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict. This work shows a comparison of a neural model and character language models with varying amounts on target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.