Understanding the Effectiveness of Very Large Language Models on Dialog
Evaluation
- URL: http://arxiv.org/abs/2301.12004v1
- Date: Fri, 27 Jan 2023 22:02:27 GMT
- Title: Understanding the Effectiveness of Very Large Language Models on Dialog
Evaluation
- Authors: Jessica Huynh, Cathy Jiao, Prakhar Gupta, Shikib Mehri, Payal Bajaj,
Vishrav Chaudhary, Maxine Eskenazi
- Abstract summary: Large language models (LLMs) have been used for generation and can now output human-like text.
This paper investigates how the number of examples in the prompt and the type of example selection used affect the model's performance.
- Score: 20.18656308749408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models have steadily increased in size over the past few years. They
achieve a high level of performance on various natural language processing
(NLP) tasks such as question answering and summarization. Large language models
(LLMs) have been used for generation and can now output human-like text. Due to
this, there are other downstream tasks in the realm of dialog that can now
harness the LLMs' language understanding capabilities. Dialog evaluation is one
task that this paper will explore. It concentrates on prompting with LLMs:
BLOOM, OPT, GPT-3, Flan-T5, InstructDial and TNLGv2. The paper shows that the
choice of datasets used for training a model contributes to how well it
performs on a task as well as on how the prompt should be structured.
Specifically, the more diverse and relevant the group of datasets that a model
is trained on, the better dialog evaluation performs. This paper also
investigates how the number of examples in the prompt and the type of example
selection used affect the model's performance.
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts)
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs)
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, current query and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - Stabilized In-Context Learning with Pre-trained Language Models for Few
Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z) - OPAL: Ontology-Aware Pretrained Language Model for End-to-End
Task-Oriented Dialogue [40.62090743056549]
This paper presents an ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD)
Unlike chit-chat dialogue models, task-oriented dialogue models fulfill at least two task-specific modules: dialogue state tracker (DST) and response generator (RG)
arXiv Detail & Related papers (2022-09-10T04:38:27Z) - In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST)
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
arXiv Detail & Related papers (2022-03-16T11:58:24Z) - Do Response Selection Models Really Know What's Next? Utterance
Manipulation Strategies for Multi-turn Response Selection [11.465266718370536]
We study the task of selecting the optimal response given a user and system utterance history in retrieval-based dialog systems.
We propose utterance manipulation strategies (UMS) to address this problem.
UMS consist of several strategies (i.e., insertion, deletion, and search) which aid the response selection model towards maintaining dialog coherence.
arXiv Detail & Related papers (2020-09-10T07:39:05Z) - Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
Task-oriented dialogue systems use four connected modules, namely, Natural Language Understanding (NLU), a Dialogue State Tracking (DST), Dialogue Policy (DP) and Natural Language Generation (NLG)
A research challenge is to learn each module with the least amount of samples given the high cost related to the data collection.
We evaluate the priming few-shot ability of language models in the NLU, DP and NLG tasks.
arXiv Detail & Related papers (2020-08-14T08:23:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.