Three Ways of Using Large Language Models to Evaluate Chat
- URL: http://arxiv.org/abs/2308.06502v1
- Date: Sat, 12 Aug 2023 08:34:15 GMT
- Title: Three Ways of Using Large Language Models to Evaluate Chat
- Authors: Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek
- Abstract summary: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition.
We present three different approaches to predicting turn-level qualities of responses based on large language models (LLMs)
We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT.
An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs.
- Score: 3.7767218432589553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11
Track 4 competition. We present three different approaches to predicting
turn-level qualities of chatbot responses based on large language models
(LLMs). We report improvement over the baseline using dynamic few-shot examples
from a vector store for the prompts for ChatGPT. We also analyze the
performance of the other two approaches and report needed improvements for
future work. We developed the three systems over just two weeks, showing the
potential of LLMs for this task. An ablation study conducted after the
challenge deadline shows that the new Llama 2 models are closing the
performance gap between ChatGPT and open-source LLMs. However, we find that the
Llama 2 models do not benefit from few-shot examples in the same way as
ChatGPT.
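The abstract's strongest result comes from dynamic few-shot prompting: retrieving annotated examples similar to the response being rated and placing them in the ChatGPT prompt. The following is a minimal sketch of that idea, not the authors' implementation; the bag-of-words `embed` function, the toy in-memory store, and the example responses are all assumptions standing in for a real embedding model and vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Vector store: (response, human quality score) pairs from annotated training data.
STORE = [
    ("Sure, the museum opens at 9 am.", 5),
    ("I dunno, ask someone else.", 1),
    ("The train to Prague leaves hourly.", 4),
]

def build_prompt(response, k=2):
    """Retrieve the k most similar annotated responses and build a few-shot prompt."""
    query = embed(response)
    ranked = sorted(STORE, key=lambda ex: cosine(query, embed(ex[0])), reverse=True)
    shots = "\n".join(f"Response: {r}\nScore: {s}" for r, s in ranked[:k])
    return (f"Rate the chatbot response from 1 to 5.\n{shots}\n"
            f"Response: {response}\nScore:")

prompt = build_prompt("The museum opens at 10 am on Sundays.")
```

Because the retrieved examples change with every query, each prompt carries demonstrations relevant to the response under evaluation, which is what distinguishes this from static few-shot prompting.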
Related papers
- JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models [0.9736758288065405]
This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations"
Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video.
Our proposed approach addresses these challenges by a two-step framework.
arXiv Detail & Related papers (2024-03-05T12:07:18Z) - Large Language Models as Zero-shot Dialogue State Tracker through Function Calling [42.00097476584174]
We propose a novel approach for solving dialogue state tracking with large language models (LLMs) through function calling.
This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning.
We show that our approach achieves exceptional performance with both modestly sized open-source LLMs and proprietary LLMs.
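The function-calling idea this summary describes can be illustrated with a small sketch: the LLM is given a function schema and asked to "call" it with the slot values it detects, so the dialogue state falls out of parsing the call. The schema, slot names, and the simulated model output below are invented for illustration and are not from the paper.

```python
import json

# Hypothetical function schema the LLM is asked to call to record slot values.
TRACK_STATE_FN = {
    "name": "track_dialogue_state",
    "description": "Record the user's constraints for a restaurant search.",
    "parameters": {
        "type": "object",
        "properties": {
            "cuisine": {"type": "string"},
            "area": {"type": "string"},
            "price_range": {"type": "string"},
        },
    },
}

def apply_function_call(state, llm_output):
    """Merge a function-call payload (as the LLM would emit it) into the state."""
    call = json.loads(llm_output)
    if call.get("name") == TRACK_STATE_FN["name"]:
        state.update({k: v for k, v in call["arguments"].items() if v})
    return state

# Simulated model output for: "I'd like a cheap Italian place in the centre."
mock_output = json.dumps({
    "name": "track_dialogue_state",
    "arguments": {"cuisine": "italian", "area": "centre", "price_range": "cheap"},
})
state = apply_function_call({}, mock_output)
```

Since the state is read off a structured call rather than free-form text, no slot-specific training data or model tuning is needed, which is what enables the zero-shot domain transfer the summary claims.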
arXiv Detail & Related papers (2024-02-16T06:13:18Z) - Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z) - Llama 2: Open Foundation and Fine-Tuned Chat Models [65.43397761706336]
We develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs)
Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases.
arXiv Detail & Related papers (2023-07-18T14:31:57Z) - Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines.
In this work, we investigate the causes and find that its subpar performance stems from several identifiable factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT [89.49161588240061]
Zero-shot information extraction (IE) aims to build IE systems from unannotated text.
Recent efforts on large language models (LLMs, e.g., GPT-3, ChatGPT) show promising performance on zero-shot settings.
We transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE)
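The two-stage, multi-turn framing that this summary attributes to ChatIE can be sketched as follows: a first turn asks which relation types a sentence contains, then one follow-up turn per detected type extracts the arguments. The `chat` stub, the canned answers, and the example sentence are illustrative assumptions standing in for real LLM calls.

```python
def chat(question, canned):
    """Stub LLM: returns a canned answer for the question (illustration only)."""
    return canned[question]

def extract(sentence, answers):
    # Stage 1: ask which relation types appear in the sentence.
    types = chat(f"Which relation types occur in: {sentence}?", answers)
    triples = []
    # Stage 2: one follow-up turn per detected type to extract its arguments.
    for rel in types:
        triples += chat(f"List ({rel}) triples in: {sentence}", answers)
    return triples

sentence = "Marie Curie was born in Warsaw."
canned = {
    f"Which relation types occur in: {sentence}?": ["born_in"],
    f"List ({'born_in'}) triples in: {sentence}": [
        ("Marie Curie", "born_in", "Warsaw"),
    ],
}
triples = extract(sentence, canned)
```

Decomposing extraction into narrow questions keeps each turn simple for the model, which is the core of the framework's zero-shot appeal.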
arXiv Detail & Related papers (2023-02-20T12:57:12Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z) - A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems [10.024834304960846]
We tackle the Dialogue Belief State Tracking problem of task-oriented conversational systems.
Recent approaches to this problem leveraging Transformer-based models have yielded great results.
We explore prompt-based few-shot learning for Dialogue Belief State Tracking.
arXiv Detail & Related papers (2022-04-18T05:29:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.