Three Ways of Using Large Language Models to Evaluate Chat
- URL: http://arxiv.org/abs/2308.06502v1
- Date: Sat, 12 Aug 2023 08:34:15 GMT
- Title: Three Ways of Using Large Language Models to Evaluate Chat
- Authors: Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek
- Abstract summary: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition.
We present three different approaches to predicting turn-level qualities of responses based on large language models (LLMs)
We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT.
An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs.
- Score: 3.7767218432589553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11
Track 4 competition. We present three different approaches to predicting
turn-level qualities of chatbot responses based on large language models
(LLMs). We report improvement over the baseline using dynamic few-shot examples
from a vector store for the prompts for ChatGPT. We also analyze the
performance of the other two approaches and report needed improvements for
future work. We developed the three systems over just two weeks, showing the
potential of LLMs for this task. An ablation study conducted after the
challenge deadline shows that the new Llama 2 models are closing the
performance gap between ChatGPT and open-source LLMs. However, we find that the
Llama 2 models do not benefit from few-shot examples in the same way as
ChatGPT.
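The abstract's strongest result comes from dynamic few-shot prompting: retrieving annotated examples similar to the response being rated and placing them in the ChatGPT prompt. The following is a minimal sketch of that idea, not the authors' implementation; the bag-of-words `embed` function, the toy in-memory store, and the example responses are all assumptions standing in for a real embedding model and vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Vector store: (response, human quality score) pairs from annotated training data.
STORE = [
    ("Sure, the museum opens at 9 am.", 5),
    ("I dunno, ask someone else.", 1),
    ("The train to Prague leaves hourly.", 4),
]

def build_prompt(response, k=2):
    """Retrieve the k most similar annotated responses and build a few-shot prompt."""
    query = embed(response)
    ranked = sorted(STORE, key=lambda ex: cosine(query, embed(ex[0])), reverse=True)
    shots = "\n".join(f"Response: {r}\nScore: {s}" for r, s in ranked[:k])
    return (f"Rate the chatbot response from 1 to 5.\n{shots}\n"
            f"Response: {response}\nScore:")

prompt = build_prompt("The museum opens at 10 am on Sundays.")
```

Because the retrieved examples change with every query, each prompt carries demonstrations relevant to the response under evaluation, which is what distinguishes this from static few-shot prompting.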
Related papers
- JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models [0.9736758288065405]
This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations"
Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video.
Our proposed approach addresses these challenges by a two-step framework.
arXiv Detail & Related papers (2024-03-05T12:07:18Z) - Large Language Models as Zero-shot Dialogue State Tracker through Function Calling [42.00097476584174]
We propose a novel approach for solving dialogue state tracking with large language models (LLMs) through function calling.
This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning.
We show that our approach achieves exceptional performance with both modestly sized open-source LLMs and proprietary LLMs.
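The function-calling idea this summary describes can be illustrated with a small sketch: the LLM is given a function schema and asked to "call" it with the slot values it detects, so the dialogue state falls out of parsing the call. The schema, slot names, and the simulated model output below are invented for illustration and are not from the paper.

```python
import json

# Hypothetical function schema the LLM is asked to call to record slot values.
TRACK_STATE_FN = {
    "name": "track_dialogue_state",
    "description": "Record the user's constraints for a restaurant search.",
    "parameters": {
        "type": "object",
        "properties": {
            "cuisine": {"type": "string"},
            "area": {"type": "string"},
            "price_range": {"type": "string"},
        },
    },
}

def apply_function_call(state, llm_output):
    """Merge a function-call payload (as the LLM would emit it) into the state."""
    call = json.loads(llm_output)
    if call.get("name") == TRACK_STATE_FN["name"]:
        state.update({k: v for k, v in call["arguments"].items() if v})
    return state

# Simulated model output for: "I'd like a cheap Italian place in the centre."
mock_output = json.dumps({
    "name": "track_dialogue_state",
    "arguments": {"cuisine": "italian", "area": "centre", "price_range": "cheap"},
})
state = apply_function_call({}, mock_output)
```

Since the state is read off a structured call rather than free-form text, no slot-specific training data or model tuning is needed, which is what enables the zero-shot domain transfer the summary claims.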
arXiv Detail & Related papers (2024-02-16T06:13:18Z) - Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z) - Llama 2: Open Foundation and Fine-Tuned Chat Models [65.43397761706336]
We develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs)
Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases.
arXiv Detail & Related papers (2023-07-18T14:31:57Z) - Pushing the Limits of ChatGPT on NLP Tasks [79.17291002710517]
Despite the success of ChatGPT, its performances on most NLP tasks are still well below the supervised baselines.
In this work, we investigate the causes and find that its subpar performance stems from several identifiable factors.
We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks.
arXiv Detail & Related papers (2023-06-16T09:40:05Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - ChatIE: Zero-Shot Information Extraction via Chatting with ChatGPT [89.49161588240061]
Zero-shot information extraction (IE) aims to build IE systems from unannotated text.
Recent efforts on large language models (LLMs, e.g., GPT-3, ChatGPT) show promising performance on zero-shot settings.
We transform the zero-shot IE task into a multi-turn question-answering problem with a two-stage framework (ChatIE)
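The two-stage, multi-turn framing that this summary attributes to ChatIE can be sketched as follows: a first turn asks which relation types a sentence contains, then one follow-up turn per detected type extracts the arguments. The `chat` stub, the canned answers, and the example sentence are illustrative assumptions standing in for real LLM calls.

```python
def chat(question, canned):
    """Stub LLM: returns a canned answer for the question (illustration only)."""
    return canned[question]

def extract(sentence, answers):
    # Stage 1: ask which relation types appear in the sentence.
    types = chat(f"Which relation types occur in: {sentence}?", answers)
    triples = []
    # Stage 2: one follow-up turn per detected type to extract its arguments.
    for rel in types:
        triples += chat(f"List ({rel}) triples in: {sentence}", answers)
    return triples

sentence = "Marie Curie was born in Warsaw."
canned = {
    f"Which relation types occur in: {sentence}?": ["born_in"],
    f"List ({'born_in'}) triples in: {sentence}": [
        ("Marie Curie", "born_in", "Warsaw"),
    ],
}
triples = extract(sentence, canned)
```

Decomposing extraction into narrow questions keeps each turn simple for the model, which is the core of the framework's zero-shot appeal.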
arXiv Detail & Related papers (2023-02-20T12:57:12Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z) - A Study on Prompt-based Few-Shot Learning Methods for Belief State Tracking in Task-oriented Dialog Systems [10.024834304960846]
We tackle the Dialogue Belief State Tracking problem of task-oriented conversational systems.
Recent approaches to this problem leveraging Transformer-based models have yielded great results.
We explore prompt-based few-shot learning for Dialogue Belief State Tracking.
arXiv Detail & Related papers (2022-04-18T05:29:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.