FinChat: Corpus and evaluation setup for Finnish chat conversations on
everyday topics
- URL: http://arxiv.org/abs/2008.08315v1
- Date: Wed, 19 Aug 2020 07:58:16 GMT
- Title: FinChat: Corpus and evaluation setup for Finnish chat conversations on
everyday topics
- Authors: Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo
- Abstract summary: We describe our collection efforts to create the Finnish chat conversation corpus FinChat, made available publicly.
FinChat includes unscripted conversations on seven topics from people of different ages.
In a human evaluation, chatbot-generated responses to questions from the evaluation set are predominantly marked as incoherent.
- Score: 15.94497202872835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating open-domain chatbots requires large amounts of conversational data
and related benchmark tasks to evaluate them. Standardized evaluation tasks are
crucial for creating automatic evaluation metrics for model development;
otherwise, comparing the models would require resource-expensive human
evaluation. While chatbot challenges have recently managed to provide a
plethora of such resources for English, resources in other languages are not
yet available. In this work, we provide a starting point for Finnish
open-domain chatbot research. We describe our collection efforts to create the
Finnish chat conversation corpus FinChat, which is made available publicly.
FinChat includes unscripted conversations on seven topics from people of
different ages. Using this corpus, we also construct a retrieval-based
evaluation task for Finnish chatbot development. We observe that off-the-shelf
chatbot models trained on conversational corpora do not perform better than
chance at choosing the right answer based on automatic metrics, while humans
can do the same task almost perfectly. Similarly, in a human evaluation,
chatbot-generated responses to questions from the evaluation set are
predominantly marked as incoherent. Thus, FinChat provides a challenging
evaluation set, meant to encourage chatbot development in Finnish.
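To make the retrieval-based evaluation concrete, below is a minimal sketch of how such a task can be scored, assuming a generic context-response scoring function. The function names and data layout are illustrative and not the released FinChat format.

```python
# Minimal sketch of a retrieval-based chatbot evaluation in the style
# described above: for each evaluation question, the model must pick the
# true next utterance out of a pool of candidates. Names are hypothetical.
import random
from typing import Callable, List, Tuple

def recall_at_1(
    examples: List[Tuple[str, str, List[str]]],
    score: Callable[[str, str], float],
) -> float:
    """examples: (context, true_response, distractors) triples."""
    hits = 0
    for context, true_response, distractors in examples:
        candidates = distractors + [true_response]
        random.shuffle(candidates)  # avoid position bias
        best = max(candidates, key=lambda c: score(context, c))
        hits += best == true_response
    return hits / len(examples)

# A chance-level scorer: with N candidates, expected recall@1 is 1/N,
# the baseline the abstract reports off-the-shelf models failing to beat.
chance = lambda context, candidate: random.random()
```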
Related papers
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- Measuring and Controlling Instruction (In)Stability in Language Model Dialogs [72.38330196290119]
System-prompting is a tool for customizing language-model chatbots, enabling them to follow a specific instruction.
We propose a benchmark to test whether such instructions are followed reliably, evaluating instruction stability via self-chats.
We reveal significant instruction drift within eight rounds of conversation.
We propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
arXiv Detail & Related papers (2024-02-13T20:10:29Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
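As a rough illustration of the few-shot prompting idea this entry describes, the sketch below builds a synthesis prompt from expert-written example conversations. The prompt layout and seed conversations are invented placeholders, not the paper's actual setup.

```python
# Hedged sketch of PLACES-style conversation synthesis: a few seed
# conversations serve as in-context examples, and a language model is
# prompted to continue the pattern on a new topic.
SEED_CONVERSATIONS = [
    "Topic: cooking\nA: Have you tried baking rye bread?\nB: Not yet, but I'd love a recipe.",
    "Topic: travel\nA: Any tips for a weekend in Lapland?\nB: Go in winter if you want the northern lights.",
]

def build_prompt(topic: str) -> str:
    examples = "\n\n".join(SEED_CONVERSATIONS)
    return (
        "The following are casual two-person conversations.\n\n"
        f"{examples}\n\nTopic: {topic}\nA:"
    )

print(build_prompt("gardening"))
# The prompt would then be sent to whatever LM API is available, e.g.
# response = some_lm.generate(build_prompt("gardening"))  # hypothetical call
```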
arXiv Detail & Related papers (2023-02-07T05:48:16Z)
- Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data [15.808841433843742]
Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts.
We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably.
arXiv Detail & Related papers (2023-01-14T07:29:36Z)
- Training Conversational Agents with Generative Conversational Networks [74.9941330874663]
We use Generative Conversational Networks to automatically generate data and train social conversational agents.
We evaluate our approach on TopicalChat with automatic metrics and human evaluators, showing that with 10% of seed data it performs close to the baseline that uses 100% of the data.
arXiv Detail & Related papers (2021-10-15T21:46:39Z)
- Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain Chatbot Consistency [28.255324166852535]
We propose the Addressing Inquiries about History (AIH) framework for evaluating open-domain chatbot consistency.
At the conversation stage, AIH attempts to address appropriate inquiries about the dialogue history to induce the chatbots to redeclare the historical facts or opinions.
At the contradiction recognition stage, either human judges or a natural language inference (NLI) model can be employed to recognize whether the answers to the inquiries contradict the dialogue history.
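A minimal sketch of what the NLI-based contradiction check could look like, assuming an off-the-shelf MNLI model stands in for the NLI component; the AIH paper's own model and inquiry-generation stage are not reproduced here.

```python
# Hedged sketch: check whether a chatbot's new answer contradicts a fact
# it asserted earlier, using a generic MNLI classifier as the NLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def contradicts_history(history_statement: str, new_answer: str) -> bool:
    # The chatbot earlier asserted history_statement; test whether its
    # answer to the follow-up inquiry contradicts that assertion.
    result = nli({"text": history_statement, "text_pair": new_answer})[0]
    return result["label"] == "CONTRADICTION"

print(contradicts_history("I have two dogs.", "I don't have any pets."))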
arXiv Detail & Related papers (2021-06-04T03:04:13Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examine our framework using three experimental setups and evaluate the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- A Multilingual African Embedding for FAQ Chatbots [0.0]
English, French, Arabic, Tunisian, Igbo, Yorùbá, and Hausa are used as languages and dialects.
We present our work on modified StarSpace embedding tailored for African dialects for the question-answering task.
arXiv Detail & Related papers (2021-03-16T16:36:40Z)
- Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems [21.36935947626793]
Spot The Bot replaces human-bot conversations with conversations between bots.
Human judges only annotate for each entity in a conversation whether they think it is human or not.
Survival Analysis measures which bot can uphold human-like behavior the longest.
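To illustrate the survival-analysis idea, the sketch below treats the number of turns until annotators flag a speaker as "not human" as a survival time. The data is invented, and lifelines is one common choice of library, not necessarily what the authors used.

```python
# Hedged sketch of the Survival Analysis component of Spot The Bot:
# compare bots by how long they survive before being spotted.
from lifelines import KaplanMeierFitter

# Turns survived before being spotted (hypothetical data), and whether the
# bot was actually spotted (1) or the conversation ended first (0, censored).
turns_survived = [2, 3, 3, 5, 8, 8, 10, 12]
spotted = [1, 1, 1, 1, 1, 0, 1, 0]

kmf = KaplanMeierFitter()
kmf.fit(turns_survived, event_observed=spotted, label="bot_A")
print(kmf.survival_function_)  # P(still judged human) after each turn
```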
arXiv Detail & Related papers (2020-10-05T16:37:52Z)
- Pchatbot: A Large-Scale Dataset for Personalized Chatbot [49.16746174238548]
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively.
To adapt the raw data to dialogue systems, we carefully normalize it via processes such as anonymization.
Pchatbot is significantly larger than existing Chinese datasets, which may benefit data-driven models.
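As a rough illustration of the kind of anonymization step described here, the regexes below scrub common identifiers from an utterance; they are examples only, not the Pchatbot authors' actual pipeline.

```python
# Hedged sketch of dialogue-data anonymization: replace user mentions,
# links, and long numeric identifiers with placeholder tokens.
import re

def anonymize(utterance: str) -> str:
    utterance = re.sub(r"@\w+", "@USER", utterance)        # user mentions
    utterance = re.sub(r"https?://\S+", "URL", utterance)  # links
    utterance = re.sub(r"\b\d{7,}\b", "NUMBER", utterance) # long ids / phone numbers
    return utterance

print(anonymize("@alice check https://example.com or call 0401234567"))
```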
arXiv Detail & Related papers (2020-09-28T12:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.