Measuring and Improving Semantic Diversity of Dialogue Generation
- URL: http://arxiv.org/abs/2210.05725v1
- Date: Tue, 11 Oct 2022 18:36:54 GMT
- Title: Measuring and Improving Semantic Diversity of Dialogue Generation
- Authors: Seungju Han, Beomsu Kim, Buru Chang
- Abstract summary: We introduce a new automatic evaluation metric to measure the semantic diversity of generated responses.
We show that our proposed metric captures human judgments on response diversity better than existing lexical-level diversity metrics.
We also propose a simple yet effective learning method that improves the semantic diversity of generated responses.
- Score: 21.59385143783728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Response diversity has become an important criterion for evaluating the
quality of open-domain dialogue generation models. However, current evaluation
metrics for response diversity often fail to capture the semantic diversity of
generated responses, as they mainly consider lexical aspects of the generated
responses. In this paper, we introduce a new automatic evaluation metric to
measure the semantic diversity of generated responses. Through human
evaluation, we demonstrate that our proposed metric captures human judgments on
response diversity better than existing lexical-level diversity metrics.
Furthermore, motivated by analyzing an existing dialogue dataset, we propose a
simple yet effective learning method that improves the semantic diversity of
generated responses. Our learning method weights training samples based on the
semantic distribution of the training set. We show that our learning method
improves response diversity and coherency better than other baseline methods
through automatic and human evaluation.
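The abstract specifies neither the metric formula nor the weighting scheme, so the following is a minimal sketch of the general idea only, assuming a sentence-transformers encoder, mean pairwise cosine distance as the diversity score, and inverse cluster-frequency sample weights; all of these concrete choices are illustrative assumptions rather than the paper's method.

```python
# Minimal sketch, NOT the paper's exact method: the abstract specifies neither
# the metric formula nor the weighting scheme, so the encoder, the diversity
# score, and the clustering-based weights below are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def semantic_diversity(responses: list[str]) -> float:
    """Score a response set by mean pairwise cosine distance of embeddings."""
    emb = encoder.encode(responses, normalize_embeddings=True)
    sims = emb @ emb.T                      # cosine similarity (unit-norm rows)
    mask = ~np.eye(len(responses), dtype=bool)
    return float(1.0 - sims[mask].mean())

def sample_weights(train_responses: list[str], n_clusters: int = 100) -> np.ndarray:
    """Upweight semantically rare training responses: cluster the embedding
    space and weight each sample by the inverse size of its cluster."""
    emb = encoder.encode(train_responses, normalize_embeddings=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)
    counts = np.bincount(labels, minlength=n_clusters)
    w = 1.0 / counts[labels]
    return w * len(w) / w.sum()             # normalize so the mean weight is 1
```

In training, such weights could simply scale the per-example loss, e.g. loss = (weights * per_example_nll).mean().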
Related papers
- PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting [19.79214899011072]
This paper formalizes diversity of representation in generative large language models.
We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes.
We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
arXiv Detail & Related papers (2023-10-25T10:17:17Z)
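The summary above names the mechanism without detailing it, so the following is a loose sketch of a critique-and-vote loop; llm is a hypothetical text-completion function and every prompt is invented for illustration.

```python
# Loose sketch of a critique-then-vote loop; `llm` is a hypothetical
# text-completion function and all prompts are invented for illustration.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any text-completion model here")

def collective_critique_and_vote(question: str, n: int = 4) -> str:
    candidates = [llm(f"Answer the question: {question}") for _ in range(n)]
    # Each candidate is critiqued for diversity of representation.
    critiques = [
        llm(f"Critique this answer for the diversity of people and cultures "
            f"it represents:\n{c}")
        for c in candidates
    ]
    # The model revises its answers in light of the pooled critiques...
    revised = [
        llm(f"Question: {question}\nDraft: {c}\nCritiques:\n"
            + "\n".join(critiques) + "\nWrite an improved, more diverse answer:")
        for c in candidates
    ]
    # ...and finally self-votes for the best revision.
    ballot = "\n\n".join(f"[{i}] {r}" for i, r in enumerate(revised))
    choice = llm(f"Reply with only the index of the most diverse answer:\n{ballot}")
    return revised[int(choice.strip())]
```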
- Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference [82.28542500317445]
We present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues.
Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution.
arXiv Detail & Related papers (2023-06-01T21:23:13Z)
- Generate, Evaluate, and Select: A Dialogue System with a Response Evaluator for Diversity-Aware Response Generation [9.247397520986999]
We aim to overcome the lack of diversity in responses of current dialogue systems.
We propose a generator-evaluator model that evaluates multiple responses generated by a response generator.
We conduct human evaluations to compare the output of the proposed system with that of a baseline system.
arXiv Detail & Related papers (2022-06-10T08:22:22Z)
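A minimal sketch of the generate-then-rank pattern this entry describes, assuming a HuggingFace DialoGPT generator; the score function below is a hypothetical stand-in for the paper's trained response evaluator.

```python
# Sketch of the generate-then-rank pattern; `score` is a hypothetical stand-in
# for the paper's trained response evaluator, and DialoGPT is just an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
gen = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

def score(response: str) -> float:
    """Hypothetical evaluator; e.g., favor lexically richer candidates."""
    return float(len(set(response.split())))

@torch.no_grad()
def respond(context: str, n_candidates: int = 8) -> str:
    ids = tok(context + tok.eos_token, return_tensors="pt").input_ids
    out = gen.generate(
        ids,
        do_sample=True, top_p=0.9,          # sampling yields diverse candidates
        num_return_sequences=n_candidates,
        max_new_tokens=40,
        pad_token_id=tok.eos_token_id,
    )
    candidates = [
        tok.decode(seq[ids.shape[-1]:], skip_special_tokens=True) for seq in out
    ]
    return max(candidates, key=score)       # the evaluator selects the response
```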
- Semantic Diversity in Dialogue with Natural Language Inference [19.74618235525502]
This paper makes two substantial contributions to improving diversity in dialogue generation.
First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation.
Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation.
arXiv Detail & Related papers (2022-05-03T13:56:32Z)
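The entry above gives the idea but not the formula, so the sketch below assumes the off-the-shelf roberta-large-mnli checkpoint and uses mean non-entailment probability over ordered response pairs as an illustrative proxy; the threshold loop is likewise only a loose reading of Diversity Threshold Generation.

```python
# Illustrative sketch only: the exact NLI-based formula and the details of
# Diversity Threshold Generation are not in the summary above, so the proxy
# below (mean non-entailment probability over ordered pairs) is an assumption.
import itertools
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
ENTAILMENT = 2  # label order for this checkpoint: contradiction, neutral, entailment

@torch.no_grad()
def nli_diversity(responses: list[str]) -> float:
    """Average probability that one response does NOT entail another."""
    scores = []
    for premise, hypothesis in itertools.permutations(responses, 2):
        logits = nli(**nli_tok(premise, hypothesis, return_tensors="pt")).logits
        p_entail = logits.softmax(dim=-1)[0, ENTAILMENT].item()
        scores.append(1.0 - p_entail)
    return sum(scores) / len(scores)

def diversity_threshold_generation(sample_fn, k=5, threshold=0.8, max_tries=10):
    """Loose reading of the procedure: re-sample the response set until the
    NLI diversity score clears the threshold (or tries run out)."""
    responses = [sample_fn() for _ in range(k)]
    for _ in range(max_tries):
        if nli_diversity(responses) >= threshold:
            break
        responses = [sample_fn() for _ in range(k)]
    return responses
```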
- Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition [62.30451764345482]
This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition.
The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data.
Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2022-04-26T22:48:15Z)
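As a rough illustration of creating complementary representations by varying initialization and architecture, the sketch below trains small PyTorch autoencoders that differ only in seed and hidden width and concatenates their codes; all dimensions and training details are invented.

```python
# Rough illustration only: autoencoders differing in seed and hidden width
# produce complementary codes that are concatenated into one representation.
# All dimensions and training details here are invented for the sketch.
import torch
from torch import nn

def make_autoencoder(in_dim: int, hidden: int, seed: int) -> nn.Sequential:
    torch.manual_seed(seed)                     # vary initialization per member
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),   # encoder
        nn.Linear(hidden, in_dim),              # decoder
    )

def train(ae: nn.Module, x: torch.Tensor, epochs: int = 50) -> None:
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(ae(x), x)  # plain reconstruction loss
        loss.backward()
        opt.step()

def diverse_representation(x: torch.Tensor) -> torch.Tensor:
    codes = []
    for seed, hidden in [(0, 32), (1, 64), (2, 128)]:  # vary seed + architecture
        ae = make_autoencoder(x.shape[1], hidden, seed)
        train(ae, x)
        codes.append(ae[:2](x))             # encoder output (first two layers)
    return torch.cat(codes, dim=1)          # concatenated complementary code
```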
- Towards Robust Online Dialogue Response Generation [62.99904593650087]
We argue that the degradation of dialogue models in online use can be caused by a discrepancy between training and real-world testing.
We propose a hierarchical sampling-based method consisting of both utterance-level sampling and semi-utterance-level sampling.
arXiv Detail & Related papers (2022-03-07T06:51:41Z)
- Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation [67.52229938775294]
In recent years, researchers have proposed teacher-student frameworks to decrease the domain gap between different person re-identification datasets.
Inspired by recent teacher-student framework based methods, we propose to conduct further exploration to imitate the human learning process from different aspects.
arXiv Detail & Related papers (2021-11-28T01:14:29Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA requires only a small amount of pre-collected experience data and therefore involves no human interaction with the target policy during evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
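ENIGMA's actual estimator is not described in this summary; purely as background on what off-policy evaluation means here, the sketch below estimates a target policy's expected human score from logged dialogues via plain trajectory-level importance sampling, with invented field names.

```python
# Background sketch of off-policy evaluation, NOT ENIGMA's estimator: estimate
# the target policy's expected score from logged dialogues via importance
# sampling. All field names below are invented for illustration.
from dataclasses import dataclass

@dataclass
class LoggedDialogue:
    behavior_probs: list[float]  # pi_b(response | context), recorded at logging time
    target_probs: list[float]    # pi_t(response | context) under the policy to evaluate
    score: float                 # human evaluation score for the whole dialogue

def importance_sampling_estimate(logs: list[LoggedDialogue]) -> float:
    """Weight each logged score by how likely the target policy is to have
    produced the same responses as the logging policy."""
    total = 0.0
    for d in logs:
        w = 1.0
        for pb, pt in zip(d.behavior_probs, d.target_probs):
            w *= pt / pb
        total += w * d.score
    return total / len(logs)
```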
- Evaluating for Diversity in Question Generation over Text [5.369031521471668]
We argue that commonly-used evaluation metrics such as BLEU and METEOR are not suitable for this task due to the inherent diversity of reference questions.
We propose a variational encoder-decoder model for this task.
arXiv Detail & Related papers (2020-08-17T13:16:12Z)
- Evaluating the Evaluation of Diversity in Natural Language Generation [43.05127848086264]
We propose a framework for evaluating diversity metrics in natural language generation systems.
Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
arXiv Detail & Related papers (2020-04-06T20:44:10Z)