Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
- URL: http://arxiv.org/abs/2406.11065v1
- Date: Sun, 16 Jun 2024 20:41:44 GMT
- Title: Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
- Authors: Guan-Ting Lin, Hung-yi Lee
- Abstract summary: Emphasis is a crucial component of human communication, indicating the speaker's intention and implication beyond the pure text of a dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
- Score: 64.72966061510375
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. While Large Language Models (LLMs) have revolutionized natural language processing, their ability to understand emphasis in dialogue remains unclear. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis. We evaluate various LLMs, both open-source and commercial, to measure their performance in understanding emphasis. Additionally, we propose an automatic evaluation pipeline using GPT-4, which achieves a high correlation with human rating. Our findings reveal that although commercial LLMs generally perform better, there is still significant room for improvement in comprehending emphasized sentences.
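The automatic evaluation pipeline mentioned in the abstract can be sketched as an LLM-as-judge setup: an automatic judge assigns a score to each model response, and agreement with human ratings is measured by rank correlation. The sketch below is a minimal, hypothetical stand-in, not the paper's method: `judge_score` uses simple token overlap in place of a GPT-4 judge, and `spearman` is a self-contained Spearman correlation used to check agreement with (toy) human ratings.

```python
# Minimal sketch of an LLM-as-judge evaluation loop (hypothetical stand-in;
# the paper prompts GPT-4 as the judge and correlates with human ratings).

def judge_score(reference: str, candidate: str) -> float:
    """Stand-in judge: token overlap with the reference implication.
    (Hypothetical heuristic; the paper uses GPT-4 for this step.)"""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / max(len(ref), 1)

def ranks(xs):
    """Average ranks for Spearman correlation (ties get the mean rank)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Toy data: automatic scores vs. human ratings for three candidate responses.
reference = "she is upset about the delay"
candidates = ["she is upset about the delay", "she mentions a delay", "unrelated text"]
auto_scores = [judge_score(reference, c) for c in candidates]
human_ratings = [5, 3, 1]
print(round(spearman(auto_scores, human_ratings), 2))
```

A high rho here would indicate the automatic judge ranks responses the same way humans do, which is the property the paper reports for its GPT-4 pipeline.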
Related papers
- Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech [29.847183061204436]
Large language models (LLMs) can take into account users' emotions or speaking styles when providing their responses.
In this work, we utilize an end-to-end system with a speech encoder.
We find that this training framework allows the encoder to generate tokens that capture both semantic and paralinguistic information in speech.
arXiv Detail & Related papers (2024-10-02T01:32:47Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- Exploring the Factual Consistency in Dialogue Comprehension of Large Language Models [51.75805497456226]
This work focuses on the factual consistency issue with the help of the dialogue summarization task.
Our evaluation shows that, on average, 26.8% of the summaries generated by LLMs contain factual inconsistency.
To stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data.
arXiv Detail & Related papers (2023-11-13T09:32:12Z)
- BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find that GPT-4 can generate human-style multi-turn dialogues of impressive quality, significantly outperforming its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z)
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thought prompting method (Cue-CoT) to provide more personalized and engaging responses.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate that our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
- Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, a Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features.
To simulate dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.