Multi-level Adaptive Contrastive Learning for Knowledge Internalization
in Dialogue Generation
- URL: http://arxiv.org/abs/2310.08943v2
- Date: Tue, 17 Oct 2023 12:53:58 GMT
- Title: Multi-level Adaptive Contrastive Learning for Knowledge Internalization
in Dialogue Generation
- Authors: Chenxu Yang, Zheng Lin, Lanrui Wang, Chong Tian, Liang Pang, Jiangnan
Li, Qirong Ho, Yanan Cao, Weiping Wang
- Abstract summary: Knowledge-grounded dialogue generation aims to incorporate external knowledge to supplement the context.
However, the model often fails to internalize this information into responses in a human-like manner.
We propose a Multi-level Adaptive Contrastive Learning framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors.
- Score: 37.55417272177113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-grounded dialogue generation aims to mitigate the issue of text
degeneration by incorporating external knowledge to supplement the context.
However, the model often fails to internalize this information into responses
in a human-like manner. Instead, it simply inserts segments of the provided
knowledge into generic responses. As a result, the generated responses tend to
be tedious, incoherent, and lacking in interactivity, which means the
degeneration problem remains unsolved. In this work, we first find that such
copying-style degeneration is primarily due to the weak likelihood objective,
which allows the model to "cheat" the objective by merely duplicating knowledge
segments through superficial pattern matching based on overlap. To overcome this
challenge, we then propose a Multi-level Adaptive Contrastive Learning (MACL)
framework that dynamically samples negative examples and subsequently penalizes
degeneration behaviors at both the token-level and sequence-level. Extensive
experiments on the WoW dataset demonstrate the effectiveness of our approach
across various pre-trained models.
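The abstract does not spell out the objective itself; the following is a minimal, PyTorch-style sketch of how a token-level and a sequence-level contrastive penalty could be combined with the standard likelihood loss. The function names, the choice of negatives (tokens copied verbatim from the knowledge segment at the token level; copy-style responses at the sequence level), and the margins and weights are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def token_level_contrast(logits, neg_token_ids, margin=1.0):
    """Hinge penalty on negative tokens at each decoding step.

    logits:        (B, T, V) decoder outputs
    neg_token_ids: (B, T, K) ids of K negative tokens per step, e.g. tokens
                   that would merely be copied from the knowledge segment
                   (an assumed sampling heuristic, not the paper's).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    neg_lp = log_probs.gather(-1, neg_token_ids)        # (B, T, K)
    # Penalize only when a negative token receives non-trivial probability.
    return F.relu(neg_lp + margin).mean()

def sequence_level_contrast(pos_score, neg_scores, margin=0.5):
    """Margin ranking loss: the human-written response should outscore
    copy-style negative responses for the same dialogue context.

    pos_score:  (B,)   model score of the reference response
    neg_scores: (B, K) model scores of K sampled negative responses
    """
    return F.relu(margin - pos_score.unsqueeze(-1) + neg_scores).mean()

def multi_level_loss(nll, logits, neg_token_ids, pos_score, neg_scores,
                     alpha=1.0, beta=1.0):
    """Likelihood objective plus the two contrastive penalties; alpha and
    beta are assumed trade-off weights."""
    return (nll
            + alpha * token_level_contrast(logits, neg_token_ids)
            + beta * sequence_level_contrast(pos_score, neg_scores))
```

In practice the negative sampling would be adaptive (e.g. re-sampled as the model's copying behavior changes during training); that adaptive component is the part this sketch deliberately leaves abstract.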
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Topic Modeling as Multi-Objective Contrastive Optimization [46.24876966674759]
Recent representation learning approaches enhance neural topic models by optimizing the weighted linear combination of the evidence lower bound (ELBO) of the log-likelihood and the contrastive learning objective that contrasts pairs of input documents.
We introduce a novel contrastive learning method oriented towards sets of topic vectors to capture useful semantics that are shared among a set of input documents.
Our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance (a rough sketch of the weighted ELBO-plus-contrastive objective appears after this list).
arXiv Detail & Related papers (2024-02-12T11:18:32Z) - Repetition In Repetition Out: Towards Understanding Neural Text
Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Hexa: Self-Improving for Knowledge-Grounded Dialogue System [13.293318039036562]
We develop a self-improving method that improves the generative performance of intermediate steps without ground-truth data.
In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses.
arXiv Detail & Related papers (2023-10-10T08:15:24Z) - Utterance Rewriting with Contrastive Learning in Multi-turn Dialogue [22.103162555263143]
We introduce contrastive learning and multi-task learning to jointly model the problem.
Our proposed model achieves state-of-the-art performance on several public datasets.
arXiv Detail & Related papers (2022-03-22T10:13:27Z) - Adaptive Bridge between Training and Inference for Dialogue [36.64781557775641]
We propose a novel adaptive switching mechanism, which learns to automatically transition between ground-truth learning and generated learning.
Our method achieves a significant improvement in terms of metric-based evaluation and human evaluation.
arXiv Detail & Related papers (2021-10-22T02:43:27Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z) - A Controllable Model of Grounded Response Generation [122.7121624884747]
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process.
We propose a framework that we call controllable grounded response generation (CGRG).
We show that using this framework, a transformer based model with a novel inductive attention mechanism, trained on a conversation-like Reddit dataset, outperforms strong generation baselines.
arXiv Detail & Related papers (2020-05-01T21:22:08Z)
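As referenced in the topic-modeling entry above, the weighted linear combination of an ELBO term and a document-level contrastive term can be written compactly. The sketch below is a generic, assumed formulation (an InfoNCE-style contrast over topic vectors, with an augmented view of each document as the positive), not the cited paper's code; all names and defaults are illustrative.

```python
import torch
import torch.nn.functional as F

def topic_model_objective(recon_ll, kl, anchor_z, positive_z, negative_z,
                          weight=1.0, tau=0.1):
    """Weighted linear combination of the negative ELBO and a contrastive
    term over topic vectors.

    recon_ll:   (B,)      reconstruction log-likelihood per document
    kl:         (B,)      KL term per document
    anchor_z:   (B, D)    topic vector of each document
    positive_z: (B, D)    topic vector of an augmented view (positive pair)
    negative_z: (B, K, D) topic vectors of K negative documents
    """
    neg_elbo = -(recon_ll - kl)
    a = F.normalize(anchor_z, dim=-1)
    p = F.normalize(positive_z, dim=-1)
    n = F.normalize(negative_z, dim=-1)
    pos_sim = (a * p).sum(-1) / tau                      # (B,)
    neg_sim = torch.einsum('bd,bkd->bk', a, n) / tau     # (B, K)
    # InfoNCE-style contrastive term over the positive and K negatives.
    contrastive = -torch.log(
        pos_sim.exp() / (pos_sim.exp() + neg_sim.exp().sum(dim=1))
    ).mean()
    return neg_elbo.mean() + weight * contrastive
```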