Modeling Complex Dialogue Mappings via Sentence Semantic Segmentation
Guided Conditional Variational Auto-Encoder
- URL: http://arxiv.org/abs/2212.00231v1
- Date: Thu, 1 Dec 2022 02:31:10 GMT
- Title: Modeling Complex Dialogue Mappings via Sentence Semantic Segmentation
Guided Conditional Variational Auto-Encoder
- Authors: Bin Sun, Shaoxiong Feng, Yiwei Li, Weichao Wang, Fei Mi, Yitong Li,
Kan Li
- Abstract summary: Complex dialogue mappings (CDM), including one-to-many and many-to-one mappings, tend to make dialogue models generate incoherent or dull responses.
This paper proposes a Sentence Semantic Segmentation guided Conditional Variational Auto-Encoder (SegCVAE) method that can model and take advantage of CDM data.
- Score: 23.052838118122835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex dialogue mappings (CDM), including one-to-many and many-to-one
mappings, tend to make dialogue models generate incoherent or dull responses,
and modeling these mappings remains a huge challenge for neural dialogue
systems. To alleviate these problems, methods such as introducing external
information, reconstructing the optimization function, and manipulating data
samples have been proposed, but they primarily focus on avoiding training with
CDM, inevitably weakening the model's ability to understand CDM in human
conversations and limiting further improvements in model performance. This
paper proposes a Sentence Semantic \textbf{Seg}mentation guided
\textbf{C}onditional \textbf{V}ariational \textbf{A}uto-\textbf{E}ncoder
(SegCVAE) method that can model and take advantage of CDM data.
Specifically, to tackle the incoherence problem caused by one-to-many mappings,
SegCVAE uses response-related prominent semantics to constrain the latent variable.
To mitigate the non-diversity problem brought by many-to-one mappings, SegCVAE segments
multiple prominent semantics to enrich the latent variables. Three novel
components, Internal Separation, External Guidance, and Semantic Norms, are
proposed to achieve SegCVAE. On dialogue generation tasks, both the automatic
and human evaluation results show that SegCVAE achieves new state-of-the-art
performance.
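As a rough, hypothetical illustration of the idea described in the abstract (not the authors' implementation), the sketch below shows a standard conditional VAE whose sampled latent is additionally pulled toward response-related "prominent semantics"; all module names, dimensions, and the form of the guidance loss are assumptions made for this example.

```python
# Minimal, hypothetical sketch of a CVAE whose latent variable is guided by
# response-related "prominent semantics", in the spirit of SegCVAE.
# NOT the authors' implementation: the guidance loss below is a simple stand-in
# for the paper's Internal Separation / External Guidance / Semantic Norms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedCVAE(nn.Module):
    def __init__(self, hidden=256, latent=64):
        super().__init__()
        # Recognition network q(z | context, response) and prior p(z | context).
        self.recog = nn.Linear(hidden * 2, latent * 2)
        self.prior = nn.Linear(hidden, latent * 2)
        # Hypothetical projection of the segmented prominent semantics.
        self.sem_proj = nn.Linear(hidden, latent)

    def forward(self, ctx, resp, prominent_sem):
        mu_q, logvar_q = self.recog(torch.cat([ctx, resp], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior(ctx).chunk(2, -1)
        # Reparameterized sample from the recognition distribution.
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        # KL divergence between recognition and prior Gaussians.
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                    - 1).sum(-1).mean()
        # Semantic guidance: pull z toward the response-related prominent
        # semantics so that sampled latents stay coherent with the response.
        guide = F.mse_loss(z, self.sem_proj(prominent_sem))
        return z, kl, guide

# Toy usage with random sentence-level features (batch of 4).
ctx, resp, sem = (torch.randn(4, 256) for _ in range(3))
model = GuidedCVAE()
z, kl, guide = model(ctx, resp, sem)
print(z.shape, kl.item(), guide.item())
```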
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
- 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges [65.03196674816772]
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee.
Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarification Exchanges (CEs): a Clarification Request (CR) and a response.
Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models.
arXiv Detail & Related papers (2023-07-28T13:44:33Z)
- Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding [57.42429912884543]
We propose Diff-LM-Speech, Tetra-Diff-Speech and Tri-Diff-Speech to solve high dimensionality and waveform distortion problems.
We also introduce a prompt encoder structure based on a variational autoencoder and a prosody bottleneck to improve prompt representation ability.
Experimental results show that our proposed methods outperform baseline methods.
arXiv Detail & Related papers (2023-07-28T11:20:23Z)
- Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information [18.859159491548006]
We propose a novel learning-based automatic evaluation metric (CMN) for open-domain dialogues.
We employ Conditional Variational Autoencoders (CVAEs) with a Next Sentence Prediction (NSP) objective and use Mutual Information (MI) to model the semantic similarity of text in the latent space.
Experimental results on two open-domain dialogue datasets demonstrate the superiority of our method compared with a wide range of baselines.
arXiv Detail & Related papers (2023-05-26T14:21:54Z)
- Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation [15.742077523458995]
This paper proposes to harness the generative model with a priori knowledge through a cognitive approach involving mesoscopic-scale feature disentanglement.
We propose a new metric for open-domain dialogues, which can objectively evaluate the interpretability of the latent space distribution.
arXiv Detail & Related papers (2022-07-26T07:39:36Z)
- Unsupervised Mismatch Localization in Cross-Modal Sequential Data [5.932046800902776]
We develop an unsupervised learning algorithm that can infer the relationship between content-mismatched cross-modal data.
We propose a hierarchical Bayesian deep learning model, named mismatch localization variational autoencoder (ML-VAE), that decomposes the generative process of the speech into hierarchically structured latent variables.
Our experimental results show that ML-VAE successfully locates the mismatch between text and speech, without the need for human annotations.
arXiv Detail & Related papers (2022-05-05T14:23:27Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps, together with multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- Multi-Domain Dialogue Acts and Response Co-Generation [34.27525685962274]
We propose a neural co-generation model that generates dialogue acts and responses concurrently.
Our model achieves substantial improvements over several state-of-the-art models in both automatic and human evaluations.
arXiv Detail & Related papers (2020-04-26T12:21:17Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs tend to ignore latent variables when paired with a strong auto-regressive decoder (see the sketch after this list).
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
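The last entry above refers to a known failure mode: with a strong auto-regressive decoder, the KL term of a VAE can be driven to zero so the latent variable carries no information (posterior collapse). The generic sketch below illustrates that collapse and one common mitigation, free-bits KL thresholding; it is illustrative only and is not the discrete latent bottleneck proposed in that paper.

```python
# Generic illustration of posterior collapse and a free-bits KL floor.
# This is NOT the paper's method; it only shows the failure mode discussed above.
import torch

def kl_standard_normal(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ) per latent dimension.
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)

def free_bits_kl(mu, logvar, min_nats=0.5):
    # Clamp the per-dimension KL from below so the decoder cannot push it to
    # zero and ignore the latent variable entirely.
    kl = kl_standard_normal(mu, logvar)
    return torch.clamp(kl, min=min_nats).sum(-1).mean()

mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)   # a fully collapsed posterior
print(kl_standard_normal(mu, logvar).sum(-1).mean())  # -> 0.0 (collapse)
print(free_bits_kl(mu, logvar))                       # -> 32 * 0.5 = 16.0
```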