Bilevel Scheduled Sampling for Dialogue Generation
- URL: http://arxiv.org/abs/2309.01953v1
- Date: Tue, 5 Sep 2023 05:05:06 GMT
- Title: Bilevel Scheduled Sampling for Dialogue Generation
- Authors: Jiawen Liu and Kan Li
- Abstract summary: We propose a bilevel scheduled sampling model that takes the sentence-level information into account and incorporates it with word-level quality.
Experiments conducted on the DailyDialog and PersonaChat datasets demonstrate the effectiveness of our proposed methods.
- Score: 6.89978591161039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exposure bias poses a common challenge in numerous natural language
processing tasks, particularly in dialogue generation. In response to this
issue, researchers have devised various techniques, among which scheduled
sampling has proven to be an effective method for mitigating exposure bias.
However, existing state-of-the-art scheduled sampling methods consider only
the quality of the currently sampled words when performing threshold-truncation
sampling, overlooking the importance of sentence-level information; moreover,
the threshold-truncation scheme itself warrants further discussion. In this paper, we propose a
bilevel scheduled sampling model that takes the sentence-level information into
account and incorporates it with word-level quality. To enhance sampling
diversity and improve the model's adaptability, we propose a smooth function
that maps the combined result of sentence-level and word-level information to
an appropriate range, and employ probabilistic sampling based on the mapped
values instead of threshold truncation. Experiments conducted on the
DailyDialog and PersonaChat datasets demonstrate the effectiveness of our
proposed methods, which significantly alleviate the exposure bias problem and
outperform state-of-the-art scheduled sampling methods.
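The mechanism the abstract describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the blend weight `alpha`, the sigmoid-style mapping, and its `[low, high]` range are all hypothetical stand-ins, since the abstract does not give the exact smooth function or combination rule.

```python
import math
import random

def smooth_map(score, low=0.2, high=0.8):
    """Map a combined quality score in [0, 1] to a sampling probability.

    A sigmoid squashed into [low, high] stands in for the paper's smooth
    mapping function (the exact form is not given in the abstract).
    """
    squashed = 1.0 / (1.0 + math.exp(-8.0 * (score - 0.5)))  # sigmoid centred at 0.5
    return low + (high - low) * squashed

def bilevel_sample(word_quality, sentence_quality, alpha=0.5):
    """Decide whether to feed the model's own prediction at the next step.

    word_quality / sentence_quality: scores in [0, 1] for the current token
    and the partial sentence; alpha blends the two levels (hypothetical).
    Returns True to use the model prediction, False to use the ground-truth
    token (teacher forcing). Drawing against the mapped probability replaces
    hard threshold truncation, which keeps sampling stochastic even for
    scores near the old threshold.
    """
    combined = alpha * word_quality + (1.0 - alpha) * sentence_quality
    p_model = smooth_map(combined)
    return random.random() < p_model
```

A high combined score makes it more likely (but never certain) that the model is exposed to its own output, which is the source of the improved sampling diversity the abstract claims over hard truncation.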
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance [6.358196724648596]
A deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution.
Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification.
We propose a simple yet effective unsupervised Mahalanobis distance feature-based method to detect out-of-distribution samples.
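The Mahalanobis-distance approach above can be sketched as follows. This is a generic illustration of the distance computation, not that paper's code; the toy feature matrix, the regularization term, and the threshold choice are assumptions.

```python
import numpy as np

def fit_gaussian(features):
    """Estimate mean and inverse covariance of in-distribution features.

    features: (n_samples, n_dims) array of, e.g., penultimate-layer
    activations from the classifier. A small ridge term keeps the
    covariance invertible (regularization strength is an assumption).
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, inv_cov):
    """Squared Mahalanobis distance of x from the fitted distribution.

    Larger scores indicate inputs farther from the training distribution;
    a threshold chosen on held-out data flags out-of-distribution samples.
    """
    d = x - mean
    return float(d @ inv_cov @ d)
```

The method is unsupervised in the sense that it needs no out-of-distribution examples: only in-distribution feature statistics are fitted, and the score thresholds anomalous inputs at deployment time.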
arXiv Detail & Related papers (2023-08-09T11:33:53Z) - AdaSelection: Accelerating Deep Learning Training through Data Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - UBARv2: Towards Mitigating Exposure Bias in Task-Oriented Dialogs [28.051423938045843]
We propose session-level sampling which explicitly exposes the model to sampled generated content of dialog context during training.
We employ a dropout-based consistency regularization with the masking strategy R-Mask to further improve the robustness and performance of the model.
The proposed UBARv2 achieves state-of-the-art performance on the standardized evaluation benchmark MultiWOZ.
arXiv Detail & Related papers (2022-09-15T12:14:46Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z) - AutoSampling: Search for Effective Data Sampling Schedules [118.20014773014671]
We propose an AutoSampling method to automatically learn sampling schedules for model training.
We apply our method to a variety of image classification tasks illustrating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-05-28T09:39:41Z) - On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z) - An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [13.97006782398121]
The Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing tasks.
We explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition.
We validate the effectiveness of our proposed method on a benchmark dataset.
arXiv Detail & Related papers (2020-06-01T18:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.