Bilevel Scheduled Sampling for Dialogue Generation
- URL: http://arxiv.org/abs/2309.01953v1
- Date: Tue, 5 Sep 2023 05:05:06 GMT
- Title: Bilevel Scheduled Sampling for Dialogue Generation
- Authors: Jiawen Liu and Kan Li
- Abstract summary: We propose a bilevel scheduled sampling model that takes sentence-level information into account and combines it with word-level quality.
Experiments conducted on the DailyDialog and PersonaChat datasets demonstrate the effectiveness of our proposed methods.
- Score: 6.89978591161039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exposure bias poses a common challenge in numerous natural language
processing tasks, particularly in dialogue generation. In response to this
issue, researchers have devised various techniques, among which scheduled
sampling has proven to be an effective method for mitigating exposure bias.
However, existing state-of-the-art scheduled sampling methods consider only
the quality of the currently sampled words when performing threshold
truncation sampling; this overlooks sentence-level information, and the
threshold truncation approach itself warrants further discussion. In this
paper, we propose a bilevel scheduled sampling model that takes sentence-level
information into account and combines it with word-level quality. To enhance sampling
diversity and improve the model's adaptability, we propose a smooth function
that maps the combined result of sentence-level and word-level information to
an appropriate range, and employ probabilistic sampling based on the mapped
values instead of threshold truncation. Experiments conducted on the
DailyDialog and PersonaChat datasets demonstrate the effectiveness of our
proposed methods, which significantly alleviate the exposure bias problem and
outperform state-of-the-art scheduled sampling methods.
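The abstract does not give the exact word-level and sentence-level quality
measures or the form of the smooth mapping function, so the following Python
sketch is only an illustration of the described idea: a word-level score
(here, the model probability assigned to the gold token) and a sentence-level
score (here, a normalized prefix likelihood) are interpolated, mapped through
a smooth function (here, a scaled sigmoid) into a bounded probability range,
and a Bernoulli draw on that probability decides whether the model's own
prediction or the ground-truth token is fed at the next decoding step,
replacing hard threshold truncation. The names, weights, and ranges below are
assumptions, not the paper's actual implementation.

```python
# Minimal sketch of bilevel scheduled sampling with probabilistic (non-threshold)
# sampling. The quality measures, the alpha weight, and the smooth map are
# assumed for illustration; they are not taken from the paper.
import math
import random

def smooth_map(score: float, low: float = 0.1, high: float = 0.9,
               temp: float = 5.0) -> float:
    """Map a combined quality score in [0, 1] to a sampling probability in
    [low, high] via a sigmoid, so every step keeps a nonzero chance of either
    choice (unlike hard threshold truncation)."""
    s = 1.0 / (1.0 + math.exp(-temp * (score - 0.5)))
    return low + (high - low) * s

def bilevel_sample_step(word_quality: float, sentence_quality: float,
                        alpha: float = 0.5) -> bool:
    """Return True to feed the model's own prediction at the next step,
    False to feed the ground-truth token.

    word_quality     -- e.g. model probability assigned to the gold token
    sentence_quality -- e.g. normalized likelihood of the generated prefix
    alpha            -- interpolation weight between the two levels (assumed)
    """
    combined = alpha * word_quality + (1.0 - alpha) * sentence_quality
    p_use_model = smooth_map(combined)
    return random.random() < p_use_model

if __name__ == "__main__":
    # Toy check: higher combined quality means the model's own output
    # is fed back more often during training.
    for wq, sq in [(0.9, 0.8), (0.3, 0.2)]:
        draws = [bilevel_sample_step(wq, sq) for _ in range(1000)]
        print(f"word={wq}, sent={sq}: model prediction fed in "
              f"{sum(draws) / len(draws):.2f} of steps")
```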
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison of existing truncation sampling methods, along with recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z) - Compress Guidance in Conditional Diffusion Sampling [16.671575782090045]
This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue.
We observe a significant improvement in image quality and diversity while also reducing the required guidance timesteps by nearly 40%.
arXiv Detail & Related papers (2024-08-20T21:02:54Z) - AdaSelection: Accelerating Deep Learning Training through Data
Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when an unlabeled sample is believed to incur a high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, so no additional correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z) - AutoSampling: Search for Effective Data Sampling Schedules [118.20014773014671]
We propose an AutoSampling method to automatically learn sampling schedules for model training.
We apply our method to a variety of image classification tasks illustrating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-05-28T09:39:41Z) - On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z) - An Effective Contextual Language Modeling Framework for Speech
Summarization with Augmented Features [13.97006782398121]
The Bidirectional Encoder Representations from Transformers (BERT) model has achieved record-breaking success on many natural language processing tasks.
We explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition.
We validate the effectiveness of our proposed method on a benchmark dataset.
arXiv Detail & Related papers (2020-06-01T18:27:48Z)