Sentence Level Curriculum Learning for Improved Neural Conversational Models
- URL: http://arxiv.org/abs/2305.08818v1
- Date: Mon, 15 May 2023 17:28:59 GMT
- Title: Sentence Level Curriculum Learning for Improved Neural Conversational Models
- Authors: Sean Paulsen
- Abstract summary: We study how to design machine intelligence to converse with a human user.
Our goal is to separate training into segments, with each segment's corpus composed of longer sentence pairs than the previous one.
This will mimic the desired "buildup" component of human learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing machine intelligence to converse with a human user necessarily
requires an understanding of how humans participate in conversation, and thus
conversation modeling is an important task in natural language processing. New
breakthroughs in architecture and data gathering continue to push the
performance of such conversational AI models. However, these designs neglect the
gradual buildup in sentence structure and complexity experienced by humans as
we learn to communicate. During training, our model accepts one or more
sentences as input and attempts to predict the next sentence in the
conversation one word at a time, so our goal is to separate training into
segments, with each segment's corpus composed of longer sentence pairs than
the previous one. This will mimic the desired "buildup" component of human
learning. We begin with only "short" length sentence pairs, then only "medium"
length pairs, and so on. The majority of our experiments went toward optimizing
this technique, to ensure a proper representation of its potential, since many
of its details were open questions. Our segment-trained models were
then able to achieve lower validation loss at the end of training than models
trained with standard text preparation. This segmented training is
straightforward to implement, and our results provide a general direction for
future research to adopt and improve it.
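As a concrete illustration of the segmented training described above, the sketch below partitions a corpus of (input, response) sentence pairs into "short", "medium", and "long" buckets by token count and then trains on the buckets in that order. The length thresholds, the whitespace tokenization, and the train_epoch routine are illustrative assumptions, not the paper's actual settings.

```python
# Hypothetical sketch of length-segmented ("buildup") training.
# The thresholds, tokenization, and training routine are assumptions
# made for illustration; the paper does not specify these details here.

def bucket_pairs(pairs, short_max=8, medium_max=16):
    """Partition (input, response) pairs by the longer sentence's token count."""
    buckets = {"short": [], "medium": [], "long": []}
    for src, tgt in pairs:
        length = max(len(src.split()), len(tgt.split()))
        if length <= short_max:
            buckets["short"].append((src, tgt))
        elif length <= medium_max:
            buckets["medium"].append((src, tgt))
        else:
            buckets["long"].append((src, tgt))
    return buckets


def segmented_training(model, pairs, train_epoch, epochs_per_segment=2):
    """Train on short pairs first, then medium, then long.

    `model` and `train_epoch(model, data)` stand in for whatever seq2seq
    model and single-epoch training loop are being used.
    """
    buckets = bucket_pairs(pairs)
    for segment in ("short", "medium", "long"):
        for _ in range(epochs_per_segment):
            train_epoch(model, buckets[segment])
    return model
```

A standard baseline would instead train on all pairs shuffled together; the abstract's claim is that visiting the buckets from shortest to longest yields lower validation loss by the end of training.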
Related papers
- Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data [3.1715756370116637]
We take inspiration from human cognitive development to train models in limited data conditions.
Our approach offers a proof of concept for training a multimodal model using a developmentally plausible amount of data.
arXiv Detail & Related papers (2024-10-29T10:50:03Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Training Language Models with Natural Language Feedback [51.36137482891037]
We learn from language feedback on model outputs using a three-step learning algorithm.
In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements.
Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization.
arXiv Detail & Related papers (2022-04-29T15:06:58Z) - TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization [27.185068253347257]
We build a large-scale (11M) pretraining dataset called RCS based on the multi-person discussions in the Reddit community.
We then present TANet, a thread-aware Transformer-based network.
Unlike the existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependency plays an essential role in understanding the entire conversation.
arXiv Detail & Related papers (2022-04-09T16:08:46Z) - A study on the efficacy of model pre-training in developing neural text-to-speech system [55.947807261757056]
This study aims to understand better why and how model pre-training can positively contribute to TTS system performance.
It is found that the TTS system could achieve comparable performance when the pre-training data is reduced to 1/8 of its original size.
arXiv Detail & Related papers (2021-10-08T02:09:28Z) - CloneBot: Personalized Dialogue-Response Predictions [0.0]
The project task was to create a model that, given a speaker ID, chat history, and an utterance query, can predict the response utterance in a conversation.
The model is personalized for each speaker. Such a model can be a useful tool for building speech bots that talk in a human-like manner in live conversation.
arXiv Detail & Related papers (2021-03-31T01:15:37Z) - Token-wise Curriculum Learning for Neural Machine Translation [94.93133801641707]
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data at the early training stage.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
arXiv Detail & Related papers (2021-03-20T03:57:59Z) - A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives [22.933794444266596]
We describe recent self-supervised and supervised contrastive NLP pretraining methods.
We introduce key contrastive learning concepts with lessons learned from prior research and structure works by applications.
We point to open challenges and future directions for contrastive NLP to encourage bringing contrastive NLP pretraining closer to recent successes in image representation pretraining.
arXiv Detail & Related papers (2021-02-25T16:35:07Z) - Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, and thus suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z) - Interpreting convolutional networks trained on textual data [0.0]
We train a convolutional model on textual data and analyze the global logic of the model by studying its filter values.
We identify the words in our corpus that are most important to the model's logic and remove the rest.
New models trained on just the 5% most important words can achieve the same performance as the original model while reducing training time by more than half.
arXiv Detail & Related papers (2020-10-20T20:12:05Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)