Transformer Based Bengali Chatbot Using General Knowledge Dataset
- URL: http://arxiv.org/abs/2111.03937v2
- Date: Tue, 9 Nov 2021 04:42:19 GMT
- Title: Transformer Based Bengali Chatbot Using General Knowledge Dataset
- Authors: Abu Kaisar Mohammad Masum, Sheikh Abujar, Sharmin Akter, Nushrat Jahan
Ria, Syed Akhter Hossain
- Abstract summary: In this research, we applied the transformer model to a Bengali general knowledge chatbot based on a Bengali general knowledge Question Answer (QA) dataset.
It scores 85.0 BLEU on the applied QA data. For comparison with the transformer model's performance, we also trained a seq2seq model with attention on our dataset, which scores 23.5 BLEU.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An AI chatbot provides an impressive response after learning from the trained
dataset. In this decade, most research work has demonstrated that deep neural models are
superior to other models. RNN models are regularly used for sequence-related problems such
as answering a question, an approach widely known as seq2seq learning. A seq2seq model
consists of an encoder and a decoder: the encoder encodes the input sequence, and the
decoder generates the output sequence. To reinforce seq2seq performance, an attention
mechanism is added between the encoder and the decoder. Later, the transformer model was
introduced as a high-performance model with multiple attention mechanisms for solving
sequence-related problems. This model reduces training time compared with RNN-based models
and also achieves state-of-the-art performance for sequence transduction. In this research,
we applied the transformer model to a Bengali general knowledge chatbot based on a Bengali
general knowledge Question Answer (QA) dataset. It scores 85.0 BLEU on the applied QA data.
For comparison with the transformer model's performance, we also trained a seq2seq model
with attention on our dataset, which scores 23.5 BLEU.
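The paper does not reproduce its implementation here, but the encoder-decoder structure described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical PyTorch example, not the authors' configuration: the vocabulary size, embedding dimension, layer counts, and toy token ids are illustrative assumptions. It maps a tokenized Bengali question to answer logits using the same encoder, decoder, and multi-head attention layout the abstract names.

```python
# Hypothetical sketch only: a small transformer encoder-decoder for QA-style
# response generation, similar in spirit to the model described in the abstract.
# Vocabulary size, dimensions, and layer counts are illustrative assumptions.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal position encoding added to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]


class QATransformer(nn.Module):
    """Encoder embeds the question; decoder generates the answer sequence."""

    def __init__(self, vocab_size: int, d_model: int = 256, nhead: int = 8,
                 num_layers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, question_ids, answer_ids):
        # Causal mask so each answer position attends only to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            answer_ids.size(1))
        src = self.pos(self.embed(question_ids))
        tgt = self.pos(self.embed(answer_ids))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, answer_len, vocab_size) logits


# Toy usage with made-up token ids for one question/answer pair.
model = QATransformer(vocab_size=8000)
question = torch.randint(0, 8000, (1, 12))   # tokenized Bengali question
answer_in = torch.randint(0, 8000, (1, 10))  # shifted gold answer (teacher forcing)
logits = model(question, answer_in)
print(logits.shape)  # torch.Size([1, 10, 8000])
```

At inference time the decoder would be run autoregressively, feeding back its own predictions token by token, and the generated answers can be scored against the reference answers with a corpus-level BLEU implementation (for example, NLTK's corpus_bleu) to reproduce the kind of comparison reported above: 85.0 BLEU for the transformer versus 23.5 BLEU for the attention-based seq2seq baseline.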
Related papers
- Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq
Models [16.49601740473416]
We explore recipes to improve training efficiency by initializing one model from the other.
Using an encoder to warm-start seq2seq training, we show that we can match task performance of a from-scratch seq2seq model.
arXiv Detail & Related papers (2023-06-14T21:41:52Z) - A Conditional Generative Chatbot using Transformer Model [30.613612803419294]
In this paper, a novel architecture is proposed using conditional Wasserstein Generative Adversarial Networks and a transformer model for answer generation.
To the best of our knowledge, this is the first time that a generative chatbot is proposed using the embedded transformer in both the generator and discriminator models.
The results of the proposed model on the Cornell Movie-Dialog corpus and the Chit-Chat datasets confirm the superiority of the proposed model compared to state-of-the-art alternatives.
arXiv Detail & Related papers (2023-06-03T10:35:04Z) - Inflected Forms Are Redundant in Question Generation Models [27.49894653349779]
We propose an approach to enhance the performance of Question Generation using an encoder-decoder framework.
Firstly, we identify the inflected forms of words in the encoder input and replace them with their root words.
Secondly, we propose to adapt QG as a combination of the following actions in the encoder-decoder framework: generating a question word, copying a word from the source sequence, or generating a word transformation type.
arXiv Detail & Related papers (2023-01-01T13:08:11Z) - Enhancing Self-Consistency and Performance of Pre-Trained Language
Models through Natural Language Inference [72.61732440246954]
Large pre-trained language models often lack logical consistency across test inputs.
We propose a framework, ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models.
We show that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models.
arXiv Detail & Related papers (2022-11-21T21:58:30Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech
Recognition [69.68154370877615]
The non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire output tokens in at least one step.
To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that the TSNAT can achieve a competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z) - Last Query Transformer RNN for knowledge tracing [0.0]
This paper presents an efficient model to predict a student's answer correctness given his past learning activities.
I achieved 1st place in the 'Riiid! Answer Correctness Prediction' competition hosted on Kaggle.
arXiv Detail & Related papers (2021-02-10T17:10:31Z) - A Neural Few-Shot Text Classification Reality Check [4.689945062721168]
Several neural few-shot classification models have emerged, yielding significant progress over time.
In this paper, we compare all these models, first adapting those made in the field of image processing to NLP, and second providing them access to transformers.
We then test these models equipped with the same transformer-based encoder on the intent detection task, known for having a large number of classes.
arXiv Detail & Related papers (2021-01-28T15:46:14Z) - Investigation of Sentiment Controllable Chatbot [50.34061353512263]
In this paper, we investigate four models to scale or adjust the sentiment of the response.
The models are a persona-based model, reinforcement learning, a plug and play model, and CycleGAN.
We develop machine-evaluated metrics to estimate whether the responses are reasonable given the input.
arXiv Detail & Related papers (2020-07-11T16:04:30Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, which exceeds all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z) - Learning to Encode Position for Transformer with Continuous Dynamical
Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded representations along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.