Transformer Based Bengali Chatbot Using General Knowledge Dataset
- URL: http://arxiv.org/abs/2111.03937v2
- Date: Tue, 9 Nov 2021 04:42:19 GMT
- Title: Transformer Based Bengali Chatbot Using General Knowledge Dataset
- Authors: Abu Kaisar Mohammad Masum, Sheikh Abujar, Sharmin Akter, Nushrat Jahan
Ria, Syed Akhter Hossain
- Abstract summary: In this research, we applied the transformer model to a Bengali general knowledge chatbot based on a Bengali general knowledge Question Answer (QA) dataset.
It scores 85.0 BLEU on the applied QA data. For comparison with the transformer model's performance, we also trained a seq2seq model with attention on our dataset, which scores 23.5 BLEU.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An AI chatbot provides an impressive response after learning from the trained
dataset. In this decade, most research work has demonstrated that deep neural models are
superior to other models. RNN models are regularly used for sequence-related problems such
as answering a question, an approach widely known as seq2seq learning. A seq2seq model
consists of an encoder and a decoder: the encoder encodes the input sequence, and the
decoder generates the output sequence. To reinforce seq2seq performance, an attention
mechanism is added between the encoder and the decoder. Later, the transformer model was
introduced as a high-performance model with multiple attention mechanisms for solving
sequence-related problems. This model reduces training time compared with RNN-based models
and also achieves state-of-the-art performance for sequence transduction. In this research,
we applied the transformer model to a Bengali general knowledge chatbot based on a Bengali
general knowledge Question Answer (QA) dataset. It scores 85.0 BLEU on the applied QA data.
For comparison with the transformer model's performance, we also trained a seq2seq model
with attention on our dataset, which scores 23.5 BLEU.
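The paper does not reproduce its implementation here, but the encoder-decoder structure described in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical PyTorch example, not the authors' configuration: the vocabulary size, embedding dimension, layer counts, and toy token ids are illustrative assumptions. It maps a tokenized Bengali question to answer logits using the same encoder, decoder, and multi-head attention layout the abstract names.

```python
# Hypothetical sketch only: a small transformer encoder-decoder for QA-style
# response generation, similar in spirit to the model described in the abstract.
# Vocabulary size, dimensions, and layer counts are illustrative assumptions.
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Standard sinusoidal position encoding added to token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]


class QATransformer(nn.Module):
    """Encoder embeds the question; decoder generates the answer sequence."""

    def __init__(self, vocab_size: int, d_model: int = 256, nhead: int = 8,
                 num_layers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, question_ids, answer_ids):
        # Causal mask so each answer position attends only to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            answer_ids.size(1))
        src = self.pos(self.embed(question_ids))
        tgt = self.pos(self.embed(answer_ids))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, answer_len, vocab_size) logits


# Toy usage with made-up token ids for one question/answer pair.
model = QATransformer(vocab_size=8000)
question = torch.randint(0, 8000, (1, 12))   # tokenized Bengali question
answer_in = torch.randint(0, 8000, (1, 10))  # shifted gold answer (teacher forcing)
logits = model(question, answer_in)
print(logits.shape)  # torch.Size([1, 10, 8000])
```

At inference time the decoder would be run autoregressively, feeding back its own predictions token by token, and the generated answers can be scored against the reference answers with a corpus-level BLEU implementation (for example, NLTK's corpus_bleu) to reproduce the kind of comparison reported above: 85.0 BLEU for the transformer versus 23.5 BLEU for the attention-based seq2seq baseline.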
Related papers
- Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq
Models [16.49601740473416]
We explore recipes to improve training efficiency by initializing one model from the other.
Using an encoder to warm-start seq2seq training, we show that we can match task performance of a from-scratch seq2seq model.
arXiv Detail & Related papers (2023-06-14T21:41:52Z) - A Conditional Generative Chatbot using Transformer Model [30.613612803419294]
In this paper, a novel architecture is proposed using conditional Wasserstein Generative Adversarial Networks and a transformer model for answer generation.
To the best of our knowledge, this is the first time that a generative chatbot is proposed using the embedded transformer in both the generator and discriminator models.
The results of the proposed model on the Cornell Movie-Dialog corpus and the Chit-Chat datasets confirm the superiority of the proposed model compared to state-of-the-art alternatives.
arXiv Detail & Related papers (2023-06-03T10:35:04Z) - Inflected Forms Are Redundant in Question Generation Models [27.49894653349779]
We propose an approach to enhance the performance of Question Generation using an encoder-decoder framework.
Firstly, we identify the inflected forms of words in the encoder input and replace them with their root words.
Secondly, we propose to adapt QG as a combination of the following actions in the encoder-decoder framework: generating a question word, copying a word from the source sequence, or generating a word transformation type.
arXiv Detail & Related papers (2023-01-01T13:08:11Z) - Enhancing Self-Consistency and Performance of Pre-Trained Language
Models through Natural Language Inference [72.61732440246954]
Large pre-trained language models often lack logical consistency across test inputs.
We propose a framework, ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models.
We show that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models.
arXiv Detail & Related papers (2022-11-21T21:58:30Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z) - TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech
Recognition [69.68154370877615]
The non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire output tokens in at least one step.
To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT).
The results show that the TSNAT can achieve a competitive performance with the AR model and outperform many complicated NAR models.
arXiv Detail & Related papers (2021-04-04T02:34:55Z) - Last Query Transformer RNN for knowledge tracing [0.0]
This paper presents an efficient model to predict a student's answer correctness given his past learning activities.
I achieved 1st place in the 'Riiid! Answer Correctness Prediction' competition hosted on Kaggle.
arXiv Detail & Related papers (2021-02-10T17:10:31Z) - A Neural Few-Shot Text Classification Reality Check [4.689945062721168]
Several neural few-shot classification models have emerged, yielding significant progress over time.
In this paper, we compare all these models, first adapting those made in the field of image processing to NLP, and second providing them access to transformers.
We then test these models equipped with the same transformer-based encoder on the intent detection task, known for having a large number of classes.
arXiv Detail & Related papers (2021-01-28T15:46:14Z) - Investigation of Sentiment Controllable Chatbot [50.34061353512263]
In this paper, we investigate four models to scale or adjust the sentiment of the response.
The models are a persona-based model, reinforcement learning, a plug and play model, and CycleGAN.
We develop machine-evaluated metrics to estimate whether the responses are reasonable given the input.
arXiv Detail & Related papers (2020-07-11T16:04:30Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, which exceeds all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z) - Learning to Encode Position for Transformer with Continuous Dynamical
Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded representations along the position index with a continuous dynamical system.
arXiv Detail & Related papers (2020-03-13T00:41:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.