Last Query Transformer RNN for knowledge tracing
- URL: http://arxiv.org/abs/2102.05038v1
- Date: Wed, 10 Feb 2021 17:10:31 GMT
- Title: Last Query Transformer RNN for knowledge tracing
- Authors: SeungKee Jeon
- Abstract summary: This paper presents an efficient model to predict a student's answer correctness given their past learning activities.
I achieved 1st place in the 'Riiid! Answer Correctness Prediction' competition hosted on Kaggle.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an efficient model to predict a student's answer
correctness given their past learning activities. Basically, I use both a
transformer encoder and an RNN to handle the time-series input. The novel point of
the model is that it uses only the last input as the query in the transformer
encoder, instead of the whole sequence, which reduces the QK matrix multiplication
in the transformer encoder from O(L^2) to O(L) time complexity. This allows the
model to take longer input sequences. Using this model I achieved 1st place in the
'Riiid! Answer Correctness Prediction' competition hosted on Kaggle.
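Below is a minimal, illustrative sketch of the last-query attention pattern described in the abstract, written with PyTorch's nn.MultiheadAttention. It is not the competition-winning implementation: the module name, tensor shapes, and the note about an RNN head are assumptions for illustration; only the core idea (a single query taken from the last position, giving O(L) query-key scores) comes from the abstract.

```python
import torch
import torch.nn as nn

class LastQueryAttention(nn.Module):
    """Attention that uses only the last time step as the query.

    Scores are computed between one query (the last position) and all L keys,
    so the QK product costs O(L) rather than O(L^2) for the full sequence.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, d_model) -- the embedded interaction sequence
        last_query = x[:, -1:, :]             # (batch, 1, d_model)
        out, _ = self.attn(last_query, x, x)  # attention scores: (batch, 1, L)
        return out                            # (batch, 1, d_model)

if __name__ == "__main__":
    x = torch.randn(2, 100, 64)               # batch of 2, sequence length 100
    layer = LastQueryAttention(d_model=64, n_heads=4)
    print(layer(x).shape)                     # torch.Size([2, 1, 64])
```

In the paper's setup the transformer encoder is combined with an RNN; a natural (assumed) wiring would run the embedded sequence through an LSTM and use a block like the one above for the attention part, but the exact architecture is not specified in this listing.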
Related papers
- Breaking the Attention Bottleneck [0.0]
This paper develops a generative function as a replacement for attention or activation.
It retains the auto-regressive character by comparing each token with the previous one.
The concept of attention replacement is distributed under the AGPL v3 license at https://gitlab.com/Bachstelzecausal_generation.
arXiv Detail & Related papers (2024-06-16T12:06:58Z) - Attention as an RNN [66.5420926480473]
We show that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its many-to-one RNN output efficiently.
We introduce a new efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm.
We show Aarens achieve comparable performance to Transformers on 38 datasets spread across four popular sequential problem settings.
arXiv Detail & Related papers (2024-05-22T19:45:01Z) - How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
We train a Transformer model on a simple next token prediction task.
We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping.
arXiv Detail & Related papers (2024-02-08T16:24:44Z) - Unlimiformer: Long-Range Transformers with Unlimited Length Input [67.04942180004805]
Unlimiformer is a general approach that wraps any existing pretrained encoder-decoder transformer.
It offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index.
We show that Unlimiformer can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time.
arXiv Detail & Related papers (2023-05-02T17:35:08Z) - Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - Transformer Based Bengali Chatbot Using General Knowledge Dataset [0.0]
In this research, we applied the transformer model for Bengali general knowledge chatbots based on the Bengali general knowledge Question Answer (QA) dataset.
The transformer model scores 85.0 BLEU on the applied QA data. For comparison, we trained a seq2seq model with attention on the same dataset, which scores 23.5 BLEU.
arXiv Detail & Related papers (2021-11-06T18:33:20Z) - Iterative Decoding for Compositional Generalization in Transformers [5.269770493488338]
In sequence-to-sequence learning, transformers are often unable to predict correct outputs for even marginally longer examples.
This paper introduces iterative decoding, an alternative to seq2seq learning.
We show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
arXiv Detail & Related papers (2021-10-08T14:52:25Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z) - Transformer Based Deliberation for Two-Pass Speech Recognition [46.86118010771703]
Speech recognition systems must generate words quickly while also producing accurate results.
Two-pass models excel at these requirements by employing a first-pass decoder that quickly emits words, and a second-pass decoder that requires more context but is more accurate.
Previous work has established that a deliberation network can be an effective second-pass model.
arXiv Detail & Related papers (2021-01-27T18:05:22Z) - Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention [22.228028613802174]
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, they are prohibitively slow for very long sequences.
We make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$, where $N$ is the sequence length.
Our linear transformers achieve similar performance to vanilla transformers and are up to 4000x faster on autoregressive prediction of very long sequences (see the sketch after this list).
arXiv Detail & Related papers (2020-06-29T17:55:38Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, faster than all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)
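As referenced in the 'Transformers are RNNs' entry above, the following is a minimal sketch of the linear-attention trick that entry describes: by the associativity of matrix products, one can compute phi(Q) (phi(K)^T V) instead of softmax(Q K^T) V, so no N x N score matrix is ever formed. The function name, the eps constant, and the choice of feature map phi(x) = elu(x) + 1 (the map commonly associated with that paper) are assumptions for illustration, not the authors' released code.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Softmax-free (kernelized) attention with cost linear in sequence length N.

    q, k, v: (batch, N, d). Applies the feature map phi(x) = elu(x) + 1 and uses
    associativity, phi(Q) @ (phi(K)^T @ V), so no (N, N) matrix is materialized.
    """
    phi_q = torch.nn.functional.elu(q) + 1           # (batch, N, d)
    phi_k = torch.nn.functional.elu(k) + 1           # (batch, N, d)
    kv = torch.einsum("bnd,bne->bde", phi_k, v)      # (batch, d, d): keys summed over N once
    norm = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + eps  # (batch, N)
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, 1.0 / norm)     # (batch, N, d)

# Quick shape check: 512-token sequences, 64-dim heads
out = linear_attention(torch.randn(2, 512, 64), torch.randn(2, 512, 64), torch.randn(2, 512, 64))
print(out.shape)  # torch.Size([2, 512, 64])
```

The same factorization, applied causally with running sums over the prefix, is what lets such models generate autoregressively with constant memory per step.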