Abstract. In the response generation task, proper emotional expressions can clearly make responses more human-like.
However, for real application in online systems, high QPS (queries per second, an indicator of the flow capacity of online systems) is required, and the dynamic vocabulary mechanism has proved effective at improving the speed of generative models.
For expressing appropriate emotions in chatbot responses, we linguistically build emotional mappings between user questions and chatbot responses, and then generate emotional responses with an emotion-controlled text generation model.
However, utilizing a response generation model in real online systems carries two typical risks: first, the content of the generated response may not be entirely relevant to the user question; second, the running speed of response generation models must be further improved to meet the demands of real online systems.
In , a dynamic vocabulary seq2seq (DVS2S) model was proposed that can address both of the above risks, since it eliminates abundant noise words from the generation vocabulary, which benefits both generation speed and the relevance between responses and questions.
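The core idea of a dynamic vocabulary can be sketched as follows; this is a simplified, illustrative reading (the function names and the fixed function-word set are assumptions, and in DVS2S the relevance scores come from a learned vocabulary model, not a hand-built dictionary):

```python
# Sketch of the dynamic-vocabulary idea: each query decodes over its own
# small vocabulary -- a fixed set of function words plus the content
# words scored as most relevant to that particular question.

FUNCTION_WORDS = {"the", "a", "is", "to", "you", "?", "."}

def build_dynamic_vocab(question_tokens, content_word_scores, top_k=3):
    """Keep function words plus the top-k content words for this query.

    content_word_scores: dict mapping candidate word -> relevance score
    (in DVS2S this score is produced by a learned vocabulary model).
    """
    ranked = sorted(content_word_scores.items(), key=lambda kv: -kv[1])
    content = {w for w, _ in ranked[:top_k]}
    return FUNCTION_WORDS | content | set(question_tokens)

scores = {"song": 0.9, "sing": 0.8, "weather": 0.1, "price": 0.05}
vocab = build_dynamic_vocab(["sing", "a", "song"], scores, top_k=2)
assert "song" in vocab and "weather" not in vocab
```

Decoding over this per-query subset instead of the full vocabulary is what yields both the speed-up and the improved relevance.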
In this paper, we try to realize a dynamic vocabulary based emotion-controlled response generation model, which aims to generate emotional responses with high quality and high speed for our chatbot, an industrial intelligent assistant designed for creating an innovative online shopping experience in E-commerce.
1) Emotion Mapping between Questions and Responses: We first apply the LEAM (Label-Embedding Attentive Model)  to classify the emotion of user questions, and then linguistically build emotional mappings between questions and responses, such as a mapping from an 'abusing' question to an 'aggrieved' or 'regretful' response.
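Such a mapping is essentially a lookup table from the classified question emotion to the set of allowed response emotions. A minimal sketch (the labels and pairs below are illustrative examples, not the authors' full table, except for the 'abusing' mapping stated above):

```python
# Hypothetical question-emotion -> response-emotion mapping table,
# illustrating the linguistically built mappings described above.
EMOTION_MAP = {
    "abusing": ["aggrieved", "regretful"],
    "happy":   ["happy", "excited"],
    "sad":     ["comforting", "caring"],
}

def target_emotions(question_emotion):
    # Fall back to a neutral response emotion for unmapped labels.
    return EMOTION_MAP.get(question_emotion, ["neutral"])

assert target_emotions("abusing") == ["aggrieved", "regretful"]
```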
2) Seq2Seq Model Training: For real application in online systems with a high QPS requirement, we simply employ a Bi-GRU (Gated Recurrent Unit) encoder and a GRU decoder with an attention mechanism, instead of more complex models.
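The attention step between encoder and decoder can be sketched in plain Python (dot-product attention for illustration; the paper does not specify the scoring function, and a real implementation would use a deep-learning framework):

```python
# Minimal sketch of the attention step between the Bi-GRU encoder and
# the GRU decoder: the decoder state attends over encoder hidden states
# and receives a weighted context vector.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(decoder_state, encoder_states):
    """Dot-product attention returning a context vector."""
    scores = [sum(d * h for d, h in zip(decoder_state, hs))
              for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * hs[i] for w, hs in zip(weights, encoder_states))
            for i in range(dim)]

ctx = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
assert abs(sum(ctx) - 1.0) < 1e-6  # convex combination of encoder states
```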
3) Vocabulary Model Training: We take βI(c) as the converter from ⟨h, e⟩ to P, where h is the hidden state of the encoder, and the training task is to optimize βI(c).
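One plausible reading of this converter is a per-word logistic model mapping the pair ⟨h, e⟩ (encoder hidden state and word embedding) to the probability P that the word enters the dynamic vocabulary. The linear form, concatenation, and weights below are illustrative assumptions, not the paper's exact parameterization:

```python
# Sketch of beta_I(c) as sigmoid(w . [h; e] + b) for one candidate word:
# the output is the probability that the word joins the vocabulary.
import math

def word_in_vocab_prob(h, e, weights, bias=0.0):
    """sigmoid(w . [h; e] + b) for a single candidate word."""
    features = h + e  # concatenate encoder state and word embedding
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

p = word_in_vocab_prob(h=[0.5, -0.2], e=[0.1, 0.3],
                       weights=[1.0, 1.0, 1.0, 1.0])
assert 0.0 < p < 1.0
```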
4) Joint Fine-tune: We jointly fine-tune the Seq2Seq model and the vocabulary model to further optimize the emotional response generation loss.
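Joint fine-tuning amounts to optimizing a combined objective so that gradients update both components together. A minimal sketch, where the weighted-sum form and the weight alpha are assumptions for illustration:

```python
# Sketch of the joint fine-tuning objective: the total loss combines the
# response-generation loss with the vocabulary-model loss.
def joint_loss(generation_loss, vocab_loss, alpha=0.5):
    """Weighted sum of the two training losses (alpha is illustrative)."""
    return alpha * generation_loss + (1.0 - alpha) * vocab_loss

assert joint_loss(2.0, 4.0) == 3.0
```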
3 Experiments
1) Dataset collection & Implementation: We collect 132,118 frequently asked emotional user questions from the online log of our commercial chatbot, and manually label 1 to 3 corresponding emotional responses for each question.
2) Baselines: We consider the following baselines: 1) S2SA: a standard seq2seq model with an attention mechanism ; 2) TA-S2S: the topic-aware seq2seq model proposed in ; 3) CVAE: a recent model for response generation with a conditional variational auto-encoder ; 4) DVS2S: the dynamic vocabulary seq2seq model, which allows each input to have its own vocabulary in decoding .
For the fine-tune step, we compare 3 different strategies: 1) no fine-tune (NO-ft); 2) fine-tune only the vocabulary model training step (ft-target); 3) fine-tune both the Seq2Seq and vocabulary models (ft-both).
Besides, 3 embedding-based metrics are used: Greedy, Average, and Extreme.
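These three metrics are standard cosine-based comparisons of reference and hypothesis word embeddings. A self-contained sketch, assuming word vectors are plain lists of floats (real evaluations use pretrained embeddings):

```python
# Sketches of the three embedding-based metrics: Average compares mean
# vectors, Greedy matches each reference word to its most similar
# hypothesis word, and Extreme compares element-wise extrema vectors.
import math

def cos(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def average_metric(ref_vecs, hyp_vecs):
    """Cosine between the mean vectors of reference and hypothesis."""
    mean = lambda vs: [sum(col) / len(vs) for col in zip(*vs)]
    return cos(mean(ref_vecs), mean(hyp_vecs))

def greedy_metric(ref_vecs, hyp_vecs):
    """Each reference word greedily matches its closest hypothesis word."""
    return sum(max(cos(r, h) for h in hyp_vecs)
               for r in ref_vecs) / len(ref_vecs)

def extreme_metric(ref_vecs, hyp_vecs):
    """Cosine between element-wise max-magnitude ('extrema') vectors."""
    extrema = lambda vs: [max(col, key=abs) for col in zip(*vs)]
    return cos(extrema(ref_vecs), extrema(hyp_vecs))

v = [[1.0, 0.0], [0.0, 1.0]]
assert abs(greedy_metric(v, v) - 1.0) < 1e-9  # identical texts score 1
```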
4 S. Song et al
4) Experimental results: Table 1 gives the evaluation results on the different metrics; ft-target achieves the best performance on both BLEU and si-QPS (an si-QPS of 92 means about 10.87 ms per query).
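The si-QPS figure converts to per-query latency as latency_ms = 1000 / QPS, which is where the 10.87 ms figure comes from:

```python
# QPS <-> latency conversion: 1000 ms per second divided by queries
# per second gives milliseconds per query.
def latency_ms(qps):
    return 1000.0 / qps

assert abs(latency_ms(92) - 10.87) < 0.01
```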
We qualitatively analyze DV-ERG with some examples from the test data.
Table 2 shows several emotional generation results from our models. Most of the generated results are shorter than the manually labeled ones; this is a common problem of generative models, since short results are 'safer' than long sentences during training.
This response is acceptable, but it is better when the user gets the generated response 'Mm-hmm, my little cute ' from ft-target DV-ERG.
Another example: for the user question 'Sing a song', the manually labeled response is 'No need to know', which is just a so-so response.
This time, the ft-target DV-ERG generates the response 'Sing what?', which is a more reasonable one.
4 Conclusion
In this paper, we proposed an emotion-controlled response generation model based on the dynamic vocabulary mechanism, which can be practically applied to online chatbots, given its experimental efficiency and effectiveness.