Hierarchical Transformer Network for Utterance-level Emotion Recognition
- URL: http://arxiv.org/abs/2002.07551v1
- Date: Tue, 18 Feb 2020 13:44:49 GMT
- Title: Hierarchical Transformer Network for Utterance-level Emotion Recognition
- Authors: QingBiao Li (Beijing University of Posts and Telecommunications),
ChunHua Wu (Beijing University of Posts and Telecommunications), KangFeng
Zheng (Beijing University of Posts and Telecommunications) and Zhe Wang
(Beijing University of Posts and Telecommunications)
- Abstract summary: We address some challenges in utterance-level emotion recognition (ULER).
Unlike the traditional text classification problem, this task is supported by a limited number of datasets.
We use a pretrained language model, bidirectional encoder representations from transformers (BERT), as the lower-level transformer.
In addition, we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While there have been significant advances in detecting emotions in text, in the field of utterance-level emotion recognition (ULER), there are still many problems to be solved. In this paper, we address some challenges in ULER in dialog systems. (1) The same utterance can deliver different emotions when it is in different contexts or from different speakers. (2) Long-range contextual information is hard to capture effectively. (3) Unlike the traditional text classification problem, this task is supported by a limited number of datasets, among which most contain inadequate conversations or speech. To address these problems, we propose a hierarchical transformer framework (except when describing other studies, "transformer" in this paper refers to the encoder part of the transformer) with a lower-level transformer to model the word-level input and an upper-level transformer to capture the context of utterance-level embeddings. We use the pretrained language model bidirectional encoder representations from transformers (BERT) as the lower-level transformer, which is equivalent to introducing external data into the model and alleviates the data shortage to some extent. In addition, we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models achieve 1.98%, 2.83%, and 3.94% improvement, respectively, over the state-of-the-art methods on each dataset in terms of macro-F1.
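To make the two-level design concrete, here is a minimal PyTorch sketch of the architecture as the abstract describes it: BERT encodes each utterance, a speaker embedding is added, and an upper-level transformer encoder contextualizes the utterance sequence. Module names, hyperparameters, the additive speaker embedding, and the use of the [CLS] vector as the utterance embedding are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of the hierarchical design described above (assumptions
# noted in the surrounding text; this is not the authors' code).
import torch
import torch.nn as nn
from transformers import BertModel

class HierarchicalERC(nn.Module):
    def __init__(self, num_emotions, num_speakers,
                 hidden=768, upper_layers=2, upper_heads=8):
        super().__init__()
        # Lower-level transformer: pretrained BERT encodes each utterance.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Speaker embeddings let the model distinguish who is talking.
        self.speaker_emb = nn.Embedding(num_speakers, hidden)
        # Upper-level transformer (encoder only) contextualizes the
        # sequence of utterance embeddings across the dialogue.
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=upper_heads, batch_first=True)
        self.upper = nn.TransformerEncoder(layer, num_layers=upper_layers)
        self.classifier = nn.Linear(hidden, num_emotions)

    def forward(self, input_ids, attention_mask, speaker_ids):
        # input_ids, attention_mask: (num_utterances, max_tokens)
        # speaker_ids: (num_utterances,)
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        utt = out.last_hidden_state[:, 0]              # [CLS] per utterance
        utt = utt + self.speaker_emb(speaker_ids)      # add speaker identity
        ctx = self.upper(utt.unsqueeze(0)).squeeze(0)  # dialogue context
        return self.classifier(ctx)                    # per-utterance logits
```

Treating the whole dialogue as a batch of one keeps the sketch short; a real implementation would batch dialogues and mask padding at both levels.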
Related papers
- Emotion Detection with Transformers: A Comparative Study [0.0]
We train and evaluate several pre-trained transformer model variants on the Emotion dataset.
Our analysis reveals that commonly applied techniques like removing punctuation and stop words can hinder model performance.
arXiv Detail & Related papers (2024-03-18T23:22:50Z)
- Multilevel Transformer For Multimodal Emotion Recognition [6.0149102420697025]
We introduce a novel multi-granularity framework, which combines fine-grained representation with pre-trained utterance-level representation.
Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-26T10:31:24Z)
- Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling [69.31802246621963]
We propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models.
By incorporating a separate memory module alongside the pre-trained transformer, the model can effectively interchange information between the memory states and the current input context.
arXiv Detail & Related papers (2022-09-15T22:37:22Z)
- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]
This paper studies the multimedia problem of temporal sentence grounding.
It aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence [6.73258176462356]
It consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model operating on the sentence embeddings.
Our experiments show that Transformer² manages to surpass state-of-the-art text segmentation models in terms of a commonly used semantic coherence measure.
arXiv Detail & Related papers (2021-10-14T05:26:39Z)
- HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval [40.646628490887075]
We propose a novel approach named Hierarchical Transformer (HiT) for video-text retrieval.
HiT performs hierarchical cross-modal contrastive matching at the feature level and the semantic level to achieve multi-view, comprehensive retrieval results.
Inspired by MoCo, we propose Momentum Cross-modal Contrast for cross-modal learning to enable large-scale negative interactions on-the-fly.
arXiv Detail & Related papers (2021-03-28T04:52:25Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering [20.294478273161303]
We introduce a novel approach to transformers that learns hierarchical representations in multiparty dialogue.
Our approach is evaluated on the FriendsQA dataset and shows improvements of 3.8% and 1.4% over the two state-of-the-art transformer models BERT and RoBERTa, respectively.
arXiv Detail & Related papers (2020-04-07T17:36:33Z)
- Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple, fixed (non-learnable) attentive patterns; a toy sketch of one such pattern follows this list.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
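The last entry above replaces learned encoder attention with fixed patterns. As a rough, assumption-laden sketch (the function names and the offset-based pattern family are ours, not the paper's), one such non-learnable head can be written as:

```python
# Toy illustration of a fixed, non-learnable attention pattern: each
# position attends only to the token at a constant offset (e.g. the
# previous or next token). How such patterns are wired into the encoder
# in the cited paper may differ; this is an assumed sketch.
import torch

def fixed_pattern(seq_len: int, offset: int) -> torch.Tensor:
    """One-hot attention matrix: position i attends to i + offset,
    clamped to the valid range (offset=-1 means the previous token)."""
    idx = torch.arange(seq_len)
    target = (idx + offset).clamp(0, seq_len - 1)
    weights = torch.zeros(seq_len, seq_len)
    weights[idx, target] = 1.0
    return weights

def fixed_head(values: torch.Tensor, offset: int) -> torch.Tensor:
    # values: (seq_len, d_head). Nothing is learned for this head; the
    # fixed pattern stands in for the softmax(QK^T) attention weights.
    return fixed_pattern(values.size(0), offset) @ values

# Heads attending to the previous, current, and next token, respectively.
v = torch.randn(5, 16)
head_outputs = [fixed_head(v, o) for o in (-1, 0, 1)]
```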