Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and
Context-Aware Auto-Encoders
- URL: http://arxiv.org/abs/2012.07300v1
- Date: Mon, 14 Dec 2020 07:31:17 GMT
- Authors: Yicheng Zou, Jun Lin, Lujun Zhao, Yangyang Kang, Zhuoren Jiang,
Changlong Sun, Qi Zhang, Xuanjing Huang, Xiaozhong Liu
- Abstract summary: We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
- Score: 59.038157066874255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic chat summarization can help people quickly grasp important
information from numerous chat messages. Unlike conventional documents, chat
logs usually have fragmented and evolving topics. In addition, these logs
contain many elliptical and interrogative sentences, which make chat
summarization highly context-dependent. In this work, we propose a novel
unsupervised framework called RankAE to perform chat summarization without
employing manually labeled data. RankAE consists of a topic-oriented ranking
strategy that selects topic utterances according to centrality and diversity
simultaneously, as well as a denoising auto-encoder that is carefully designed
to generate succinct but context-informative summaries based on the selected
utterances. To evaluate the proposed method, we collect a large-scale dataset
of chat logs from a customer service environment and build an annotated set
only for model evaluation. Experimental results show that RankAE significantly
outperforms other unsupervised methods and is able to generate high-quality
summaries in terms of relevance and topic coverage.
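
The abstract says the ranking step selects topic utterances by centrality and diversity together, but gives no formula. As a rough illustration only, the sketch below uses a greedy, MMR-style selection over bag-of-words cosine similarity. The similarity measure, the scoring rule, and the names `cosine`, `select_utterances`, `k`, and `trade_off` are all assumptions made for this example and are not taken from the paper; RankAE's actual ranking strategy may differ substantially.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_utterances(utterances, k=2, trade_off=0.5):
    """Greedily pick up to k utterance indices (hypothetical helper).

    Assumed scoring rule, not the paper's:
        score = trade_off * centrality - (1 - trade_off) * redundancy
    """
    vecs = [Counter(u.lower().split()) for u in utterances]
    n = len(vecs)
    # Centrality: mean similarity of an utterance to all other utterances.
    centrality = [
        sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            score = trade_off * centrality[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)  # return indices in original chat order


chat = [
    "my order has not arrived yet",
    "the order has not arrived",
    "can i change my delivery address",
    "please change the delivery address",
]
picked = select_utterances(chat, k=2)
print([chat[i] for i in picked])
```

The diversity penalty is what keeps two near-identical utterances from both being selected: once one is chosen, the other's redundancy term rises and a less similar utterance is preferred.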
Related papers
- Topic-Aware Encoding for Extractive Summarization [15.113768658584979]
We propose a topic-aware encoding for document summarization to deal with this issue.
A neural topic model is added in the neural-based sentence-level representation learning to adequately consider the central topic information.
The experimental results on three public datasets show that our model outperforms the state-of-the-art models.
arXiv Detail & Related papers (2021-12-17T15:26:37Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs [22.58861442978803]
We propose to explicitly model the rich structures in conversations for more precise and accurate conversation summarization.
We incorporate discourse relations between utterances and action triples in utterances through structured graphs to better encode conversations.
Experiments show that our proposed models outperform state-of-the-art methods and generalize well in other domains.
arXiv Detail & Related papers (2021-04-16T23:04:52Z)
- Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.