Unsupervised Topic Segmentation of Meetings with BERT Embeddings
- URL: http://arxiv.org/abs/2106.12978v1
- Date: Thu, 24 Jun 2021 12:54:43 GMT
- Title: Unsupervised Topic Segmentation of Meetings with BERT Embeddings
- Authors: Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani
Poddar, Shubham Modi, Jacques Cali
- Abstract summary: We show how previous unsupervised topic segmentation methods can be improved using pre-trained neural architectures.
We introduce an unsupervised approach based on BERT embeddings that achieves a 15.5% reduction in error rate over existing unsupervised approaches.
- Score: 57.91018542715725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic segmentation of meetings is the task of dividing multi-person meeting
transcripts into topic blocks. Supervised approaches to the problem have proven
intractable due to the difficulties in collecting and accurately annotating
large datasets. In this paper we show how previous unsupervised topic
segmentation methods can be improved using pre-trained neural architectures. We
introduce an unsupervised approach based on BERT embeddings that achieves a
15.5% reduction in error rate over existing unsupervised approaches applied to
two popular datasets for meeting transcripts.
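The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to segment by embedding similarity, in the spirit of TextTiling: compare the pooled embeddings of the utterances just before and just after each candidate position, and place a boundary where the similarity dips. The function name `segment_by_similarity`, the `window` and `threshold` parameters, the mean pooling, and the toy 2-dimensional vectors are all assumptions for illustration; in practice the vectors would come from a BERT-style sentence encoder, and the paper's actual pooling and boundary rule may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(vectors):
    """Average a non-empty list of vectors component-wise (assumed pooling choice)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def segment_by_similarity(embeddings, window=2, threshold=0.5):
    """Return indices i such that a topic boundary is placed just before utterance i.

    Hypothetical helper: a boundary is placed where the similarity between the
    `window` utterances before i and the `window` utterances from i onward is
    below `threshold` and is a local minimum.
    """
    n = len(embeddings)
    sims = []
    for i in range(1, n):
        left = mean_pool(embeddings[max(0, i - window):i])
        right = mean_pool(embeddings[i:i + window])
        sims.append(cosine(left, right))

    boundaries = []
    for k, s in enumerate(sims):
        prev_s = sims[k - 1] if k > 0 else 1.0
        next_s = sims[k + 1] if k + 1 < len(sims) else 1.0
        if s < threshold and s <= prev_s and s <= next_s:
            boundaries.append(k + 1)  # sims[k] compares utterances before/after index k + 1
    return boundaries


# Toy stand-ins for utterance embeddings: three utterances on one topic, then three on another.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 1.0],
]
print(segment_by_similarity(toy_embeddings, window=2, threshold=0.5))
```

No labelled data is needed for this kind of procedure, which is what makes it unsupervised; the window size and threshold are the knobs one would tune on a small development set.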
Related papers
- Embedding And Clustering Your Data Can Improve Contrastive Pretraining [0.0]
We explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm.
Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset.
arXiv Detail & Related papers (2024-07-26T17:36:40Z)
- Causal Unsupervised Semantic Segmentation [60.178274138753174]
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations.
We propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference.
arXiv Detail & Related papers (2023-10-11T10:54:44Z)
- Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing [72.14449502499535]
The task of analyzing the internal rhetorical structure of texts is a challenging problem in natural language processing.
Despite the recent advances in neural models, the lack of large-scale, high-quality corpora for training remains a major obstacle.
Recent studies have attempted to overcome this limitation by using distant supervision.
arXiv Detail & Related papers (2023-05-23T07:13:51Z)
- Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts [32.329723001930006]
We propose a two-step framework, Reconstruct before Summarize (RbS), for effective and efficient meeting summarization.
RbS first leverages a self-supervised paradigm to annotate essential contents by reconstructing the meeting transcripts.
Secondly, we propose a relative positional bucketing (RPB) algorithm to equip (conventional) summarization models to generate the summary.
arXiv Detail & Related papers (2023-05-13T19:54:46Z)
- A Survey on Label-efficient Deep Segmentation: Bridging the Gap between Weak Supervision and Dense Prediction [115.9169213834476]
This paper offers a comprehensive review on label-efficient segmentation methods.
We first develop a taxonomy to organize these methods according to the supervision provided by different types of weak labels.
Next, we summarize the existing label-efficient segmentation methods from a unified perspective.
arXiv Detail & Related papers (2022-07-04T06:21:01Z)
- Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation [48.294903659573585]
In this paper, we propose to embed affinity learning of multi-stage approaches in a single-stage model.
A deep neural network is used to deliver comprehensive semantic information in the training phase.
Experiments are conducted on the PASCAL VOC 2012 dataset to evaluate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2021-08-03T07:48:33Z)
- Bayesian Semi-supervised Crowdsourcing [71.20185379303479]
Crowdsourcing has emerged as a powerful paradigm for efficiently labeling large datasets and performing various learning tasks.
This work deals with semi-supervised crowdsourced classification, under two regimes of semi-supervision.
arXiv Detail & Related papers (2020-12-20T23:18:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.