MeetingBank: A Benchmark Dataset for Meeting Summarization
- URL: http://arxiv.org/abs/2305.17529v1
- Date: Sat, 27 May 2023 17:09:25 GMT
- Title: MeetingBank: A Benchmark Dataset for Meeting Summarization
- Authors: Yebowen Hu and Tim Ganter and Hanieh Deilamsalehy and Franck
Dernoncourt and Hassan Foroosh and Fei Liu
- Abstract summary: In this paper, we present MeetingBank, a new benchmark dataset of city council meetings over the past decade.
We make the collection, including meeting video links, transcripts, reference summaries, agenda, and other metadata, publicly available to facilitate the development of better meeting summarization techniques.
- Score: 37.761684754365945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of recorded meetings increases, it becomes increasingly
important to utilize summarization technology to create useful summaries of
these recordings. However, there is a crucial lack of annotated meeting corpora
for developing this technology, as it can be hard to collect meetings,
especially when the topics discussed are confidential. Furthermore, meeting
summaries written by experienced writers are scarce, making it hard for
abstractive summarizers to produce sensible output without a reliable
reference. This lack of annotated corpora has hindered the development of
meeting summarization technology. In this paper, we present MeetingBank, a new
benchmark dataset of city council meetings over the past decade. MeetingBank is
unique among other meeting corpora due to its divide-and-conquer approach,
which involves dividing professionally written meeting minutes into shorter
passages and aligning them with specific segments of the meeting. This breaks
down the process of summarizing a lengthy meeting into smaller, more manageable
tasks. The dataset provides a new testbed of various meeting summarization
systems and also allows the public to gain insight into how council decisions
are made. We make the collection, including meeting video links, transcripts,
reference summaries, agenda, and other metadata, publicly available to
facilitate the development of better meeting summarization techniques. Our
dataset can be accessed at: https://meetingbank.github.io
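The divide-and-conquer idea in the abstract, pairing each passage of the minutes with one segment of the meeting, can be illustrated with a minimal sketch. This is not the authors' alignment procedure: the `tokens` and `align` helpers, the word-overlap (Jaccard) score, and the sample texts are all assumptions made up for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def align(passages, segments):
    """Pair each minutes passage with the index of the transcript
    segment that has the highest word-overlap (Jaccard) score.

    Hypothetical helper: a stand-in for whatever alignment the
    dataset actually uses.
    """
    pairs = []
    for passage in passages:
        p = tokens(passage)
        scores = []
        for seg in segments:
            s = tokens(seg)
            scores.append(len(p & s) / max(len(p | s), 1))
        best = max(range(len(segments)), key=scores.__getitem__)
        pairs.append((passage, best))
    return pairs


# Made-up sample data, not taken from MeetingBank.
segments = [
    "The council will now vote on the budget amendment for road repairs.",
    "Next item is the proposed zoning change for the downtown library site.",
]
passages = [
    "Council approved the zoning change for the library site.",
    "Budget amendment for road repairs was adopted.",
]

pairs = align(passages, segments)
for passage, idx in pairs:
    print(f"segment {idx}: {passage}")
```

Each resulting (segment, passage) pair is a short source text with its own reference summary, which is what turns one long meeting into several smaller summarization tasks.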
Related papers
- Investigating Consistency in Query-Based Meeting Summarization: A
Comparative Study of Different Embedding Methods [0.0]
Text summarization is one of the well-known applications in the field of Natural Language Processing (NLP).
It aims to automatically generate a summary containing the important information from a given context.
In this paper, we are inspired by "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" proposed by Microsoft.
We also propose our Locater model, designed to extract relevant spans based on a given transcript and query, which are then summarized by a Summarizer model.
arXiv Detail & Related papers (2024-02-10T08:25:30Z)
- Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with
Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries.
Traditional summarization methods produce one summary mixing information of all aspects.
We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z)
- Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system [30.35387091657807]
Large language models (LLMs) for dialog summarization have the potential to improve the experience of meetings.
Despite this potential, they face technological limitations due to long transcripts and an inability to capture diverse recap needs based on the user's context.
We develop a system to operationalize the representations with dialogue summarization as its building blocks.
arXiv Detail & Related papers (2023-07-28T20:25:11Z)
- Meeting Summarization: A Survey of the State of the Art [0.0]
There is an overload of dialogue data due to the rise of virtual communication platforms.
The rise of Covid-19 has led people to rely on online communication platforms like Zoom, Slack, Microsoft Teams, Discord, etc. to conduct their company meetings.
There is a lack of comprehensive surveys in the field of meeting summarizers.
arXiv Detail & Related papers (2022-12-16T00:21:30Z)
- Abstractive Meeting Summarization: A Survey [15.455647477995306]
A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts.
Recent advances in deep learning have significantly improved language generation systems, opening the door to improved forms of abstractive summarization.
We provide an overview of the challenges raised by the task of abstractive meeting summarization and of the data sets, models and evaluation metrics that have been used to tackle the problems.
arXiv Detail & Related papers (2022-08-08T14:04:38Z)
- A Sliding-Window Approach to Automatic Creation of Meeting Minutes [66.39584679676817]
Meeting minutes record any subject matters discussed, decisions reached and actions taken at meetings.
We present a sliding window approach to automatic generation of meeting minutes.
It aims to tackle issues associated with the nature of spoken text, including lengthy transcripts and lack of document structure.
arXiv Detail & Related papers (2021-04-26T02:44:14Z)
- QMSum: A New Benchmark for Query-based Multi-domain Meeting
Summarization [45.83402681068943]
QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains.
We investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task.
arXiv Detail & Related papers (2021-04-13T05:00:35Z)
- How Domain Terminology Affects Meeting Summarization Performance [61.12624289478716]
We create gold-standard annotations for domain terminology on a sizable meeting corpus.
We analyze the performance of a meeting summarization system with and without jargon terms.
arXiv Detail & Related papers (2020-11-02T02:33:59Z)
- A Hierarchical Network for Abstractive Meeting Summarization with
Cross-Domain Pretraining [52.11221075687124]
We propose a novel abstractive summary network that adapts to the meeting scenario.
We design a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.
Our model outperforms previous approaches in both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-04-04T21:00:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.