Abstractive Summarization of Spoken and Written Instructions with BERT
- URL: http://arxiv.org/abs/2008.09676v3
- Date: Wed, 26 Aug 2020 20:46:23 GMT
- Title: Abstractive Summarization of Spoken and Written Instructions with BERT
- Authors: Alexandra Savelieva, Bryan Au-Yeung, and Vasanth Ramani
- Abstract summary: We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
- Score: 66.14755043607776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Summarization of speech is a difficult problem due to the spontaneity of the
flow, disfluencies, and other issues that are not usually encountered in
written texts. Our work presents the first application of the BERTSum model to
conversational language. We generate abstractive summaries of narrated
instructional videos across a wide variety of topics, from gardening and
cooking to software configuration and sports. To enrich the vocabulary, we use
transfer learning, pretraining the model on several large cross-domain datasets
in both written and spoken English. We also preprocess transcripts to restore
sentence segmentation and punctuation in the output of an ASR system. The
results are evaluated with ROUGE and
Content-F1 scoring for the How2 and WikiHow datasets. We engage human judges to
score a set of summaries randomly selected from a dataset curated from
HowTo100M and YouTube. Based on blind evaluation, we achieve a level of textual
fluency and utility close to that of summaries written by human content
creators. The model beats current SOTA when applied to WikiHow articles that
vary widely in style and topic, while showing no performance regression on the
canonical CNN/DailyMail dataset. Due to the high generalizability of the model
across different styles and domains, it has great potential to improve
accessibility and discoverability of internet content. We envision this
integrated as a feature in intelligent virtual assistants, enabling them to
summarize both written and spoken instructional content upon request.
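As a minimal sketch of the workflow the abstract describes (ASR transcript cleanup, abstractive summarization with a BERT-family model, and ROUGE evaluation), the Python snippet below strings the three steps together. Note the assumptions: the filler-removal regex is an illustrative stand-in for the paper's segmentation and punctuation restoration, and the facebook/bart-large-cnn checkpoint is a placeholder, since BERTSum itself is not distributed through this interface.

```python
# Sketch of the pipeline from the abstract: clean an ASR transcript,
# summarize it abstractively, and score the result with ROUGE.
# The cleanup heuristics and the BART checkpoint are illustrative
# stand-ins, not the authors' BERTSum artifacts.
import re

from transformers import pipeline      # pip install transformers
from rouge_score import rouge_scorer   # pip install rouge-score

def clean_transcript(text: str) -> str:
    """Toy stand-in for the paper's segmentation/punctuation restoration:
    drop common spoken-language fillers and collapse whitespace."""
    text = re.sub(r"\b(um+|uh+|you know)\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

# Placeholder seq2seq checkpoint; BERTSum is not released via this API.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = ("um so today I'm going to show you how to repot a plant "
              "you know first pick a pot slightly larger than the old one")
reference = "The video shows how to repot a plant into a slightly larger pot."

summary = summarizer(clean_transcript(transcript),
                     max_length=40, min_length=10)[0]["summary_text"]

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
print(summary)
print(scorer.score(reference, summary))
```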
Related papers
- Personalized Video Summarization using Text-Based Queries and Conditional Modeling [3.4447129363520337]
This thesis explores enhancing video summarization by integrating text-based queries and conditional modeling.
Evaluation metrics such as accuracy and F1-score are used to assess the quality of the generated summaries.
arXiv Detail & Related papers (2024-08-27T02:43:40Z)
- WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge [12.034917651508524]
WAVER is a cross-domain knowledge distillation framework built on vision-language models.
WAVER capitalizes on the open-vocabulary properties of pre-trained vision-language models.
It can achieve state-of-the-art performance in text-video retrieval task while handling writing-style variations.
arXiv Detail & Related papers (2023-12-15T03:17:37Z) - On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z) - Towards End-to-end Speech-to-text Summarization [0.0]
Speech-to-text (S2T) summarization is a time-saving technique for filtering and keeping up with broadcast news uploaded online daily.
End-to-end (E2E) modelling of S2T abstractive summarization is a promising approach that offers the possibility of generating rich latent representations.
We model S2T summarization both with a cascade and an E2E system for a corpus of broadcast news in French.
arXiv Detail & Related papers (2023-06-06T15:22:16Z)
- Learning a Grammar Inducer from Massive Uncurated Instructional Videos [118.7279072358029]
Video-aided grammar induction aims to leverage video information for finding more accurate syntactic grammars for accompanying text.
We build a new model that can better learn video-span correlation without manually designed features.
Our model yields higher F1 scores than the previous state-of-the-art systems trained on in-domain data.
arXiv Detail & Related papers (2022-10-22T00:22:55Z)
- Leveraging Natural Supervision for Language Representation Learning and Generation [8.083109555490475]
We describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision.
We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks.
We propose a framework that uses paraphrase pairs to disentangle semantics and syntax in sentence representations.
arXiv Detail & Related papers (2022-07-21T17:26:03Z)
- CLIP-It! Language-Guided Video Summarization [96.69415453447166]
This work introduces CLIP-It, a single framework for addressing both generic and query-focused video summarization.
We propose a language-guided multimodal transformer that learns to score frames in a video based on their importance relative to one another.
Our model can be extended to the unsupervised setting by training without ground-truth supervision.
arXiv Detail & Related papers (2021-07-01T17:59:27Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, 'ApartmenTour', that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)