Topical Segmentation of Spoken Narratives: A Test Case on Holocaust
Survivor Testimonies
- URL: http://arxiv.org/abs/2210.13783v1
- Date: Tue, 25 Oct 2022 06:02:28 GMT
- Title: Topical Segmentation of Spoken Narratives: A Test Case on Holocaust
Survivor Testimonies
- Authors: Eitan Wagner, Renana Keydar, Amit Pinchevski, Omri Abend
- Abstract summary: We tackle the task of segmenting running (spoken) narratives.
As a test case, we address Holocaust survivor testimonies, given in English.
- Score: 18.80663780272046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of topical segmentation is well studied, but previous work has
mostly addressed it in the context of structured, well-defined segments, such
as segmentation into paragraphs, chapters, or segmenting text that originated
from multiple sources. We tackle the task of segmenting running (spoken)
narratives, which poses hitherto unaddressed challenges. As a test case, we
address Holocaust survivor testimonies, given in English. Other than the
importance of studying these testimonies for Holocaust research, we argue that
they provide an interesting test case for topical segmentation, due to their
unstructured surface level, relative abundance (tens of thousands of such
testimonies were collected), and the relatively confined domain that they
cover. We hypothesize that boundary points between segments correspond to low
mutual information between the sentences proceeding and following the boundary.
Based on this hypothesis, we explore a range of algorithmic approaches to the
task, building on previous work on segmentation that uses generative Bayesian
modeling and state-of-the-art neural machinery. Compared to manually annotated
references, we find that the developed approaches show considerable
improvements over previous work.
Related papers
- A Bottom-Up Approach to Class-Agnostic Image Segmentation [4.086366531569003]
We present a novel bottom-up formulation for addressing the class-agnostic segmentation problem.
We supervise our network directly on the projective sphere of its feature space.
Our bottom-up formulation exhibits exceptional generalization capability, even when trained on datasets designed for class-based segmentation.
arXiv Detail & Related papers (2024-09-20T17:56:02Z) - Image Segmentation in Foundation Model Era: A Survey [95.60054312319939]
Current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions.
This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered around FM-driven image segmentation.
An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current research efforts.
arXiv Detail & Related papers (2024-08-23T10:07:59Z) - From Text Segmentation to Smart Chaptering: A Novel Benchmark for
Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z) - Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z) - Topics in the Haystack: Extracting and Evaluating Topics beyond
Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z) - Betrayed by Captions: Joint Caption Grounding and Generation for Open
Vocabulary Instance Segmentation [80.48979302400868]
We focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories.
Previous approaches have relied on massive caption datasets and complex pipelines to establish one-to-one mappings between image regions and captions in nouns.
We devise a joint textbfCaption Grounding and Generation (CGG) framework, which incorporates a novel grounding loss that only focuses on matching object to improve learning efficiency.
arXiv Detail & Related papers (2023-01-02T18:52:12Z) - Toward Unifying Text Segmentation and Long Document Summarization [31.084738269628748]
We study the role that section segmentation plays in extractive summarization of written and spoken documents.
Our approach learns robust sentence representations by performing summarization and segmentation simultaneously.
Our findings suggest that the model can not only achieve state-of-the-art performance on publicly available benchmarks, but demonstrate better cross-genre transferability.
arXiv Detail & Related papers (2022-10-28T22:07:10Z) - A Survey on Label-efficient Deep Segmentation: Bridging the Gap between
Weak Supervision and Dense Prediction [115.9169213834476]
This paper offers a comprehensive review on label-efficient segmentation methods.
We first develop a taxonomy to organize these methods according to the supervision provided by different types of weak labels.
Next, we summarize the existing label-efficient segmentation methods from a unified perspective.
arXiv Detail & Related papers (2022-07-04T06:21:01Z) - Neural Sequence Segmentation as Determining the Leftmost Segments [25.378188980430256]
We propose a novel framework that incrementally segments natural language sentences at segment level.
For every step in segmentation, it recognizes the leftmost segment of the remaining sequence.
We have conducted extensive experiments on syntactic chunking and Chinese part-of-speech tagging across 3 datasets.
arXiv Detail & Related papers (2021-04-15T03:35:03Z) - Positioning yourself in the maze of Neural Text Generation: A
Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, translation etc.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding and the key challenges outstanding in the field in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.