Identifying Narrative Patterns and Outliers in Holocaust Testimonies Using Topic Modeling
- URL: http://arxiv.org/abs/2405.02650v1
- Date: Sat, 4 May 2024 12:29:00 GMT
- Title: Identifying Narrative Patterns and Outliers in Holocaust Testimonies Using Topic Modeling
- Authors: Maxim Ifergan, Renana Keydar, Omri Abend, Amit Pinchevski
- Abstract summary: This paper uses advanced Natural Language Processing techniques to explore the USC Shoah Foundation Holocaust testimony corpus.
By treating testimonies as structured question-and-answer sections, we apply topic modeling to identify key themes.
- Score: 13.639727580099484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The vast collection of Holocaust survivor testimonies presents invaluable historical insights but poses challenges for manual analysis. This paper leverages advanced Natural Language Processing (NLP) techniques to explore the USC Shoah Foundation Holocaust testimony corpus. By treating testimonies as structured question-and-answer sections, we apply topic modeling to identify key themes. We experiment with BERTopic, which leverages recent advances in language modeling technology. We align testimony sections into fixed parts, revealing the evolution of topics across the corpus of testimonies. This highlights both a common narrative schema and divergences between subgroups based on age and gender. We introduce a novel method to identify testimonies within groups that exhibit atypical topic distributions resembling those of other groups. This study offers unique insights into the complex narratives of Holocaust survivors, demonstrating the power of NLP to illuminate historical discourse and identify potential deviations in survivor experiences.
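The abstract describes two steps that a small example can make concrete: aligning each testimony's sections into a fixed number of parts, and flagging testimonies whose topic distribution resembles another group's more than their own. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each section has already been assigned an integer topic id (for instance by a topic model such as BERTopic), and that each testimony has at least one section. The function names, the use of Jensen-Shannon divergence, and the nearest-group-mean rule are all illustrative choices that the abstract does not specify.

```python
import math
from collections import Counter


def align_to_parts(section_topics, n_parts):
    """Bin a variable-length sequence of per-section topic ids into
    n_parts bins by relative position (hypothetical helper)."""
    bins = [[] for _ in range(n_parts)]
    n = len(section_topics)
    for i, topic in enumerate(section_topics):
        bins[min(i * n_parts // n, n_parts - 1)].append(topic)
    return bins


def topic_distribution(section_topics, n_topics):
    """Fraction of sections assigned to each topic id in 0..n_topics-1."""
    counts = Counter(section_topics)
    total = len(section_topics)
    return [counts.get(t, 0) / total for t in range(n_topics)]


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def find_outliers(testimonies, n_topics):
    """testimonies: list of (id, group, section_topics) tuples.
    Returns ids whose topic distribution is closer to another group's
    mean distribution than to their own group's (assumed criterion)."""
    dists = {tid: topic_distribution(topics, n_topics)
             for tid, _, topics in testimonies}
    by_group = {}
    for tid, group, _ in testimonies:
        by_group.setdefault(group, []).append(dists[tid])
    # Note: each group mean includes the testimony being tested.
    means = {g: mean_vector(v) for g, v in by_group.items()}
    outliers = []
    for tid, group, _ in testimonies:
        nearest = min(means, key=lambda g: js_divergence(dists[tid], means[g]))
        if nearest != group:
            outliers.append(tid)
    return outliers


# Toy data: group labels and topic ids are made up for illustration.
toy = [
    ("a1", "A", [0, 0, 0, 1]),
    ("a2", "A", [0, 0, 1, 0]),
    ("a3", "A", [2, 2, 2, 1]),  # looks like group B
    ("b1", "B", [2, 2, 2, 2]),
    ("b2", "B", [2, 2, 1, 2]),
    ("b3", "B", [2, 1, 2, 2]),
]
print(find_outliers(toy, n_topics=3))
print(align_to_parts([5, 5, 7, 7, 9, 9], n_parts=3))
```

In this toy setup the same comparison could be run per aligned part rather than over the whole testimony, by applying `topic_distribution` to each bin returned by `align_to_parts`.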
Related papers
- Gender Bias in Instruction-Guided Speech Synthesis Models [55.2480439325792]
This study investigates the potential gender bias in how models interpret occupation-related prompts.
We explore whether these models exhibit tendencies to amplify gender stereotypes when interpreting such prompts.
Our experimental results reveal the model's tendency to exhibit gender bias for certain occupations.
arXiv Detail & Related papers (2025-02-08T17:38:24Z) - Computational Analysis of Character Development in Holocaust Testimonies [13.639727580099484]
This work presents a computational approach to analyze character development along the narrative timeline.
We consider transcripts of Holocaust survivor testimonies as a test case, each telling the story of an individual in first-person terms.
We focus on the survivor's religious trajectory, examining the evolution of their disposition toward religious belief and practice.
arXiv Detail & Related papers (2024-12-22T15:20:53Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Conflicts, Villains, Resolutions: Towards models of Narrative Media
Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives.
We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions.
We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
arXiv Detail & Related papers (2023-06-03T08:50:13Z) - A Group-Specific Approach to NLP for Hate Speech Detection [2.538209532048867]
We propose a group-specific approach to NLP for online hate speech detection.
We analyze historical data about discrimination against a protected group to better predict spikes in hate speech against that group.
We demonstrate this approach through a case study on NLP for detection of antisemitic hate speech.
arXiv Detail & Related papers (2023-04-21T19:08:49Z) - Topical Segmentation of Spoken Narratives: A Test Case on Holocaust
Survivor Testimonies [18.80663780272046]
We tackle the task of segmenting running (spoken) narratives.
As a test case, we address Holocaust survivor testimonies, given in English.
arXiv Detail & Related papers (2022-10-25T06:02:28Z) - NECE: Narrative Event Chain Extraction Toolkit [64.89332212585404]
We introduce NECE, an open-access, document-level toolkit that automatically extracts and aligns narrative events in the temporal order of their occurrence.
We show the high quality of the NECE toolkit and demonstrate its downstream application in analyzing narrative bias regarding gender.
We also openly discuss the shortcomings of the current approach, and potential of leveraging generative models in future works.
arXiv Detail & Related papers (2022-08-17T04:30:58Z) - Integrating topic modeling and word embedding to characterize violent
deaths [25.95389494074192]
We introduce a new method to identify topics in a corpus and represent documents as topic sequences.
We first identify a set of vectors ("discourse atoms") that provide a sparse representation of an embedding space.
We then compare the gender bias of topics to their prevalence in narratives of female versus male victims.
arXiv Detail & Related papers (2021-06-28T01:53:20Z) - Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z) - Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization [72.54873655114844]
Text summarization is one of the most challenging and interesting problems in NLP.
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations.
Experiments on a large-scale dialogue summarization corpus demonstrated that our methods significantly outperformed previous state-of-the-art models via both automatic evaluations and human judgment.
arXiv Detail & Related papers (2020-10-04T20:12:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.