Integrating topic modeling and word embedding to characterize violent
deaths
- URL: http://arxiv.org/abs/2106.14365v1
- Date: Mon, 28 Jun 2021 01:53:20 GMT
- Title: Integrating topic modeling and word embedding to characterize violent
deaths
- Authors: Alina Arseniev-Koehler, Susan D. Cochran, Vickie M. Mays, Kai-Wei
Chang, Jacob Gates Foster
- Abstract summary: We introduce a new method to identify topics in a corpus and represent documents as topic sequences.
We first identify a set of vectors ("discourse atoms") that provide a sparse representation of an embedding space.
We then compare the gender bias of topics to their prevalence in narratives of female versus male victims.
- Score: 25.95389494074192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is an escalating need for methods to identify latent patterns in text
data from many domains. We introduce a new method to identify topics in a
corpus and represent documents as topic sequences. Discourse Atom Topic
Modeling draws on advances in theoretical machine learning to integrate topic
modeling and word embedding, capitalizing on the distinct capabilities of each.
We first identify a set of vectors ("discourse atoms") that provide a sparse
representation of an embedding space. Atom vectors can be interpreted as latent
topics: Through a generative model, atoms map onto distributions over words;
one can also infer the topic that generated a sequence of words. We illustrate
our method with a prominent example of underutilized text: the U.S. National
Violent Death Reporting System (NVDRS). The NVDRS summarizes violent death
incidents with structured variables and unstructured narratives. We identify
225 latent topics in the narratives (e.g., preparation for death and physical
aggression); many of these topics are not captured by existing structured
variables. Motivated by known patterns in suicide and homicide by gender, and
recent research on gender biases in semantic space, we identify the gender bias
of our topics (e.g., a topic about pain medication is feminine). We then
compare the gender bias of topics to their prevalence in narratives of female
versus male victims. Results provide a detailed quantitative picture of
reporting about lethal violence and its gendered nature. Our method offers a
flexible and broadly applicable approach to model topics in text data.
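To make the abstract's pipeline more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the atoms have already been learned, and it assumes a log-linear generative model in which the probability of a word given an atom is proportional to exp(word vector · atom vector); the abstract does not state the exact form. The toy embeddings, the atom vectors, the "she" minus "he" gender direction, and the function names (word_distribution, top_words, infer_atom, gender_bias) are all hypothetical choices made for illustration.

```python
import numpy as np

# Toy 3-d word embeddings; every value here is made up for illustration.
vocab = ["she", "he", "pills", "overdose", "fight", "punched"]
E = np.array([
    [ 1.0, 0.0, 0.0],   # she
    [-1.0, 0.0, 0.0],   # he
    [ 0.4, 1.0, 0.0],   # pills
    [ 0.3, 0.9, 0.1],   # overdose
    [-0.4, 0.0, 1.0],   # fight
    [-0.3, 0.1, 0.9],   # punched
])

# Hypothetical atom vectors, assumed to come from a sparse dictionary
# learning step over the embedding space (not performed here).
atoms = np.array([
    [ 0.35, 0.95, 0.05],  # atom 0: medication-like
    [-0.35, 0.05, 0.95],  # atom 1: physical-aggression-like
])

def unit(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def word_distribution(atom, E):
    """Assumed log-linear model: p(word | atom) proportional to exp(E @ atom)."""
    logits = E @ atom
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def top_words(atom, E, vocab, k=2):
    """Return the k most probable words under the atom's distribution."""
    p = word_distribution(atom, E)
    return [vocab[i] for i in np.argsort(-p)[:k]]

def infer_atom(words, E, vocab, atoms):
    """Assign a word sequence to the atom closest (by cosine) to its mean vector."""
    idx = [vocab.index(w) for w in words]
    context = E[idx].mean(axis=0)
    return int(np.argmax(unit(atoms) @ unit(context)))

def gender_bias(atom, E, vocab):
    """Cosine between an atom and an assumed 'she' minus 'he' direction."""
    direction = E[vocab.index("she")] - E[vocab.index("he")]
    return float(unit(atom) @ unit(direction))

for k, atom in enumerate(atoms):
    print(k, top_words(atom, E, vocab), round(gender_bias(atom, E, vocab), 3))
print(infer_atom(["pills", "overdose"], E, vocab, atoms))
```

In this toy setup, a positive gender_bias value means the atom leans toward the "she" end of the assumed direction and a negative value toward the "he" end. Comparing that score with how often an atom is assigned in narratives of female versus male victims would mirror the comparison the abstract describes.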
Related papers
- Gender Bias in Instruction-Guided Speech Synthesis Models [55.2480439325792]
This study investigates the potential gender bias in how models interpret occupation-related prompts.
We explore whether these models exhibit tendencies to amplify gender stereotypes when interpreting such prompts.
Our experimental results reveal the model's tendency to exhibit gender bias for certain occupations.
arXiv Detail & Related papers (2025-02-08T17:38:24Z)
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
arXiv Detail & Related papers (2024-10-19T15:27:11Z)
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- A Large Language Model Guided Topic Refinement Mechanism for Short Text Modeling [10.589126787499973]
Existing topic models often struggle to accurately capture the underlying semantic patterns of short texts.
This paper introduces a novel model-agnostic mechanism, termed Topic Refinement.
We show that Topic Refinement boosts the topic quality and improves the performance in topic-related text classification tasks.
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
- Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives.
We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions.
We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
arXiv Detail & Related papers (2023-06-03T08:50:13Z)
- InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling [40.54497836775837]
Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics.
Most existing methods suffer from two problems: repetitive topics that hinder further analysis, and a performance decline caused by low-coverage dictionaries.
We propose the Cross-lingual Topic Modeling with Mutual Information (InfoCTM) to produce more coherent, diverse, and well-aligned topics.
arXiv Detail & Related papers (2023-04-07T08:49:43Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.