Dialectograms: Machine Learning Differences between Discursive
Communities
- URL: http://arxiv.org/abs/2302.05657v1
- Date: Sat, 11 Feb 2023 11:32:08 GMT
- Title: Dialectograms: Machine Learning Differences between Discursive
Communities
- Authors: Thyge Enggaard (1), August Lohse (1), Morten Axel Pedersen (1 and 2),
Sune Lehmann (1 and 3) ((1) Copenhagen Center for Social Data Science,
University of Copenhagen, Denmark, (2) Department of Anthropology, University
of Copenhagen, Denmark, (3) DTU Compute, Technical University of Denmark,
Denmark)
- Abstract summary: We take a step towards leveraging the richness of the full embedding space by using word embeddings to map out how words are used differently.
We provide a new measure of the degree to which words are used differently that overcomes the tendency for existing measures to pick out low-frequency or polysemous words.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings provide an unsupervised way to understand differences in word
usage between discursive communities. A number of recent papers have focused on
identifying words that are used differently by two or more communities. But
word embeddings are complex, high-dimensional spaces and a focus on identifying
differences only captures a fraction of their richness. Here, we take a step
towards leveraging the richness of the full embedding space, by using word
embeddings to map out how words are used differently. Specifically, we describe
the construction of dialectograms, an unsupervised way to visually explore the
characteristic ways in which each community uses a focal word. Based on these
dialectograms, we provide a new measure of the degree to which words are used
differently that overcomes the tendency for existing measures to pick out
low-frequency or polysemous words. We apply our methods to explore the discourses of
two US political subreddits and show how our methods identify stark affective
polarisation of politicians and political entities, differences in the
assessment of proper political action as well as disagreement about whether
certain issues require political intervention at all.
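As a rough illustration of the kind of existing measure the abstract contrasts itself with, the sketch below aligns two community-specific embedding matrices with an orthogonal Procrustes rotation and ranks words by cosine distance in the aligned space. It is not the dialectogram method of the paper, whose details are not given here. The function names, the toy data, and the assumption that both matrices share one vocabulary order are illustrative only.

```python
import numpy as np


def procrustes_rotation(X, Y):
    """Return the orthogonal matrix R that minimises ||X @ R - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def usage_distance(X, Y):
    """Cosine distance per word after rotating X onto Y.

    Assumes row i of X and row i of Y are embeddings of the same word,
    trained separately on the two communities' corpora.
    """
    X_aligned = X @ procrustes_rotation(X, Y)
    dot = np.sum(X_aligned * Y, axis=1)
    norms = np.linalg.norm(X_aligned, axis=1) * np.linalg.norm(Y, axis=1)
    return 1.0 - dot / norms


# Toy data (hypothetical): 200 words in 8 dimensions. Community B's space
# is a rotated copy of community A's, except for one word whose vector is
# flipped to mimic a word that the two communities use very differently.
rng = np.random.default_rng(0)
n_words, dim, changed = 200, 8, 17
A = rng.normal(size=(n_words, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal map
B = A @ Q
B[changed] = -B[changed]

dist = usage_distance(A, B)
most_different = int(np.argmax(dist))  # index of the highest-distance word
```

With real embeddings the ranking produced this way is the one the abstract describes as prone to surfacing low-frequency or polysemous words, which is the motivation for the measure the authors propose.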
Related papers
- Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use [21.15400893251543]
Bridging Dictionary is an interactive tool designed to illuminate how words are perceived by people with different political views.
The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model.
Users can explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides.
arXiv Detail & Related papers (2024-07-12T19:44:40Z)
- Moral consensus and divergence in partisan language use [0.0]
Polarization has increased substantially in political discourse, contributing to a widening partisan divide.
We analyzed large-scale, real-world language use in Reddit communities and in news outlets to uncover psychological dimensions along which partisan language is divided.
arXiv Detail & Related papers (2023-10-14T16:50:26Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Discovering Differences in the Representation of People using Contextualized Semantic Axes [5.972927416266617]
We use contextualized semantic axes to characterize differences among instances of the same word type.
We show that references to women and the contexts around them have become more detestable over time.
arXiv Detail & Related papers (2022-10-21T18:02:19Z)
- Detecting Political Biases of Named Entities and Hashtags on Twitter [28.02430167720734]
Ideological divisions in the United States have become increasingly prominent in daily communication.
By detecting political biases in a corpus of text, one can attempt to describe and discern the polarity of that text.
We propose the Polarity-aware Embedding Multi-task learning model.
arXiv Detail & Related papers (2022-09-16T18:00:13Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify semantic meaning at hybrid granularities in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora [54.757845511368814]
The problem of comparing two bodies of text and searching for words that differ in their usage arises often in digital humanities and computational social science.
This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large.
We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word.
arXiv Detail & Related papers (2021-12-28T23:46:00Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Cultural Cartography with Word Embeddings [0.0]
We show how word embeddings are commensurate with prevailing theories of meaning in sociology.
First, one can hold terms constant and measure how the embedding space moves around them.
Second, one can also hold the embedding space constant and see how documents or authors move relative to it.
arXiv Detail & Related papers (2020-07-09T01:58:28Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore different existing methods for this solution at both the graph construction and search method levels.
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.