Generalized Word Shift Graphs: A Method for Visualizing and Explaining
Pairwise Comparisons Between Texts
- URL: http://arxiv.org/abs/2008.02250v1
- Date: Wed, 5 Aug 2020 17:27:11 GMT
- Title: Generalized Word Shift Graphs: A Method for Visualizing and Explaining
Pairwise Comparisons Between Texts
- Authors: Ryan J. Gallagher, Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz,
Andrew J. Reagan, Christopher M. Danforth, Peter Sheridan Dodds
- Abstract summary: A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content.
We introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts.
We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback-Leibler and Jensen-Shannon divergences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common task in computational text analyses is to quantify how two corpora
differ according to a measurement like word frequency, sentiment, or
information content. However, collapsing the texts' rich stories into a single
number is often conceptually perilous, and it is difficult to confidently
interpret interesting or unexpected textual patterns without looming concerns
about data artifacts or measurement validity. To better capture fine-grained
differences between texts, we introduce generalized word shift graphs,
visualizations which yield a meaningful and interpretable summary of how
individual words contribute to the variation between two texts for any measure
that can be formulated as a weighted average. We show that this framework
naturally encompasses many of the most commonly used approaches for comparing
texts, including relative frequencies, dictionary scores, and entropy-based
measures like the Kullback-Leibler and Jensen-Shannon divergences. Through
several case studies, we demonstrate how generalized word shift graphs can be
flexibly applied across domains for diagnostic investigation, hypothesis
generation, and substantive interpretation. By providing a detailed lens into
textual shifts between corpora, generalized word shift graphs help
computational social scientists, digital humanists, and other text analysis
practitioners fashion more robust scientific narratives.
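The weighted-average framing in the abstract can be sketched for its simplest case, a fixed dictionary score: the difference in average word score between two texts decomposes exactly into per-word contributions, which is the quantity a word shift graph ranks and displays. A minimal sketch, assuming a simple dictionary-score measure; the function name and toy sentiment scores are illustrative, not from the paper.

```python
from collections import Counter

def word_contributions(text1, text2, scores, default=0.0):
    """Per-word contributions to the difference in average word score
    between two tokenized texts.

    For a dictionary score phi(w) and relative frequencies p1(w), p2(w),
    Phi(text2) - Phi(text1) = sum_w (p2(w) - p1(w)) * phi(w),
    so each word contributes (p2(w) - p1(w)) * phi(w)."""
    c1, c2 = Counter(text1), Counter(text2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    contribs = {}
    for w in set(c1) | set(c2):
        p1, p2 = c1[w] / n1, c2[w] / n2
        contribs[w] = (p2 - p1) * scores.get(w, default)
    return contribs

# Toy example: sentiment-like scores for two tiny "corpora"
scores = {"good": 1.0, "bad": -1.0, "fine": 0.5}
t1 = "good good bad fine".split()
t2 = "bad bad good fine".split()
contribs = word_contributions(t1, t2, scores)
# The contributions sum to the difference of the two weighted averages
total = sum(contribs.values())  # → -0.5
```

A word shift graph then sorts the words by the magnitude of these contributions and plots them as a horizontal bar chart, so the reader sees which words drive the overall shift rather than only the single summary number.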
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted over existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- Dictionary-Assisted Supervised Contrastive Learning [0.0]
We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries.
The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest.
Combining DASCL with cross-entropy improves classification performance in few-shot learning settings and social science applications.
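The keyword-simplification step described above is straightforward to sketch. A minimal illustration, assuming a lower-cased concept dictionary; the replacement token string and the example dictionary are hypothetical, not from the paper.

```python
def keyword_simplify(tokens, dictionary, token="<concept>"):
    """Replace every word that appears in the concept dictionary
    with a single shared token (keyword simplification)."""
    return [token if t.lower() in dictionary else t for t in tokens]

# Hypothetical dictionary for an economics-related concept
econ_dict = {"inflation", "unemployment", "gdp"}
out = keyword_simplify("Inflation outpaced GDP growth".split(), econ_dict)
# → ['<concept>', 'outpaced', '<concept>', 'growth']
```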
arXiv Detail & Related papers (2022-10-27T04:57:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
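The element-wise Manhattan distance feature mentioned above can be sketched directly: it is the vector of absolute differences between two sentence embeddings, used as a feature vector rather than collapsed to a single distance. The function name and toy vectors are illustrative.

```python
def manhattan_feature(u, v):
    """Element-wise absolute difference of two equal-length
    sentence embeddings; the vector itself is the feature."""
    assert len(u) == len(v)
    return [abs(a - b) for a, b in zip(u, v)]

feat = manhattan_feature([0.2, -0.5, 1.0], [0.1, 0.5, 1.0])
# → approximately [0.1, 1.0, 0.0]
```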
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
Science-specific meaning is standardised via situation representations rather than psychological properties.
This research lays the groundwork for a geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
- Towards a Theoretical Understanding of Word and Relation Representation [8.020742121274418]
Representing words by vectors, or embeddings, enables computational reasoning.
We focus on word embeddings learned from text corpora and knowledge graphs.
arXiv Detail & Related papers (2022-02-01T15:34:58Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the resulting metric, NDD, to be more sensitive to various semantic differences, especially for highly overlapped paired texts.
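The "neighboring words" step above, identifying the longest common sequence shared by the paired texts, can be sketched with standard dynamic programming; the MLM prediction step itself would require a pretrained model and is omitted here. Function name and example sentences are illustrative.

```python
def longest_common_subsequence(a, b):
    """Token-level longest common subsequence via dynamic programming;
    these shared tokens serve as the 'neighboring words' whose
    masked-LM distributions are later compared."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    # Backtrack to recover the subsequence itself
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

shared = longest_common_subsequence("the cat sat on the mat".split(),
                                    "the cat slept on a mat".split())
# → ['the', 'cat', 'on', 'mat']
```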
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Extractive approach for text summarisation using graphs [0.0]
Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach.
We consider two metrics for measuring sentence similarity: sentence overlap and edit distance.
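The two sentence-similarity metrics mentioned can be sketched directly. A minimal sketch, assuming word-level granularity and Jaccard-style overlap; both assumptions are illustrative, not confirmed by the abstract.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists,
    computed with a rolling one-row dynamic-programming table."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = (prev if a[i - 1] == b[j - 1]
                     else 1 + min(prev, dp[j], dp[j - 1]))
            prev = cur
    return dp[n]

def sentence_overlap(a, b):
    """Jaccard overlap of the two sentences' word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

d = edit_distance("the dog barks".split(), "the dog sleeps".split())
# → 1 (one word substitution)
```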
arXiv Detail & Related papers (2021-06-21T10:03:34Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Heaps' law and Heaps functions in tagged texts: Evidences of their linguistic relevance [0.0]
We study the relationship between vocabulary size and text length in a corpus of 75 literary works in English.
We analyze the progressive appearance of new words of each tag along each individual text.
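The progressive appearance of new words along a text, the empirical Heaps function V(n), can be sketched as a running count of distinct tokens. A minimal illustration; the function name and toy token sequence are not from the paper.

```python
def heaps_curve(tokens):
    """Vocabulary size after each successive token:
    the empirical Heaps function V(n) for n = 1..len(tokens)."""
    seen, curve = set(), []
    for t in tokens:
        seen.add(t)
        curve.append(len(seen))
    return curve

curve = heaps_curve("a b a c b d a".split())
# → [1, 2, 2, 3, 3, 4, 4]
```

Heaps' law predicts this curve grows roughly as a power law, V(n) ≈ K·n^β with β < 1, which is what the paper examines per part-of-speech tag.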
arXiv Detail & Related papers (2020-01-07T17:05:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.