The Problem of Semantic Shift in Longitudinal Monitoring of Social
Media: A Case Study on Mental Health During the COVID-19 Pandemic
- URL: http://arxiv.org/abs/2206.11160v1
- Date: Wed, 22 Jun 2022 15:09:28 GMT
- Title: The Problem of Semantic Shift in Longitudinal Monitoring of Social
Media: A Case Study on Mental Health During the COVID-19 Pandemic
- Authors: Keith Harrigian and Mark Dredze
- Abstract summary: Social media allows researchers to track societal and cultural changes over time using language analysis tools.
Many of these tools rely on statistical algorithms that need to be tuned to specific types of language.
Recent studies have shown that the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder the robustness of the underlying methods.
- Score: 15.002282686061905
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media allows researchers to track societal and cultural changes over
time using language analysis tools. Many of these tools rely on statistical
algorithms that need to be tuned to specific types of language. Recent studies
have shown that the absence of appropriate tuning, specifically in the presence
of semantic shift, can hinder the robustness of the underlying methods. However,
little is known about the practical effect this sensitivity may have on
downstream longitudinal analyses. We explore this gap in the literature through
a timely case study: understanding shifts in depression during the course of
the COVID-19 pandemic. We find that inclusion of only a small number of
semantically-unstable features can promote significant changes in longitudinal
estimates of our target outcome. At the same time, we demonstrate that a
recently-introduced method for measuring semantic shift may be used to
proactively identify failure points of language-based models and, in turn,
improve predictive generalization.
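The abstract cites a recently-introduced shift measure without detailing it here. As an illustration only, here is a minimal sketch in the spirit of that idea: flag semantically unstable features by how much a word's nearest neighbors change between embedding spaces trained separately on pre-pandemic and pandemic-era text. The function names, the Jaccard overlap, and the threshold are assumptions, not the paper's method.
```python
# Minimal sketch (assumed, not the authors' exact method): score each
# word's semantic stability as the Jaccard overlap of its nearest
# neighbors in embedding spaces trained independently on two periods.
# Comparing neighbor sets avoids aligning the two vector spaces.
import numpy as np

def nearest_neighbors(word, vectors, k=10):
    """k most cosine-similar words to `word` in a {word: vector} dict."""
    v = vectors[word]
    sims = {
        w: float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u)))
        for w, u in vectors.items() if w != word
    }
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def stability(word, vectors_t0, vectors_t1, k=10):
    """Jaccard overlap of neighborhoods across periods; low values
    suggest semantic shift. Assumes `word` occurs in both vocabularies."""
    n0 = nearest_neighbors(word, vectors_t0, k)
    n1 = nearest_neighbors(word, vectors_t1, k)
    return len(n0 & n1) / len(n0 | n1)

# Hypothetical usage: drop the least stable features before fitting a
# longitudinal model (vecs_2019/vecs_2020 and the 0.2 threshold are
# illustrative placeholders).
# unstable = [w for w in shared_vocab
#             if stability(w, vecs_2019, vecs_2020) < 0.2]
```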
Related papers
- A Systematic Analysis on the Temporal Generalization of Language Models in Social Media [12.035331011654078]
This paper focuses on temporal shifts in social media and, in particular, Twitter.
We propose a unified evaluation scheme to assess the performance of language models (LMs) under temporal shift.
arXiv Detail & Related papers (2024-05-15T05:41:06Z)
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks [52.61917615039112]
We use CausalGym to benchmark the ability of interpretability methods to causally affect model behaviour.
We study the pythia models (14M--6.9B) and assess the causal efficacy of a wide range of interpretability methods.
We find that DAS outperforms the other methods, and so we use it to study the learning trajectory of two difficult linguistic phenomena.
arXiv Detail & Related papers (2024-02-19T21:35:56Z)
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- A Simple and Flexible Modeling for Mental Disorder Detection by Learning from Clinical Questionnaires [0.2580765958706853]
We propose a novel approach that captures semantic meaning directly from the text and compares it to symptom-related descriptions.
Our detailed analysis shows that the proposed model is effective at leveraging domain knowledge, transferable to other mental disorders, and able to provide interpretable detection results.
arXiv Detail & Related papers (2023-06-05T15:23:55Z)
- Contextualized language models for semantic change detection: lessons learned [4.436724861363513]
We present a qualitative analysis of the outputs of contextualized embedding-based methods for detecting diachronic semantic change.
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift.
Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses with changes in contextual variance; a minimal sketch of a change score with exactly this failure mode follows this entry.
arXiv Detail & Related papers (2022-08-31T23:35:24Z)
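One score often used with contextualized models, and one that exhibits exactly the confound described in the entry above, is the mean pairwise cosine distance between a word's token embeddings drawn from two periods. The sketch below is an assumed, minimal formulation; the paper analyzes methods of this family rather than prescribing this one.
```python
# Hedged sketch: semantic change as the mean pairwise cosine distance
# between contextualized token vectors of one word in two periods
# (e.g., BERT hidden states, one row per usage). Caveat from the entry
# above: this score also grows when only the diversity of contexts
# increases, with no real change of sense.
import numpy as np

def mean_pairwise_distance(emb_t0, emb_t1):
    """emb_t0, emb_t1: (n0, d) and (n1, d) arrays of token embeddings."""
    a = emb_t0 / np.linalg.norm(emb_t0, axis=1, keepdims=True)
    b = emb_t1 / np.linalg.norm(emb_t1, axis=1, keepdims=True)
    return float(1.0 - (a @ b.T).mean())  # mean over all cross-period pairs
```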
- Quantifying Cognitive Factors in Lexical Decline [2.4424095531386234]
We propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline.
We find that most of our proposed factors show a significant difference in the expected direction between each curated set of declining words and their matched stable words.
Further diachronic analysis reveals that declining words tend to decrease in the diversity of their lexical contexts over time, gradually narrowing their 'ecological niches'.
arXiv Detail & Related papers (2021-10-12T07:12:56Z)
- Combating Temporal Drift in Crisis with Adapted Embeddings [58.4558720264897]
Language usage changes over time, and this can impact the effectiveness of NLP systems.
This work investigates methods for adapting to changing discourse during crisis events.
arXiv Detail & Related papers (2021-04-17T13:11:41Z)
- Semantic coordinates analysis reveals language changes in the AI field [19.878987032985634]
We propose a method based on semantic shifts that reveals changes in language within publications of a field.
We use GloVe-style probability ratios to quantify the shifting directions and extents from multiple viewpoints; an illustrative sketch of such a ratio follows this entry.
We show that semantic coordinates analysis can detect shifts echoing changes of research interests.
arXiv Detail & Related papers (2020-11-01T15:59:24Z)
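As an illustration of the GloVe-style ratio mentioned above (my reading of the summary, not necessarily the authors' exact formulation), the sketch below compares how strongly a target word co-occurs with a probe word in an early versus a late slice of a corpus; the counting scheme and all names are assumptions.
```python
# Illustrative sketch: a GloVe-style probability ratio across eras.
# Ratios far from 1 indicate that the target word has drifted toward
# (>1) or away from (<1) the probe word's semantic region.
from collections import Counter

def cooccurrence_prob(pairs, target, probe):
    """pairs: iterable of (word, context_word) tuples from one era.
    Returns P(probe | target)."""
    joint = Counter(pairs)
    total = sum(c for (w, _), c in joint.items() if w == target)
    return joint[(target, probe)] / total if total else 0.0

def shift_ratio(pairs_early, pairs_late, target, probe):
    """Ratio of late-era to early-era co-occurrence probability."""
    p_early = cooccurrence_prob(pairs_early, target, probe)
    p_late = cooccurrence_prob(pairs_late, target, probe)
    return p_late / p_early if p_early else float("inf")

# Hypothetical usage, echoing shifting research interests in AI papers:
# shift_ratio(pairs_2012, pairs_2020, "network", "neural")
```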
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out the inability to infer behavioral conclusions from probing results.
We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
arXiv Detail & Related papers (2020-06-01T15:00:11Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis; a sketch of both factors follows this entry.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
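A hedged formalization of the two factors named in the entry above (my own reading, not necessarily the authors'): neighborhood density as the mean cosine similarity of a word's k nearest neighbors, with lower density meaning a sparser semantic region, and the mean frequency growth rate of those neighbors across two periods.
```python
# Sketch (assumed formalization): semantic sparsity and neighbor
# frequency growth in a distributional space.
import numpy as np

def neighborhood(word, vectors, k=10):
    """k nearest neighbors of `word` in a {word: vector} dict and their
    mean cosine similarity (the density of its semantic neighborhood)."""
    v = vectors[word] / np.linalg.norm(vectors[word])
    sims = {w: float(v @ (u / np.linalg.norm(u)))
            for w, u in vectors.items() if w != word}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    return top, float(np.mean([sims[w] for w in top]))

def neighbor_growth(word, vectors, freq_t0, freq_t1, k=10):
    """Mean relative frequency growth of the word's neighbors between
    two periods; freq_t0/freq_t1 are {word: count} dicts."""
    top, density = neighborhood(word, vectors, k)
    growth = float(np.mean([
        (freq_t1.get(w, 0) - freq_t0.get(w, 0)) / max(freq_t0.get(w, 0), 1)
        for w in top
    ]))
    return density, growth  # the entry reports more support for growth
```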