Achieving Semantic Consistency: Contextualized Word Representations for Political Text Analysis
- URL: http://arxiv.org/abs/2412.04505v2
- Date: Sun, 19 Jan 2025 06:54:00 GMT
- Authors: Ruiyu Zhang, Lin Nie, Ce Zhao, Qingyang Chen
- Abstract summary: This study compares Word2Vec and BERT to evaluate their performance in semantic representations across different timeframes.
The results indicate that BERT outperforms Word2Vec in maintaining semantic stability while still recognizing subtle semantic variations.
- Abstract: Accurately interpreting words is vital in political science text analysis; some tasks require assuming semantic stability, while others aim to trace semantic shifts. Traditional static embeddings, such as Word2Vec, capture long-term semantic changes effectively but often lack stability in short-term contexts because unbalanced training data causes the embeddings to fluctuate. BERT, with its transformer-based architecture and contextual embeddings, offers greater semantic consistency, making it suitable for analyses in which stability is crucial. This study compares Word2Vec and BERT using 20 years of People's Daily articles to evaluate their performance in semantic representation across different timeframes. The results indicate that BERT outperforms Word2Vec in maintaining semantic stability while still recognizing subtle semantic variations. These findings support BERT's use in text analysis tasks that require stability, where semantic changes are not assumed, offering a more reliable foundation than static alternatives.
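To make the comparison concrete, here is a minimal sketch, not the authors' code, of one way to measure cross-period semantic stability with contextual embeddings: average a target word's BERT vectors within each time slice of a corpus, then compare the slice means by cosine similarity. The model choice, helper names, and corpus slices are illustrative assumptions.

```python
# A minimal sketch of a cross-period stability measurement, NOT the
# authors' code. Model choice and corpus slices are assumptions.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
model.eval()

def mean_contextual_embedding(sentences, target):
    """Average BERT vector of `target` over all of its occurrences."""
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    vecs = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(target_ids) + 1):
            if ids[i:i + len(target_ids)] == target_ids:
                vecs.append(hidden[i:i + len(target_ids)].mean(0))
    return torch.stack(vecs).mean(0).numpy()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: slice_2003 and slice_2013 are lists of sentences
# from two time periods that mention the same target word.
# stability = cosine(mean_contextual_embedding(slice_2003, "改革"),
#                    mean_contextual_embedding(slice_2013, "改革"))
```

A higher cosine between slice means indicates a more stable representation; the same pipeline run with static Word2Vec vectors per slice gives the contrast the abstract describes.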
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
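A minimal sketch of the underlying idea follows (assumed details, not the paper's method): keep the whole cohort of a word's contextual embeddings from each corpus and compare the cohorts directly, here via mean pairwise cosine distance, instead of reducing each cohort to a single mean vector.

```python
# Compare two cohorts of contextual embeddings of one word without
# collapsing them to means; a crude distributional stand-in, assumed here.
import numpy as np

def cohort_distance(A, B):
    """A: (n, d), B: (m, d) contextual embeddings of the same word."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(1.0 - (A @ B.T).mean())  # mean pairwise cosine distance
```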
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis [67.41078214475341]
We propose Dynamic Re-weighting BERT (DR-BERT) to learn dynamic aspect-oriented semantics for ABSA.
Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence.
We then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA).
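For intuition only, a hypothetical lightweight re-weighting module (not the actual DRA) might score each token state against an aspect vector and pool the sequence with the learned weights:

```python
import torch
import torch.nn as nn

class ReweightingAdapter(nn.Module):
    """Illustrative adapter: aspect-conditioned token re-weighting."""
    def __init__(self, hidden=768):
        super().__init__()
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, token_states, aspect_vec):
        # token_states: (batch, seq, hidden); aspect_vec: (batch, hidden)
        asp = aspect_vec.unsqueeze(1).expand_as(token_states)
        logits = self.score(torch.cat([token_states, asp], dim=-1)).squeeze(-1)
        weights = torch.softmax(logits, dim=-1)                # (batch, seq)
        return (weights.unsqueeze(-1) * token_states).sum(1)  # (batch, hidden)
```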
arXiv Detail & Related papers (2022-03-30T14:48:46Z)
- HistBERT: A Pre-trained Language Model for Diachronic Lexical Semantic Analysis [3.2851864672627618]
We present a pre-trained BERT-based language model, HistBERT, trained on the balanced Corpus of Historical American English.
We report promising results in word similarity and semantic shift analysis.
arXiv Detail & Related papers (2022-02-08T02:53:48Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting metric, Neighboring Distribution Divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
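A rough sketch of the mask-and-predict idea, with assumed details (the paper's exact NDD formulation differs): mask a shared word's position in two texts, read off the MLM distributions there, and compare them with a divergence.

```python
# Mask-and-predict sketch: compare MLM distributions at masked positions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def masked_log_distribution(text, position):
    """Log MLM distribution at a token position after masking it."""
    enc = tokenizer(text, return_tensors="pt")
    enc["input_ids"][0, position] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = mlm(**enc).logits[0, position]
    return F.log_softmax(logits, dim=-1)

def position_divergence(text_a, pos_a, text_b, pos_b):
    """KL divergence between the two masked-position distributions."""
    p = masked_log_distribution(text_a, pos_a)
    q = masked_log_distribution(text_b, pos_b)
    return float(F.kl_div(q, p, log_target=True, reduction="sum"))
```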
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Statistically significant detection of semantic shifts using contextual word embeddings [7.439525715543974]
We propose an approach to estimate semantic shifts by combining contextual word embeddings with permutation-based statistical tests.
We demonstrate the performance of this approach in simulation, achieving consistently high precision by suppressing false positives.
We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus.
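A minimal sketch of such a test follows, with assumed details rather than the authors' exact procedure: compare the observed distance between the two periods' mean embeddings against the same statistic under random relabelling of the word's usages.

```python
# Permutation test for semantic shift between two sets of contextual
# embeddings of the same word (assumed details, illustrative only).
import numpy as np

def shift_pvalue(E1, E2, n_perm=10_000, seed=0):
    """E1: (n, d), E2: (m, d) contextual embeddings of one word."""
    rng = np.random.default_rng(seed)
    observed = np.linalg.norm(E1.mean(axis=0) - E2.mean(axis=0))
    pooled = np.vstack([E1, E2])
    n = len(E1)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = np.linalg.norm(pooled[idx[:n]].mean(axis=0) -
                              pooled[idx[n:]].mean(axis=0))
        hits += stat >= observed
    return (hits + 1) / (n_perm + 1)  # small p-value suggests a real shift
```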
arXiv Detail & Related papers (2021-04-08T13:58:54Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
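As a toy illustration of a graph walk over word senses (assumed details; this is not the EDS-MEMBED procedure), one can take short random walks along WordNet's hypernym and hyponym links from each sense of a word:

```python
# Toy random walk over WordNet senses to collect sense-related nodes.
import random
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def sense_walks(word, steps=3, seed=0):
    rng = random.Random(seed)
    walks = []
    for syn in wn.synsets(word):
        node, path = syn, [syn.name()]
        for _ in range(steps):
            neighbors = node.hypernyms() + node.hyponyms()
            if not neighbors:
                break
            node = rng.choice(neighbors)
            path.append(node.name())
        walks.append(path)
    return walks
```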
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
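One standard alignment method such a detector can be paired with is orthogonal Procrustes, sketched below under the assumption of two embedding matrices over a shared vocabulary (a generic tool, not the paper's self-supervised pipeline):

```python
# Orthogonal Procrustes alignment of two embedding spaces.
import numpy as np

def procrustes_align(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F, for A, B of shape (n, d)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Usage: rows of A and B are embeddings of the same words in two spaces;
# after R = procrustes_align(A, B), compare A @ R with B per word to
# score candidate semantic shifts.
```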
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
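Training a normalizing flow is too involved for a short example; shown instead is whitening, a different but related post-processing known from follow-up work, which also maps the embedding distribution toward an isotropic one:

```python
# Whitening sketch: zero-center embeddings and equalize the covariance
# spectrum, yielding an approximately isotropic space.
import numpy as np

def whiten(X, k=None):
    """X: (n, d) sentence embeddings; returns whitened embeddings."""
    mu = X.mean(axis=0, keepdims=True)
    cov = np.cov(X - mu, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S))
    if k is not None:
        W = W[:, :k]  # optional dimensionality reduction
    return (X - mu) @ W
```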
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- Word Embeddings: Stability and Semantic Change [0.0]
We present an experimental study on the instability of the training process of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText.
We propose a statistical model to describe the instability of embedding techniques and introduce a novel metric to measure the instability of the representation of an individual word.
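A minimal sketch of a per-word instability measure in this spirit (not the paper's statistical model): train word2vec twice with different seeds on the same corpus and check how much a word's neighborhood changes.

```python
# Per-word instability via nearest-neighbor overlap across two runs.
from gensim.models import Word2Vec

def neighbor_overlap(sentences, word, topn=10):
    """sentences: list of tokenized sentences; returns overlap in [0, 1]."""
    m1 = Word2Vec(sentences, vector_size=100, min_count=1, seed=1, workers=1)
    m2 = Word2Vec(sentences, vector_size=100, min_count=1, seed=2, workers=1)
    nn1 = {w for w, _ in m1.wv.most_similar(word, topn=topn)}
    nn2 = {w for w, _ in m2.wv.most_similar(word, topn=topn)}
    return len(nn1 & nn2) / topn  # 1.0 = fully stable neighborhood
```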
arXiv Detail & Related papers (2020-07-23T16:03:50Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.