Constructing Vec-tionaries to Extract Message Features from Texts: A
Case Study of Moral Appeals
- URL: http://arxiv.org/abs/2312.05990v2
- Date: Sat, 9 Mar 2024 04:55:47 GMT
- Title: Constructing Vec-tionaries to Extract Message Features from Texts: A
Case Study of Moral Appeals
- Authors: Zening Duan, Anqi Shao, Yicheng Hu, Heysung Lee, Xining Liao, Yoo Ji
Suh, Jisoo Kim, Kai-Cheng Yang, Kaiping Chen, and Sijia Yang
- Abstract summary: We present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings.
A vec-tionary can produce additional metrics to capture the ambivalence of a message feature beyond its strength in texts.
- Score: 5.336592570916432
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While researchers often study message features like moral content in text,
such as party manifestos and social media, their quantification remains a
challenge. Conventional human coding struggles with scalability and intercoder
reliability. While dictionary-based methods are cost-effective and
computationally efficient, they often lack contextual sensitivity and are
limited by the vocabularies developed for the original applications. In this
paper, we present an approach to construct vec-tionary measurement tools that
boost validated dictionaries with word embeddings through nonlinear
optimization. By harnessing semantic relationships encoded by embeddings,
vec-tionaries improve the measurement of message features from text, especially
those in short format, by expanding the applicability of original vocabularies
to other contexts. Importantly, a vec-tionary can produce additional metrics to
capture the valence and ambivalence of a message feature beyond its strength in
texts. Using moral content in tweets as a case study, we illustrate the steps
to construct the moral foundations vec-tionary, showcasing its ability to
process texts missed by conventional dictionaries and word embedding methods
and to produce measurements better aligned with crowdsourced human assessments.
Furthermore, additional metrics from the vec-tionary unveiled unique insights
that facilitated predicting outcomes such as message retransmission.
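The general mechanism can be illustrated with a minimal sketch: a seed dictionary for one moral foundation is represented as the centroid of its word embeddings, and a text is scored by the similarity of its tokens to that centroid, so semantically related words outside the original vocabulary still contribute. The toy 4-dimensional vectors and the centroid scoring below are illustrative assumptions, not the paper's actual nonlinear-optimization procedure.

```python
import numpy as np

# Toy 4-dimensional embeddings (hypothetical values for illustration;
# a real vec-tionary builds on pretrained word embeddings).
embeddings = {
    "kill":  np.array([0.9, 0.1, 0.0, 0.0]),
    "harm":  np.array([0.8, 0.2, 0.1, 0.0]),
    "hurt":  np.array([0.7, 0.3, 0.0, 0.1]),  # NOT in the seed dictionary
    "table": np.array([0.0, 0.1, 0.9, 0.4]),
}

# Seed dictionary for one moral foundation (e.g., care/harm).
seed_words = ["kill", "harm"]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Represent the foundation as the centroid of its seed-word vectors.
axis = np.mean([embeddings[w] for w in seed_words], axis=0)

def score_text(tokens):
    """Mean similarity of a text's tokens to the foundation centroid.

    Unlike exact dictionary matching, an out-of-vocabulary token such
    as 'hurt' still contributes through its embedding.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return 0.0
    return float(np.mean([cosine(v, axis) for v in vecs]))

print(score_text(["hurt"]))   # high: semantically close to the seeds
print(score_text(["table"]))  # low: unrelated to the foundation
```

This extends the original vocabulary's reach to short, sparse texts such as tweets, which is where the abstract reports the largest gains over dictionary matching.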
Related papers
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution [2.3429306644730854]
A variety of word-based stylistic markers have been used successfully in deep learning methods for the intrinsic problem of authorship attribution.
The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Dictionary-Assisted Supervised Contrastive Learning [0.0]
We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries.
The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest.
Combining DASCL with cross-entropy improves classification performance metrics in few-shot learning settings and social science applications.
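The keyword-simplification step described above can be sketched directly; the three-word dictionary and the `<moral>` placeholder token below are hypothetical choices for illustration.

```python
# Keyword simplification as described for DASCL (sketch): every corpus
# word found in the concept dictionary is replaced by one shared token.
moral_dict = {"kill", "harm", "hurt"}   # hypothetical dictionary
TOKEN = "<moral>"                       # hypothetical fixed token

def keyword_simplify(text, dictionary, token=TOKEN):
    """Replace each dictionary word in the text with the fixed token."""
    return " ".join(token if w.lower() in dictionary else w
                    for w in text.split())

print(keyword_simplify("They harm and hurt others", moral_dict))
# "They <moral> and <moral> others"
```

The shared token makes all concept-bearing words interchangeable, so the contrastive objective can pull texts about the same concept together regardless of which dictionary word they use.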
arXiv Detail & Related papers (2022-10-27T04:57:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
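A minimal sketch of that feature, assuming the text and hypothesis have already been encoded as fixed-length vectors (random vectors stand in here for real sentence encodings):

```python
import numpy as np

# Stand-in sentence embeddings for a text-hypothesis pair; in the paper
# these come from an empirical text representation.
rng = np.random.default_rng(0)
text_vec = rng.normal(size=8)
hyp_vec = rng.normal(size=8)

# Element-wise Manhattan distance vector: one |difference| per
# dimension. The vector itself (not its sum) serves as the feature
# fed to a downstream entailment classifier.
feature = np.abs(text_vec - hyp_vec)

print(feature.shape)  # same dimensionality as the input embeddings
```

Keeping the per-dimension differences, rather than collapsing them to a single scalar distance, lets the classifier weight each dimension separately.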
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the resulting metric, NDD, to be more sensitive to various semantic differences, especially on highly overlapped text pairs.
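The first step, recovering the shared words that anchor the comparison, is a standard longest-common-subsequence computation; a minimal sketch (the MLM prediction step over those positions is omitted):

```python
def lcs_words(a, b):
    """Longest common subsequence of two token lists.

    In the mask-and-predict strategy, these shared words are the
    neighboring words whose positions are masked for MLM prediction.
    """
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    # Backtrack to recover the subsequence itself.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the red mat".split()
print(lcs_words(s1, s2))  # → ['the', 'cat', 'on', 'the', 'mat']
```

Masking these anchors in each text and comparing the MLM's predicted distributions at their positions then yields a divergence that reflects how the differing words shift the shared words' contexts.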
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance [24.49032191669509]
We present two techniques for improving encoding representations for similarity metrics.
We show results across various BERT-backbone learned metrics, achieving state-of-the-art correlation with human ratings on several benchmarks.
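Batch centering itself is a one-line operation: subtract the batch mean so that an offset shared by all embeddings no longer dominates similarity scores. A minimal sketch with synthetic vectors (the offset and shapes are illustrative):

```python
import numpy as np

# A batch of embeddings sharing a common offset, as often happens with
# anisotropic contextual representations.
rng = np.random.default_rng(2)
batch = rng.normal(size=(5, 4)) + 3.0  # +3.0 is the shared offset

# Batch centering: subtract the per-dimension mean over the batch.
centered = batch - batch.mean(axis=0, keepdims=True)

print(centered.mean(axis=0))  # ~0 in every dimension
```

After centering, cosine similarities reflect differences between texts rather than the dominant direction common to the whole batch.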
arXiv Detail & Related papers (2020-10-13T03:46:25Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
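A minimal sketch of the averaging step, assuming two source spaces that already share a dimensionality (the real method first projects heterogeneous spaces into a common space with learned linear transformations; the vocabulary and vectors below are synthetic):

```python
import numpy as np

# Two stand-in source embedding spaces over a shared vocabulary.
vocab = ["river", "bank", "money"]
rng = np.random.default_rng(1)
space_a = {w: rng.normal(size=6) for w in vocab}
space_b = {w: rng.normal(size=6) for w in vocab}

def meta_embed(word):
    """Length-normalize each source vector, then average them.

    Normalizing first prevents a space with larger vector norms from
    dominating the meta-embedding.
    """
    va = space_a[word] / np.linalg.norm(space_a[word])
    vb = space_b[word] / np.linalg.norm(space_b[word])
    return (va + vb) / 2.0

meta = {w: meta_embed(w) for w in vocab}
print(meta["river"].shape)  # same dimensionality as the sources
```

For cross-lingual meta-embeddings, the same averaging is applied after mapping each language's space into the shared semantic space.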
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.