Exploring the Value of Personalized Word Embeddings
- URL: http://arxiv.org/abs/2011.06057v1
- Date: Wed, 11 Nov 2020 20:23:09 GMT
- Title: Exploring the Value of Personalized Word Embeddings
- Authors: Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea
- Abstract summary: We show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users.
We show that a language model using personalized word embeddings can be effectively used for authorship attribution.
- Score: 41.89745054269992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce personalized word embeddings, and examine their
value for language modeling. We compare the performance of our proposed
prediction model when using personalized versus generic word representations,
and study how these representations can be leveraged for improved performance.
We provide insight into what types of words can be more accurately predicted
when building personalized models. Our results show that a subset of words
belonging to specific psycholinguistic categories tend to vary more in their
representations across users and that combining generic and personalized word
embeddings yields the best performance, with a 4.7% relative reduction in
perplexity. Additionally, we show that a language model using personalized word
embeddings can be effectively used for authorship attribution.
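The abstract does not give implementation details for how the generic and personalized embeddings are combined. As a rough illustration only, here is a minimal sketch assuming PyTorch, a concatenation of a shared generic embedding with a small per-user personalized embedding, and hypothetical class, dimension, and parameter names (none of these are taken from the paper):

```python
# Minimal sketch (not the authors' implementation): combine a generic embedding
# table shared across users with a small per-user personalized table, assuming
# the two vectors are concatenated before the language-model encoder.
import torch
import torch.nn as nn

class PersonalizedEmbedding(nn.Module):
    def __init__(self, vocab_size, num_users, generic_dim=300, personal_dim=100):
        super().__init__()
        # Generic embeddings: one vector per word, shared by all users.
        self.generic = nn.Embedding(vocab_size, generic_dim)
        # Personalized embeddings: one vector per (user, word) pair, stored in a
        # single flat table for simplicity (a hypothetical layout).
        self.personal = nn.Embedding(num_users * vocab_size, personal_dim)
        self.vocab_size = vocab_size

    def forward(self, token_ids, user_id):
        g = self.generic(token_ids)                               # (batch, seq, generic_dim)
        p = self.personal(user_id * self.vocab_size + token_ids)  # (batch, seq, personal_dim)
        return torch.cat([g, p], dim=-1)                          # (batch, seq, generic_dim + personal_dim)

# Usage: the combined embeddings would feed a standard language model
# (e.g., an LSTM) that predicts the next word for a given user.
emb = PersonalizedEmbedding(vocab_size=10000, num_users=50)
tokens = torch.randint(0, 10000, (2, 16))
out = emb(tokens, user_id=torch.tensor(3))
print(out.shape)  # torch.Size([2, 16, 400])
```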
Related papers
- Investigating Idiomaticity in Word Representations [9.208145117062339]
We focus on noun compounds of varying levels of idiomaticity in two languages (English and Portuguese).
We present a dataset of minimal pairs containing human idiomaticity judgments for each noun compound at both type and token levels.
We define a set of fine-grained metrics of Affinity and Scaled Similarity to determine how sensitive the models are to perturbations that may lead to changes in idiomaticity.
arXiv Detail & Related papers (2024-11-04T21:05:01Z)
- Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties [4.330880304715002]
We explore the relationships between the word saliency and the word properties.
We establish a mapping model, Seq2Saliency, from the words in a text sample and their properties to the saliency values.
The experimental evaluations are conducted to analyze the saliency of words with different properties.
arXiv Detail & Related papers (2022-07-17T06:02:48Z)
- Learnable Visual Words for Interpretable Image Recognition [70.85686267987744]
We propose the Learnable Visual Words (LVW) to interpret the model prediction behaviors with two novel modules.
The semantic visual word learning module relaxes the category-specific constraint, enabling general visual words to be shared across different categories.
Our experiments on six visual benchmarks demonstrate the superior effectiveness of our proposed LVW in both accuracy and model interpretation.
arXiv Detail & Related papers (2022-05-22T03:24:45Z)
- Unigram-Normalized Perplexity as a Language Model Performance Measure with Different Vocabulary Sizes [4.477547027158141]
We propose a new metric that can be used to evaluate language model performance with different vocabulary sizes.
The proposed unigram-normalized perplexity expresses a language model's performance as an improvement over that of a simple unigram model.
arXiv Detail & Related papers (2020-11-26T10:39:03Z)
- Are Some Words Worth More than Others? [3.5598388686985354]
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task.
We evaluate several commonly-used large English language models using our proposed metrics.
arXiv Detail & Related papers (2020-10-12T23:12:11Z)
- Compositional Demographic Word Embeddings [41.89745054269992]
We propose a new form of personalized word embeddings that use demographic-specific word representations derived compositionally from full or partial demographic information for a user.
We show that the resulting demographic-aware word representations outperform generic word representations on two tasks for English: language modeling and word associations.
arXiv Detail & Related papers (2020-10-06T19:23:46Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed, affects the model's size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language may fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Multiplex Word Embeddings for Selectional Preference Acquisition [70.33531759861111]
We propose a multiplex word embedding model, which can be easily extended according to various relations among words.
Our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness.
arXiv Detail & Related papers (2020-01-09T04:47:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.