Compositional Demographic Word Embeddings
- URL: http://arxiv.org/abs/2010.02986v2
- Date: Thu, 29 Oct 2020 18:54:53 GMT
- Title: Compositional Demographic Word Embeddings
- Authors: Charles Welch, Jonathan K. Kummerfeld, Verónica Pérez-Rosas, Rada Mihalcea
- Abstract summary: We propose a new form of personalized word embeddings that use demographic-specific word representations derived compositionally from full or partial demographic information for a user.
We show that the resulting demographic-aware word representations outperform generic word representations on two tasks for English: language modeling and word associations.
- Score: 41.89745054269992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings are usually derived from corpora containing text from many
individuals, thus leading to general purpose representations rather than
individually personalized representations. While personalized embeddings can be
useful to improve language model performance and other language processing
tasks, they can only be computed for people with a large amount of longitudinal
data, which is not the case for new users. We propose a new form of
personalized word embeddings that use demographic-specific word representations
derived compositionally from full or partial demographic information for a user
(i.e., gender, age, location, religion). We show that the resulting
demographic-aware word representations outperform generic word representations
on two tasks for English: language modeling and word associations. We further
explore the trade-off between the number of available attributes and their
relative effectiveness and discuss the ethical implications of using them.
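The abstract describes composing a word representation from whichever demographic attributes are known for a user. A minimal sketch of that idea, assuming a simple averaging composition over per-attribute embedding tables (the paper's actual composition function may differ, and all names and dimensions below are hypothetical toy values):

```python
# Illustrative sketch, NOT the paper's exact method: compose a
# demographic-aware word vector by averaging attribute-specific
# vectors for whichever demographic attributes are known for a user.
import numpy as np

DIM = 8  # toy embedding dimensionality

# Hypothetical attribute-specific embedding tables: one vector per word
# for each demographic attribute value (filled with random toy data).
rng = np.random.default_rng(0)
attribute_embeddings = {
    ("gender", "female"): {"cool": rng.normal(size=DIM)},
    ("age", "20s"):       {"cool": rng.normal(size=DIM)},
    ("location", "US"):   {"cool": rng.normal(size=DIM)},
}

def compose(word, user_attributes):
    """Average the attribute-specific vectors available for this user.

    Handles partial demographic information: unknown attributes are
    simply omitted from the average, so new users with only one known
    attribute still get a representation.
    """
    vecs = [attribute_embeddings[attr][word]
            for attr in user_attributes
            if attr in attribute_embeddings and word in attribute_embeddings[attr]]
    if not vecs:
        raise KeyError(f"no demographic vectors available for {word!r}")
    return np.mean(vecs, axis=0)

# Full demographic information: average of three attribute vectors.
v_full = compose("cool", [("gender", "female"), ("age", "20s"), ("location", "US")])
# Partial information (e.g. a new user with only location known) still works.
v_partial = compose("cool", [("location", "US")])
print(v_full.shape, v_partial.shape)
```

The averaging here is just one plausible composition; the point is that the representation degrades gracefully as attributes are removed, which is the trade-off the abstract says the paper explores.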
Related papers
- Investigating Idiomaticity in Word Representations [9.208145117062339]
We focus on noun compounds of varying levels of idiomaticity in two languages (English and Portuguese)
We present a dataset of minimal pairs containing human idiomaticity judgments for each noun compound at both type and token levels.
We define a set of fine-grained metrics of Affinity and Scaled Similarity to determine how sensitive the models are to perturbations that may lead to changes in idiomaticity.
arXiv Detail & Related papers (2024-11-04T21:05:01Z)
- Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations [0.8594140167290097]
Multimodal embeddings aim to enrich the semantic information in neural representations of language compared to text-only models.
Our paper compares word embeddings from three vision-and-language models and three text-only models, with static and contextual representations.
This is the first large-scale study of the effect of visual grounding on language representations, including 46 semantic parameters.
arXiv Detail & Related papers (2023-06-04T12:53:12Z)
- Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
arXiv Detail & Related papers (2023-06-02T17:59:09Z)
- Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis [3.515619810213763]
We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations.
We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable.
arXiv Detail & Related papers (2023-05-19T20:36:21Z)
- Logographic Information Aids Learning Better Representations for Natural Language Inference [3.677231059555795]
We present a novel study which explores the benefits of providing language models with logographic information in learning better semantic representations.
Our evaluation results in six languages suggest significant benefits of using multi-modal embeddings in languages with logographic writing systems.
arXiv Detail & Related papers (2022-11-03T20:40:14Z)
- Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several designs and alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Exploring the Value of Personalized Word Embeddings [41.89745054269992]
We show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users.
We show that a language model using personalized word embeddings can be effectively used for authorship attribution.
arXiv Detail & Related papers (2020-11-11T20:23:09Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are in distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
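The probing setup described in the last entry retrieves image patches from text-only representations. A toy sketch of the retrieval step it rests on, assuming cosine similarity as the matching score (the paper's actual probe is a trained model; the vectors and names here are hypothetical random data):

```python
# Illustrative sketch with random toy data: rank candidate image-patch
# representations by cosine similarity to a text representation, then
# retrieve the best-scoring patch. This shows only the retrieval setup,
# not the paper's trained probing model.
import numpy as np

rng = np.random.default_rng(1)
text_vec = rng.normal(size=16)           # text-only representation of a word
patch_vecs = rng.normal(size=(5, 16))    # candidate image-patch representations
# Construct one "matching" patch as the text vector plus small noise.
patch_vecs[3] = text_vec + 0.01 * rng.normal(size=16)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(text_vec, p) for p in patch_vecs])
best = int(np.argmax(scores))
print(best)  # the constructed matching patch (index 3) scores highest
```

Under this framing, a "strong signal" in the text-only representations means the matching patch reliably wins the argmax, which is what the abstract reports for correct object categories.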
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.