Deep Lexical Hypothesis: Identifying personality structure in natural language
- URL: http://arxiv.org/abs/2203.02092v1
- Date: Fri, 4 Mar 2022 02:06:10 GMT
- Title: Deep Lexical Hypothesis: Identifying personality structure in natural language
- Authors: Andrew Cutler, David M. Condon
- Abstract summary: We introduce a method to extract adjective similarities from language models.
The correlational structure produced through this method is highly similar to that of self- and other-ratings of 435 terms reported by Saucier and Goldberg.
Notably, Neuroticism and Openness are only weakly and inconsistently recovered.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in natural language processing (NLP) have produced general
models that can perform complex tasks such as summarizing long passages and
translating across languages. Here, we introduce a method to extract adjective
similarities from language models as done with survey-based ratings in
traditional psycholexical studies but using millions of times more text in a
natural setting. The correlational structure produced through this method is
highly similar to that of self- and other-ratings of 435 terms reported by
Saucier and Goldberg (1996a). The first three unrotated factors produced using
NLP are congruent with those in survey data, with coefficients of 0.89, 0.79,
and 0.79. This structure is robust to many modeling decisions: adjective set,
including those with 1,710 terms (Goldberg, 1982) and 18,000 terms (Allport &
Odbert, 1936); the query used to extract correlations; and language model.
Notably, Neuroticism and Openness are only weakly and inconsistently recovered.
This is a new source of signal that is closer to the original (semantic) vision
of the Lexical Hypothesis. The method can be applied where surveys cannot: in
dozens of languages simultaneously, with tens of thousands of items, on
historical text, and at extremely large scale for little cost. The code is made
public to facilitate reproduction and fast iteration in new directions of
research.
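As a minimal sketch of this pipeline: the snippet below approximates adjective similarities with embedding cosines from a hypothetical `embed` encoder (the paper's actual language-model querying may differ), builds the adjective similarity matrix, extracts unrotated factors, and defines Tucker's congruence coefficient for comparison against survey loadings.

```python
import numpy as np

# Minimal sketch of the pipeline above. Assumptions: similarities come from
# embedding cosines of a hypothetical `embed` encoder (the paper's actual
# LM querying may differ), and the adjective list is a toy subset.

def embed(words: list[str]) -> np.ndarray:
    """Hypothetical encoder: one vector per adjective (placeholder values)."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(words), 300))

adjectives = ["kind", "anxious", "curious", "rude", "calm", "creative"]
X = embed(adjectives)

# Adjective-by-adjective similarity matrix, the analogue of the correlation
# matrix of survey ratings in traditional psycholexical studies.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = Xn @ Xn.T

# First three unrotated factors via eigendecomposition of the matrix.
vals, vecs = np.linalg.eigh(S)                     # eigenvalues ascending
loadings = vecs[:, ::-1][:, :3] * np.sqrt(np.maximum(vals[::-1][:3], 0))

def congruence(a: np.ndarray, b: np.ndarray) -> float:
    """Tucker's congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# congruence(loadings[:, 0], survey_loadings[:, 0]) would compare the first
# NLP factor to the first survey factor (survey_loadings is hypothetical).
```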
Related papers
- Probabilistic Method of Measuring Linguistic Productivity
I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words.
Token frequency does not dominate the productivity measure but naturally influences the sampling of bases.
A corpus-based approach and a randomised design ensure that true neologisms and words coined long ago have equal chances of being selected.
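One plausible reading of this design, as a minimal sketch (the paper's exact estimator may differ; all names are hypothetical): sample bases at random, so each base counts once regardless of token frequency, and score the affix by the share of sampled bases attested with it.

```python
import random

# Minimal sketch: estimate an affix's productivity as the share of randomly
# sampled bases attested with that affix in a corpus vocabulary. This is one
# plausible reading of the abstract, not the paper's exact estimator.

def productivity(affix: str, bases: list[str], vocab: set[str], k: int = 100) -> float:
    # Randomized design: each base is drawn once, so token frequency
    # does not dominate the measure.
    sample = random.sample(bases, min(k, len(bases)))
    hits = sum(1 for b in sample if b + affix in vocab)
    return hits / len(sample)

vocab = {"readable", "drinkable", "walkable", "runnable"}
bases = ["read", "drink", "walk", "run", "sleep", "go"]
print(productivity("able", bases, vocab, k=6))  # 0.5 on this toy data
```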
arXiv Detail & Related papers (2023-08-24T08:36:28Z)
- Typical Decoding for Natural Language Generation
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
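A minimal sketch of locally typical sampling, assuming the standard formulation: keep the tokens whose surprisal is closest to the distribution's entropy until their cumulative mass reaches a threshold tau, then sample from that set.

```python
import numpy as np

def typical_sampling(logits: np.ndarray, tau: float = 0.95) -> int:
    """Minimal sketch of locally typical sampling over one logit vector."""
    # Softmax over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Entropy of the next-token distribution and per-token surprisal.
    surprisal = -np.log(probs + 1e-12)
    entropy = float((probs * surprisal).sum())

    # Rank tokens by how close their surprisal is to the entropy.
    order = np.argsort(np.abs(surprisal - entropy))

    # Smallest "typical set" whose cumulative mass reaches tau.
    cum = np.cumsum(probs[order])
    kept = order[: int(np.searchsorted(cum, tau)) + 1]

    # Renormalize and sample from the typical set.
    return int(np.random.choice(kept, p=probs[kept] / probs[kept].sum()))
```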
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Idiomatic Expression Identification using Semantic Compatibility
We study the task of detecting whether a sentence has an idiomatic expression and localizing it.
We propose a multi-stage neural architecture with the attention flow mechanism for identifying these expressions.
A salient feature of the model is its ability to identify idioms unseen during training, with gains of 1.4% to 30.8% over competitive baselines.
arXiv Detail & Related papers (2021-10-19T15:44:28Z)
- Linguistically inspired morphological inflection with a sequence to sequence model
Our research question is whether a neural network is capable of learning inflectional morphemes for inflection production.
We use an inflectional corpus and a single-layer seq2seq model to test this hypothesis.
Our character-morpheme-based model generates inflected forms by predicting the stem character by character and the inflectional affixes as character blocks.
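A minimal sketch of such a character-morpheme target encoding; the bracketed symbol format is a hypothetical stand-in for the paper's scheme.

```python
# Minimal sketch: the stem is predicted character by character, while each
# inflectional affix is a single block symbol. Format is hypothetical.

def encode_target(stem: str, affixes: list[str]) -> list[str]:
    """Target sequence: stem characters, then each affix as one symbol."""
    return list(stem) + [f"<{a}>" for a in affixes]

# e.g. Spanish imperfect "hablaba" = stem "habl" + affix "aba"
print(encode_target("habl", ["aba"]))  # ['h', 'a', 'b', 'l', '<aba>']
```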
arXiv Detail & Related papers (2020-09-04T08:58:42Z)
- Inducing Language-Agnostic Multilingual Representations
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
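A minimal sketch of approach (ii), removing language-specific means and variances; the `embeddings` mapping and its values are synthetic stand-ins.

```python
import numpy as np

# Minimal sketch of approach (ii): z-score each language's embedding space
# independently, removing its language-specific mean and variance.

def standardize_per_language(embeddings: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    out = {}
    for lang, X in embeddings.items():
        mu = X.mean(axis=0, keepdims=True)           # language-specific mean
        sigma = X.std(axis=0, keepdims=True) + 1e-8  # language-specific std
        out[lang] = (X - mu) / sigma
    return out

rng = np.random.default_rng(0)
emb = {"en": rng.normal(1.0, 2.0, (100, 8)), "de": rng.normal(-0.5, 0.7, (100, 8))}
std = standardize_per_language(emb)
print(std["en"].mean(axis=0).round(6))  # ~0 per dimension after centering
```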
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
- A Tale of a Probe and a Parser
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
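A minimal sketch of a structural probe in the spirit of Hewitt and Manning (2019): learn a linear map B so that squared distances between projected word vectors approximate syntactic tree distances. All tensors below are toy data.

```python
import torch

# Minimal sketch of a structural probe: fit B so that squared L2 distances
# between projected word vectors match gold tree distances (toy data here).
dim, rank, n_words = 64, 32, 10
B = torch.randn(rank, dim, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-2)

H = torch.randn(n_words, dim)                 # contextual word representations
gold = torch.randint(1, 6, (n_words, n_words)).float()  # toy tree distances
gold = (gold + gold.T) / 2                    # symmetrize
gold.fill_diagonal_(0)

for _ in range(100):
    proj = H @ B.T                            # (n_words, rank) projections
    diff = proj.unsqueeze(0) - proj.unsqueeze(1)
    pred = (diff ** 2).sum(-1)                # squared projected distances
    loss = (pred - gold).abs().mean()         # L1 loss on the distance matrix
    opt.zero_grad(); loss.backward(); opt.step()
```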
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Information-Theoretic Probing for Linguistic Structure
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
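A minimal sketch of the operationalization, assuming the usual decomposition I(R; T) = H(T) - H(T | R): a probe's cross-entropy upper-bounds H(T | R), so the difference gives a lower bound on the mutual information. The data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Minimal sketch: estimate I(R; T) = H(T) - H(T | R), bounding H(T | R)
# by the cross-entropy of a trained probe. All data here is synthetic.
rng = np.random.default_rng(0)
R = rng.normal(size=(2000, 16))                                # "representations"
T = (R[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)   # "property"

# H(T): entropy of the label distribution, in nats.
p = np.bincount(T) / len(T)
h_t = -(p * np.log(p)).sum()

# H(T | R) <= cross-entropy of a logistic-regression probe on held-out data.
probe = LogisticRegression().fit(R[:1000], T[:1000])
h_t_given_r = log_loss(T[1000:], probe.predict_proba(R[1000:]))

print(f"estimated I(R;T) >= {h_t - h_t_given_r:.3f} nats")
```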
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
- Limits of Detecting Text Generated by Large-Scale Language Models
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
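A minimal sketch of that framing, assuming a simple test statistic (average per-token log-likelihood under the candidate model) and a hypothetical threshold; the paper's actual tests may be more elaborate.

```python
import numpy as np

# Minimal sketch of the hypothesis test: H0 "text is genuine" vs H1 "text
# was generated by the model". The statistic is the average per-token
# log-likelihood under the candidate model; the threshold is hypothetical.

def detect_generated(token_logprobs: list[float], threshold: float = -2.5) -> bool:
    """Flag text as model-generated if it is 'too likely' under the model.

    Generated text tends to score a higher average log-likelihood under
    the generating model than genuine human text does.
    """
    statistic = float(np.mean(token_logprobs))
    return statistic > threshold  # reject H0 above the threshold

print(detect_generated([-1.2, -0.8, -2.0, -1.5]))  # True: suspiciously likely
print(detect_generated([-4.1, -3.7, -5.2, -2.9]))  # False: looks genuine
```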
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.