Do Large GPT Models Discover Moral Dimensions in Language
Representations? A Topological Study Of Sentence Embeddings
- URL: http://arxiv.org/abs/2309.09397v1
- Date: Sun, 17 Sep 2023 23:38:39 GMT
- Title: Do Large GPT Models Discover Moral Dimensions in Language
Representations? A Topological Study Of Sentence Embeddings
- Authors: Stephen Fitz
- Abstract summary: We take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness.
We first compute a fairness metric, inspired by the social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility.
Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments.
- Score: 0.7416846035207727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Large Language Models are deployed within Artificial Intelligence
systems that are increasingly integrated with human society, it becomes more
important than ever to study their internal structures. Higher-level abilities
of LLMs such as GPT-3.5 emerge in large part from the informative language
representations they induce from raw text data during pre-training on trillions
of words. These embeddings exist in vector spaces of several thousand
dimensions, and their processing involves mapping between multiple vector
spaces, with a total number of parameters on the order of trillions. Furthermore,
these language representations are induced by gradient optimization, resulting
in a black box system that is hard to interpret. In this paper, we take a look
at the topological structure of neuronal activity in the "brain" of Chat-GPT's
foundation language model, and analyze it with respect to a metric representing
the notion of fairness. We develop a novel approach to visualize GPT's moral
dimensions. We first compute a fairness metric, inspired by social psychology
literature, to identify factors that typically influence fairness assessments
in humans, such as legitimacy, need, and responsibility. Subsequently, we
summarize the manifold's shape using a lower-dimensional simplicial complex,
whose topology is derived from this metric. We color it with a heat map
associated with this fairness metric, producing human-readable visualizations
of the high-dimensional sentence manifold. Our results show that sentence
embeddings based on GPT-3.5 can be decomposed into two submanifolds
corresponding to fair and unfair moral judgments. This indicates that GPT-based
language models develop a moral dimension within their representation spaces
and induce an understanding of fairness during their training process.
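The pipeline the abstract describes (a metric used as a lens, overlapping cover, per-bin clustering, and a nerve graph colored by the metric) is essentially a Mapper construction from topological data analysis. Below is a minimal self-contained sketch, assuming precomputed sentence embeddings X and per-sentence fairness scores f; the DBSCAN clusterer, cover parameters, and toy data are illustrative assumptions, not the paper's actual settings.

```python
# Minimal Mapper-style sketch: fairness score as the lens, overlapping
# intervals as the cover, DBSCAN within each interval, and edges between
# clusters that share sentences. All parameters are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_graph(X, f, n_bins=10, overlap=0.5, eps=2.5, min_samples=3):
    """Build the 1-skeleton of a Mapper complex over embeddings X with lens f."""
    lo, hi = f.min(), f.max()
    width = (hi - lo) / n_bins
    node_color, members = [], []
    for i in range(n_bins):
        # Overlapping lens intervals form the cover.
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((f >= a) & (f <= b))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
        for lab in set(labels) - {-1}:          # -1 marks DBSCAN noise
            pts = idx[labels == lab]
            members.append(set(pts.tolist()))
            node_color.append(f[pts].mean())    # heat-map value per node
    # Nerve construction: connect clusters sharing at least one sentence.
    edges = [(i, j)
             for i in range(len(members))
             for j in range(i + 1, len(members))
             if members[i] & members[j]]
    return node_color, members, edges

# Toy data standing in for GPT-3.5 sentence embeddings: two separated
# point clouds whose fairness scores are high (fair) and low (unfair).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+3.0, 1.0, (250, 8)),
               rng.normal(-3.0, 1.0, (250, 8))])
f = np.concatenate([rng.uniform(0.6, 1.0, 250),
                    rng.uniform(0.0, 0.4, 250)])

colors, members, edges = mapper_graph(X, f)

# Count connected components with union-find; two components would
# mirror the fair/unfair decomposition reported in the abstract.
parent = list(range(len(members)))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x
for i, j in edges:
    parent[find(i)] = find(j)
print(len(members), "nodes,", len(edges), "edges,",
      len({find(i) for i in range(len(members))}), "components")
```

On this toy data the graph splits into two connected components, mirroring the reported decomposition into fair and unfair submanifolds; real sentence embeddings would require tuning the cover and clustering parameters.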
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Hidden Holes: topological aspects of language models [1.1172147007388977]
We study the evolution of topological structure in GPT based large language models across depth and time during training.
We show that the latter exhibit more topological complexity, with a distinct pattern of changes common to all natural languages but absent from synthetically generated data.
arXiv Detail & Related papers (2024-06-09T14:25:09Z)
- Heaps' Law in GPT-Neo Large Language Model Emulated Corpora [2.7234916145234713]
Heaps' law is an empirical relation in text analysis that predicts vocabulary growth as a function of corpus size (its standard form is sketched after this list).
This study focuses on the emulation of corpora using the suite of GPT-Neo large language models.
arXiv Detail & Related papers (2023-11-10T20:07:32Z)
- Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces [18.312837741635207]
We explore the potential of Large Language Models to learn conceptual spaces.
Our experiments show that LLMs can indeed be used for learning meaningful representations.
We also find that fine-tuned models of the BERT family are able to match or even outperform the largest GPT-3 model.
arXiv Detail & Related papers (2023-10-09T07:41:19Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Out of One, Many: Using Language Models to Simulate Human Samples [3.278541277919869]
We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is both fine-grained and demographically correlated.
We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants.
arXiv Detail & Related papers (2022-09-14T19:53:32Z)
- Schrödinger's Tree -- On Syntax and Neural Language Models [10.296219074343785]
Language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities.
We observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form.
We outline the implications of the different types of research questions exhibited in studies on syntax.
arXiv Detail & Related papers (2021-10-17T18:25:23Z)
- Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., the embedding of linguistic units of different levels in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
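For reference, the Heaps' law relation cited in the GPT-Neo entry above has a standard closed form; the constant ranges below are typical textbook values for English corpora, not parameters fitted in that study.

```latex
% Heaps' law: V(N) is the number of distinct word types observed
% after N running tokens; K and beta are corpus-dependent constants
% (typically 10 <= K <= 100 and 0.4 <= beta <= 0.6 for English text).
V(N) = K N^{\beta}, \qquad 0 < \beta < 1
```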