Counteracts: Testing Stereotypical Representation in Pre-trained
Language Models
- URL: http://arxiv.org/abs/2301.04347v3
- Date: Fri, 7 Apr 2023 17:12:23 GMT
- Title: Counteracts: Testing Stereotypical Representation in Pre-trained
Language Models
- Authors: Damin Zhang, Julia Rayz, Romila Pradhan
- Abstract summary: We use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs)
We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge.
- Score: 4.211128681972148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, language models have demonstrated strong performance on various
natural language understanding tasks. Language models trained on large
human-generated corpora encode not only a significant amount of human knowledge,
but also human stereotypes. As more and more downstream tasks
integrate language models into their pipelines, it is necessary to
understand the internal stereotypical representation in order to design
methods for mitigating its negative effects. In this paper, we use
counterexamples to examine the internal stereotypical knowledge in pre-trained
language models (PLMs) that can lead to stereotypical preferences. We mainly
focus on gender stereotypes, but the method can be extended to other types of
stereotype. We evaluate 7 PLMs on 9 types of cloze-style prompts with different
information and base knowledge. The results indicate that PLMs show a certain
amount of robustness against unrelated information and a preference for shallow
linguistic cues, such as word position and syntactic structure, but lack the
ability to interpret information by meaning. Such findings shed light on how to
interact with PLMs in a neutral manner for both finetuning and evaluation.
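To make the cloze-style setup concrete, the sketch below queries a masked language model with a base prompt and with the same prompt augmented by counter-stereotypical information, then compares the probabilities assigned to gendered fillers. This is a minimal illustration rather than the paper's protocol: the model, prompt wording, and target words are assumptions.

```python
# Minimal sketch of cloze-style stereotype probing with a masked LM.
# The model, prompts, and gendered target words below are illustrative
# assumptions, not the evaluation set used in the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

prompts = [
    # Base prompt with no extra information.
    "The nurse said that [MASK] would be back soon.",
    # The same prompt with counter-stereotypical information added.
    "The nurse, who had just repaired the car engine, said that [MASK] would be back soon.",
]
targets = ["he", "she"]

for prompt in prompts:
    scores = {r["token_str"]: r["score"] for r in fill(prompt, targets=targets)}
    # A probability gap between "he" and "she" that persists after the
    # counter-stereotypical clause is added suggests the model relies on
    # the stereotypical association rather than the added meaning.
    print(prompt)
    print({k: round(v, 4) for k, v in scores.items()})
```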
Related papers
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena (a minimal probing sketch appears after this list).
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach [4.908389661988191]
This paper introduces the Multi-Grain Stereotype (MGS) dataset, consisting of 51,867 instances across gender, race, profession, religion, and other stereotypes.
We evaluate various machine learning approaches to establish baselines and fine-tune language models of different architectures and sizes.
We employ explainable AI (XAI) tools, including SHAP, LIME, and BertViz, to assess whether the model's learned patterns align with human intuitions about stereotypes.
arXiv Detail & Related papers (2024-04-02T09:31:32Z) - Multilingual large language models leak human stereotypes across language boundaries [25.903732543380528]
We study how training a model multilingually may lead to stereotypes expressed in one language showing up in the models' behaviour in another.
We propose a measurement framework for stereotype leakage and investigate its effect across English, Russian, Chinese, and Hindi.
We find that GPT-3.5 exhibits the most stereotype leakage, and Hindi is the most susceptible to leakage effects.
arXiv Detail & Related papers (2023-12-12T10:24:17Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Roles of Scaling and Instruction Tuning in Language Perception: Model
vs. Human Attention [58.817405319722596]
This work compares the self-attention of several large language models (LLMs) in different sizes to assess the effect of scaling and instruction tuning on language perception.
Results show that scaling enhances resemblance to human attention and improves effective attention by reducing reliance on trivial patterns, while instruction tuning does not.
We also find that current LLMs are consistently closer to non-native than to native speakers in their attention patterns, suggesting sub-optimal language perception in all models.
arXiv Detail & Related papers (2023-10-29T17:16:40Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - The Birth of Bias: A case study on the evolution of gender bias in an
English language model [1.6344851071810076]
We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z) - Probing Linguistic Information For Logical Inference In Pre-trained
Language Models [2.4366811507669124]
We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations.
We find that pre-trained language models encode several types of linguistic information relevant to inference, although some types of information are only weakly encoded.
We demonstrate language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
arXiv Detail & Related papers (2021-12-03T07:19:42Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Stepmothers are mean and academics are pretentious: What do pretrained
language models learn about you? [11.107926166222452]
We present the first dataset comprising stereotypical attributes of a range of social groups.
We propose a method to elicit stereotypes encoded by pretrained language models in an unsupervised fashion.
arXiv Detail & Related papers (2021-09-21T09:44:57Z)
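For the probing-style papers above (Holmes and the logical-inference probing work), the following is a minimal sketch of a linear probe trained on frozen LM representations; the probed property, toy sentences, and layer choice are assumptions made for illustration only.

```python
# Minimal sketch of a linear probe over frozen LM representations.
# The probed property (subject number), the toy data, and the layer
# choice are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased")
lm.eval()

def embed(sentences, layer=-1):
    """Mean-pooled hidden states from the frozen model at a given layer."""
    enc = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = lm(**enc, output_hidden_states=True)
    hidden = out.hidden_states[layer]
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Toy labelled data for one linguistic property: subject number.
train_sents = ["The dog barks.", "The dogs bark.", "A child sleeps.", "Children sleep."]
train_labels = [0, 1, 0, 1]  # 0 = singular subject, 1 = plural subject

# If the frozen representations encode the property, a simple linear
# classifier should recover it.
probe = LogisticRegression(max_iter=1000).fit(embed(train_sents), train_labels)
print(probe.predict(embed(["The cats nap."])))  # expected: [1] (plural)
```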