Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing
- URL: http://arxiv.org/abs/2501.08276v1
- Date: Tue, 14 Jan 2025 17:50:06 GMT
- Title: Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing
- Authors: Pulkit Arora, Akbar Karimi, Lucie Flek,
- Abstract summary: We take a broader approach to explore a wider range of variations across sociodemographic dimensions.
We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic styles.
We find that demographic-specific paraphrasing significantly impacts the performance of language models.
- Score: 7.312170216336085
- License:
- Abstract: Large Language Models (LLMs) have shown impressive performance in various NLP tasks. However, there are concerns about their reliability in different domains of linguistic variations. Many works have proposed robustness evaluation measures for local adversarial attacks, but we need globally robust models unbiased to different language styles. We take a broader approach to explore a wider range of variations across sociodemographic dimensions to perform structured reliability tests on the reasoning capacity of language models. We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic styles. The assessment aims to provide a deeper understanding of LLMs in (a) their capability of generating demographic paraphrases with engineered prompts and (b) their reasoning capabilities in real-world, complex language scenarios. We also explore measures such as perplexity, explainability, and ATOMIC performance of paraphrases for fine-grained reliability analysis of LLMs on these sets. We find that demographic-specific paraphrasing significantly impacts the performance of language models, indicating that the subtleties of language variations remain a significant challenge. The code and dataset will be made available for reproducibility and future research.
Related papers
- Benchmarking Linguistic Diversity of Large Language Models [14.824871604671467]
This paper emphasizes the importance of examining the preservation of human linguistic richness by language models.
We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
arXiv Detail & Related papers (2024-12-13T16:46:03Z) - From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition [6.617999710257379]
We propose a three-stage framework to assess the abilities of LMs.
We evaluate the generative capacities of LMs using methods from linguistic research.
arXiv Detail & Related papers (2024-10-17T06:31:49Z) - Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z) - Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models [16.942897938964638]
Large Language Models (LLMs) have shown exceptional performance in various Natural Language Processing (NLP) tasks.
Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages.
This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.
arXiv Detail & Related papers (2024-07-01T15:11:37Z) - LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z) - A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large
Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs)
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Improving Factuality and Reasoning in Language Models through Multiagent
Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z) - A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z) - Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.