Consistency in Language Models: Current Landscape, Challenges, and Future Directions
- URL: http://arxiv.org/abs/2505.00268v1
- Date: Thu, 01 May 2025 03:25:25 GMT
- Title: Consistency in Language Models: Current Landscape, Challenges, and Future Directions
- Authors: Jekaterina Novikova, Carol Anderson, Borhane Blili-Hamelin, Subhabrata Majumdar
- Abstract summary: State-of-the-art language models struggle to maintain reliable consistency across different scenarios. This paper examines the landscape of consistency research in AI language systems.
- Score: 8.342499446600268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The hallmark of effective language use lies in consistency -- expressing similar meanings in similar contexts and avoiding contradictions. While human communication naturally demonstrates this principle, state-of-the-art language models struggle to maintain reliable consistency across different scenarios. This paper examines the landscape of consistency research in AI language systems, exploring both formal consistency (including logical rule adherence) and informal consistency (such as moral and factual coherence). We analyze current approaches to measuring aspects of consistency, and identify critical research gaps in the standardization of definitions, multilingual assessment, and methods to improve consistency. Our findings point to an urgent need for robust benchmarks to measure consistency, and for interdisciplinary approaches to ensure consistency in the application of language models to domain-specific tasks while preserving their utility and adaptability.
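One common way to operationalize the consistency the abstract describes is pairwise agreement across paraphrases: a model is consistent on a query when semantically equivalent phrasings elicit the same answer. The sketch below is a minimal illustration of that idea; the prompts and answers are invented placeholders, not drawn from any benchmark in this list.

```python
from itertools import combinations

def consistency_score(answers):
    """Fraction of paraphrase pairs that receive identical answers.

    `answers` maps each paraphrase of one underlying query to the
    model's (normalized) answer string.
    """
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 1.0  # a single phrasing is trivially consistent
    return sum(a == b for a, b in pairs) / len(pairs)

# Illustrative answers to three paraphrases of the same factual query.
answers = {
    "Where was Marie Curie born?": "Warsaw",
    "Marie Curie was born in": "Warsaw",
    "What is Marie Curie's birthplace?": "Paris",
}
print(consistency_score(answers))  # 1 of 3 pairs agree -> 0.333...
```

Exact string match is the simplest agreement criterion; the surveyed work also considers softer notions (entailment, semantic similarity) that would replace the `a == b` comparison.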
Related papers
- Enhancing Coreference Resolution with Pretrained Language Models: Bridging the Gap Between Syntax and Semantics [0.9752323911408618]
This study introduces an innovative framework aimed at enhancing coreference resolution by utilizing pretrained language models. Our approach combines syntax parsing with semantic role labeling to accurately capture finer distinctions in referential relationships.
arXiv Detail & Related papers (2025-04-08T09:33:09Z)
- Large Language Models Often Say One Thing and Do Another [49.22262396351797]
We develop a novel evaluation benchmark called the Words and Deeds Consistency Test (WDCT). The benchmark establishes a strict correspondence between word-based and deed-based questions across different domains. The evaluation results reveal a widespread inconsistency between words and deeds across different LLMs and domains.
arXiv Detail & Related papers (2025-03-10T07:34:54Z)
- Cross-linguistic disagreement as a conflict of semantic alignment norms in multilingual AI: Linguistic Diversity as a Problem for Philosophy, Cognitive Science, and AI [0.2443066828522608]
Cross-linguistic consistency (CL-consistency) seeks universal concepts across languages, whereas folk consistency respects language-specific semantic norms. The findings challenge the assumption that universal representations and cross-linguistic transfer capabilities are inherently desirable.
arXiv Detail & Related papers (2025-03-01T03:31:40Z)
- Improving Large Language Model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity [0.0]
Large Language Models (LLMs) are increasingly sophisticated and ubiquitous in natural language processing (NLP) applications.
This paper presents a novel framework for contextual grounding in textual models, with a particular emphasis on the Context Representation stage.
Our findings have significant implications for the deployment of LLMs in sensitive domains such as healthcare, legal systems, and social services.
arXiv Detail & Related papers (2024-08-07T18:12:02Z)
- Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models [16.942897938964638]
Large Language Models (LLMs) have shown exceptional performance in various Natural Language Processing (NLP) tasks.
Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages.
This study focuses on three primary questions: the existence of cross-lingual inconsistencies in LLMs, the specific aspects in which these inconsistencies manifest, and the correlation between cross-lingual consistency and multilingual capabilities of LLMs.
arXiv Detail & Related papers (2024-07-01T15:11:37Z)
- Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning [72.21876989058858]
We present a model of pragmatic language understanding, where utterances are produced and understood by searching for regularized equilibria of signaling games.
In this model speakers and listeners search for contextually appropriate utterance--meaning mappings that are both close to game-theoretically optimal conventions and close to a shared, ''default'' semantics.
arXiv Detail & Related papers (2023-11-16T09:42:36Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [52.992875653864076]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions, but evaluating how faithfully they reflect model behavior remains difficult.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Measuring and Improving Consistency in Pretrained Language Models [40.46184998481918]
We study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge?
Using ParaRel, we show that the consistency of all PLMs we experiment with is poor -- though with high variance between relations.
arXiv Detail & Related papers (2021-02-01T17:48:42Z)
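The "high variance between relations" finding above can be made concrete with a small sketch: collect a model's answers to several paraphrased templates for each relation, score per-relation pairwise agreement, and look at the spread. The answers below are invented placeholders, not ParaRel data or real model outputs.

```python
from itertools import combinations
from statistics import mean, pstdev

def pairwise_agreement(answers):
    """Fraction of answer pairs (one per paraphrased template) that match."""
    pairs = list(combinations(answers, 2))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

# Hypothetical answers per relation, one entry per paraphrased template.
by_relation = {
    "born-in":    ["Warsaw", "Warsaw", "Paris"],
    "capital-of": ["Ottawa", "Ottawa", "Ottawa"],
    "works-for":  ["IBM", "Google", "Apple"],
}

scores = {rel: pairwise_agreement(ans) for rel, ans in by_relation.items()}
print(scores)  # per-relation consistency
print(mean(scores.values()), pstdev(scores.values()))  # spread across relations
```

A large standard deviation across relations, as in this toy example, is the pattern the paper reports: overall consistency is poor, and some relations are far more stable than others.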
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.