Related papers: Logical analysis and contradiction detection in high-level requirements during the review process using sat-solver

Related papers

Tool for Supporting Debugging and Understanding of Normative Requirements Using LLMs [3.7885668021375465]
Normative requirements specify social, legal, ethical, empathetic, and cultural (SLEEC) norms that must be observed by a system.<n>These requirements are typically defined by stakeholders in the non-technical system with diverse expertise.<n>SLEEC-LLM improves the efficiency and explainability of normative requirements elicitation and consistency analysis.
arXiv Detail & Related papers (2025-07-07T21:57:28Z)
Consistency in Language Models: Current Landscape, Challenges, and Future Directions [8.342499446600268]
State-of-the-art language models struggle to maintain reliable consistency across different scenarios. This paper examines the landscape of consistency research in AI language systems.
arXiv Detail & Related papers (2025-05-01T03:25:25Z)
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL) It employs self-generated natural language critiques as feedback to guide the step-level search process. This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
Reliable and Efficient Amortized Model-based Evaluation [57.6469531082784]
The average score across a wide range of benchmarks provides a signal that helps guide the use of language models in practice. A popular attempt to lower the cost is to compute the average score on a subset of the benchmark. This approach often renders an unreliable measure of LM performance because the average score is often confounded with the difficulty of the questions in the benchmark subset. We train a model that predicts question difficulty from its content, enabling a reliable measurement at a fraction of the cost.
arXiv Detail & Related papers (2025-03-17T16:15:02Z)
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers [4.897782942277061]
We introduce Semantic Self-Verification (SSV), a novel approach to accurately formulate the reasoning problem from natural language to the formal language of the solver. SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver. We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.
arXiv Detail & Related papers (2025-01-28T14:04:49Z)
Few-shot Policy (de)composition in Conversational Question Answering [54.259440408606515]
We propose a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting. We show that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies. We apply this approach to the popular PCD and conversational machine reading benchmark, ShARC, and show competitive performance with no task-specific finetuning.
arXiv Detail & Related papers (2025-01-20T08:40:15Z)
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language? [88.29075896295357]
We first investigate whether current retrieval systems can comprehend the Boolean logic implied in language. Through extensive experimental results, we draw the conclusion that current dense retrieval systems do not fully understand Boolean logic in language. We propose a contrastive continual training method that serves as a strong baseline for the research community.
arXiv Detail & Related papers (2024-11-19T05:19:53Z)
Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation. We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
Early-Stage Requirements Transformation Approaches: A Systematic Review [0.0]
This systematic review examines transformation approaches in the early stages of software development. The review highlights the widespread use of natural language processing techniques, with tools like the Stanford and WordNet being essential. A challenge identified is the lack of robust evaluation methods, with most approaches using simple case studies and running examples for evaluation.
arXiv Detail & Related papers (2024-07-25T18:13:29Z)
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context. We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions. We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
Normative Requirements Operationalization with Large Language Models [3.456725053685842]
Normative non-functional requirements specify constraints that a system must observe in order to avoid violations of social, legal, ethical, empathetic, and cultural norms. Recent research has tackled this challenge using a domain-specific language to specify normative requirements. We propose a complementary approach that uses Large Language Models to extract semantic relationships between abstract representations of system capabilities.
arXiv Detail & Related papers (2024-04-18T17:01:34Z)
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models [4.953092503184905]
This work proposes DCR, an automated framework for evaluating and improving the consistency of Large Language Models (LLMs) generated texts. We introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score. Our approach also substantially reduces nearly 90% of output inconsistencies, showing promise for effective hallucination mitigation.
arXiv Detail & Related papers (2024-01-04T08:34:16Z)
Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation [6.491645162078057]
Ontologies play a critical role in link and semantically integrate datasets via interoperability. This paper approaches data integration from an application perspective, looking techniques based on ontology matching.
arXiv Detail & Related papers (2023-10-18T05:42:51Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT) Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation. We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
Toward the Understanding of Deep Text Matching Models for Information Retrieval [72.72380690535766]
This paper aims at testing whether existing deep text matching methods satisfy some fundamental gradients in information retrieval. Specifically, four attributions are used in our study, i.e., term frequency constraint, term discrimination constraint, length normalization constraints, and TF-length constraint. Experimental results on LETOR 4.0 and MS Marco show that all the investigated deep text matching methods satisfy the above constraints with high probabilities in statistics.
arXiv Detail & Related papers (2021-08-16T13:33:15Z)
A Framework for Evaluation of Machine Reading Comprehension Gold Standards [7.6250852763032375]
This paper proposes a unifying framework to investigate the present linguistic features, required reasoning and background knowledge and factual correctness. The absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.
arXiv Detail & Related papers (2020-03-10T11:30:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.