LLM Context Conditioning and PWP Prompting for Multimodal Validation of Chemical Formulas
- URL: http://arxiv.org/abs/2505.12257v1
- Date: Sun, 18 May 2025 06:33:08 GMT
- Title: LLM Context Conditioning and PWP Prompting for Multimodal Validation of Chemical Formulas
- Authors: Evgeny Markhasin
- Abstract summary: This study investigates structured context conditioning, informed by Persistent Workflow Prompting (PWP) principles, as a methodological strategy to modulate LLMs' inherent error-correction behavior at inference time. The approach is designed to enhance the reliability of readily available, general-purpose Large Language Models (LLMs) for precise validation tasks. Several prompting strategies were evaluated: while basic prompts proved unreliable, an approach adapting PWP structures to rigorously condition the LLM's analytical mindset appeared to improve textual error identification with both models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Identifying subtle technical errors within complex scientific and technical documents, especially those requiring multimodal interpretation (e.g., formulas in images), presents a significant hurdle for Large Language Models (LLMs) whose inherent error-correction tendencies can mask inaccuracies. This exploratory proof-of-concept (PoC) study investigates structured LLM context conditioning, informed by Persistent Workflow Prompting (PWP) principles, as a methodological strategy to modulate this LLM behavior at inference time. The approach is designed to enhance the reliability of readily available, general-purpose LLMs (specifically Gemini 2.5 Pro and ChatGPT Plus o3) for precise validation tasks, crucially relying only on their standard chat interfaces without API access or model modifications. To explore this methodology, we focused on validating chemical formulas within a single, complex test paper with known textual and image-based errors. Several prompting strategies were evaluated: while basic prompts proved unreliable, an approach adapting PWP structures to rigorously condition the LLM's analytical mindset appeared to improve textual error identification with both models. Notably, this method also guided Gemini 2.5 Pro to repeatedly identify a subtle image-based formula error previously overlooked during manual review, a task where ChatGPT Plus o3 failed in our tests. These preliminary findings highlight specific LLM operational modes that impede detail-oriented validation and suggest that PWP-informed context conditioning offers a promising and highly accessible technique for developing more robust LLM-driven analytical workflows, particularly for tasks requiring meticulous error detection in scientific and technical documents. Extensive validation beyond this limited PoC is necessary to ascertain broader applicability.
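Because the study relies only on standard chat interfaces, the conditioning takes the form of a structured prompt rather than code. The following is a minimal sketch of what such a PWP-style conditioning preamble could look like; the section names, wording, and output format are illustrative assumptions, not the authors' actual prompt.

```python
# Illustrative sketch of a PWP-style context-conditioning preamble for a
# chat-only formula-validation workflow. All section names and wording are
# assumptions for illustration; the paper's actual prompt is not reproduced.

CONDITIONING_PREAMBLE = """\
ROLE: You are a meticulous technical validator, not a helpful assistant.

BEHAVIORAL CONSTRAINTS:
- Do NOT silently correct, normalize, or "helpfully" repair formulas.
- Treat every formula, in text AND in images, as potentially erroneous.
- First report what is literally written, then compare it against what the
  chemistry requires; flag any mismatch, however small.

TASK: Validate all chemical formulas in the attached manuscript.

OUTPUT FORMAT (one line per finding):
location | formula as written | expected formula | verdict
"""

def build_validation_prompt(manuscript_excerpt: str) -> str:
    """Assemble the conditioning preamble and the material under review."""
    return f"{CONDITIONING_PREAMBLE}\n--- MANUSCRIPT ---\n{manuscript_excerpt}"

if __name__ == "__main__":
    print(build_validation_prompt("... C6H14O6 (sorbitol) ..."))
```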
Related papers
- Revisiting Pre-trained Language Models for Vulnerability Detection [5.747350434960454]
The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. This paper introduces RevisitVD, an extensive evaluation of 17 PLMs spanning smaller code-specific PLMs and large-scale PLMs.
arXiv Detail & Related papers (2025-07-22T17:58:49Z)
- AI-Driven Scholarly Peer Review via Persistent Workflow Prompting, Meta-Prompting, and Meta-Reasoning [0.0]
This report introduces Persistent Workflow Prompting (PWP), a potentially broadly applicable prompt engineering methodology. We present a proof-of-concept PWP prompt for the critical analysis of experimental chemistry manuscripts. We develop this PWP prompt through iterative application of meta-prompting techniques and meta-reasoning aimed at systematically codifying expert review.
arXiv Detail & Related papers (2025-05-06T09:06:18Z)
- MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM). MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task. The LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them.
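A minimal sketch of the rule-correction step described above, assuming a hypothetical `ask_llm` helper and an invented example rule; MoRE-LLM's actual training loop is considerably more involved.

```python
# Sketch of LLM-guided rule refinement in the spirit of MoRE-LLM.
# `ask_llm` is a hypothetical stand-in for any chat-model call; the
# example rule and prompt wording are invented for illustration.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your chat model of choice here")

def refine_rule(rule: str, domain: str) -> str:
    prompt = (
        f"You are a domain expert in {domain}.\n"
        f"Candidate decision rule learned from data:\n  {rule}\n"
        "If the rule contradicts domain knowledge, rewrite it; otherwise "
        "return it unchanged. Reply with the rule only."
    )
    return ask_llm(prompt)

# e.g. refine_rule("IF resting_heart_rate > 30 THEN healthy", "cardiology")
```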
arXiv Detail & Related papers (2025-03-26T11:09:21Z)
- New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression Comprehension (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding. We introduce a new REC dataset with two key features. First, it is designed with controllable difficulty levels, requiring fine-grained reasoning across object categories, attributes, and relationships. Second, it incorporates negative text and images generated through fine-grained editing, explicitly testing a model's ability to reject non-existent targets.
arXiv Detail & Related papers (2025-02-27T13:58:44Z)
- Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning [0.0]
Chain-of-thought (CoT) and program-of-thought (PoT) fine-tuning are common methods for transferring LLM knowledge to small language models (SLMs).
This paper introduces Gap-Filling Prompting (GFP), a novel two-step prompting strategy designed to enhance the problem-solving process for SLMs.
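The blurb names a two-step strategy but not its steps. The sketch below is a hedged guess at what a two-step, code-assisted prompting loop could look like, with a hypothetical `ask_slm` helper; consult the paper for GFP's actual procedure.

```python
# Hypothetical two-step prompting loop in the spirit of Gap-Filling
# Prompting (GFP). The split into a "scaffold" step and a "fill" step is
# an assumption for illustration only.

def ask_slm(prompt: str) -> str:
    raise NotImplementedError("call your small language model here")

def gap_filling_solve(problem: str) -> str:
    # Step 1: elicit a skeletal, code-assisted solution with marked gaps.
    scaffold = ask_slm(
        "Outline a Python solution to this problem, leaving '# GAP' "
        f"comments where a step is uncertain:\n{problem}"
    )
    # Step 2: ask the model to resolve each marked gap in context.
    return ask_slm(
        f"Problem:\n{problem}\n\nPartial solution:\n{scaffold}\n\n"
        "Fill in every '# GAP' so the program runs and prints the answer."
    )
```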
arXiv Detail & Related papers (2024-11-08T08:52:59Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
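A minimal sketch of the core idea, assuming a hypothetical `ask_llm` helper: cross-check a CoT answer against the output of an executed PoT program. The paper's actual verification scheme may differ in its details.

```python
# Sketch: verify a Chain-of-Thought answer by executing a
# Program-of-Thought solution and comparing the two results.
import contextlib
import io
import re

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your chat model here")

def answers_agree(question: str) -> bool:
    cot = ask_llm(f"{question}\nReason step by step; end with 'ANSWER: <number>'.")
    pot = ask_llm(f"{question}\nWrite plain Python that prints only the final number.")

    match = re.search(r"ANSWER:\s*(-?[\d.]+)", cot)
    if match is None:
        return False                          # unparseable CoT: treat as unverified

    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):     # capture the program's printed answer
        exec(pot, {})                         # CAUTION: sandbox untrusted code in practice
    return float(match.group(1)) == float(buf.getvalue().strip())
```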
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning [9.601067780210006]
This paper leverages large language models (LLMs) for workflow anomaly detection by exploiting their ability to learn complex data patterns.
Two approaches are investigated: 1) supervised fine-tuning (SFT), where pre-trained LLMs are fine-tuned on labeled data for sentence classification to identify anomalies, and 2) in-context learning (ICL), where prompts containing task descriptions and examples guide LLMs in few-shot anomaly detection without fine-tuning.
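A minimal sketch of the ICL variant described above; the log lines, labels, and prompt wording are invented for illustration, not the paper's prompts.

```python
# Sketch: few-shot in-context learning for workflow anomaly detection.
# The demonstrations below are invented examples.

FEW_SHOT = """\
Classify each workflow log line as NORMAL or ANOMALY.

log: task render_frames finished in 812s (exit 0)
label: NORMAL
log: task render_frames finished in 812s (exit 137)
label: ANOMALY
"""

def build_icl_prompt(log_line: str) -> str:
    """Append the query line so the model completes the label."""
    return f"{FEW_SHOT}log: {log_line}\nlabel:"
```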
arXiv Detail & Related papers (2024-07-24T16:33:04Z)
- Context Matters: Data-Efficient Augmentation of Large Language Models for Scientific Applications [15.893290942177112]
We explore the challenges inherent to Large Language Models (LLMs) like GPT-4.
The capacity of LLMs to present erroneous answers in a coherent and semantically rigorous manner complicates the detection of factual inaccuracies.
Our work aims to enhance the understanding and mitigation of such errors, thereby contributing to the improvement of LLM accuracy and reliability.
arXiv Detail & Related papers (2023-12-12T08:43:20Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
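To make the notion of a UE method concrete, here is a generic sketch of one simple white-box baseline, mean per-token entropy of a generated continuation, using Hugging Face transformers. This is not LM-Polygraph's API; see the library itself for its unified interfaces and full battery of estimators.

```python
# Generic white-box uncertainty baseline: mean per-token entropy of the
# model's output distribution during greedy generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def mean_token_entropy(prompt: str, max_new_tokens: int = 20) -> float:
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False,
        output_scores=True, return_dict_in_generate=True,
    )
    entropies = []
    for step_logits in out.scores:            # one logits tensor per new token
        probs = torch.softmax(step_logits, dim=-1)
        entropies.append(-(probs * probs.clamp_min(1e-12).log()).sum().item())
    return sum(entropies) / len(entropies)    # higher = less confident
```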
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- Configuration Validation with Large Language Models [22.018488540410548]
Large Language Models (LLMs) show promise in addressing some of the long-lasting limitations of ML-based configuration validation.
We develop a generic LLM-based configuration validation framework, named Ciri.
Ciri employs effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data.
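A minimal sketch of few-shot configuration validation in this spirit; the parameter names, examples, and verdict format are invented, not Ciri's actual prompts.

```python
# Sketch: a few-shot prompt pairing a valid configuration entry with a
# known misconfiguration, as the Ciri blurb describes. Examples invented.

FEW_SHOT = """\
Decide whether each configuration entry is VALID or MISCONFIGURED.

entry: dfs.replication = 3
verdict: VALID
entry: dfs.replication = -1
verdict: MISCONFIGURED (replication count must be a positive integer)
"""

def build_config_validation_prompt(entry: str) -> str:
    return f"{FEW_SHOT}entry: {entry}\nverdict:"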
arXiv Detail & Related papers (2023-10-15T00:50:27Z)
- Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
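A minimal sketch of the LLM-generated-counterfactual idea, assuming hypothetical `ask_llm` and `classify` helpers: minimally edit the input so one concept flips, then compare the black-box model's predictions on the original and edited texts.

```python
# Sketch: counterfactual approximation via an LLM rewrite.
# `ask_llm` and `classify` are hypothetical helpers for illustration.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your chat model here")

def classify(text: str) -> str:
    raise NotImplementedError("call the black-box NLP model here")

def counterfactual_effect(text: str, concept: str) -> tuple[str, str]:
    cf = ask_llm(
        f"Rewrite the text so that only its {concept} changes, keeping "
        f"everything else as close to the original as possible:\n{text}"
    )
    # If the prediction moves, the concept is causally implicated.
    return classify(text), classify(cf)
```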
arXiv Detail & Related papers (2023-10-01T07:31:04Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
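For context, a minimal sketch of few-shot OIE prompting with demonstrations, the setting the study examines; the demonstration sentence and triples are invented for illustration.

```python
# Sketch: few-shot OIE prompting with a demonstration. Examples invented.

DEMOS = """\
Extract (subject; relation; object) triples.

sentence: Marie Curie won the Nobel Prize in 1903.
triples: (Marie Curie; won; the Nobel Prize), (Marie Curie; won the Nobel Prize in; 1903)
"""

def build_oie_prompt(sentence: str) -> str:
    return f"{DEMOS}sentence: {sentence}\ntriples:"
```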
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs.
We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most advanced methods currently at our disposal.
Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)