Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?
        - URL: http://arxiv.org/abs/2404.15578v1
 - Date: Wed, 24 Apr 2024 00:56:22 GMT
 - Title: Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?
 - Authors: Hossein Salami, Brandye Smith-Goettler, Vijay Yadav, 
 - Abstract summary: We focus on a specific use case, pharmaceutical manufacturing investigations.
We propose that leveraging historical records of manufacturing incidents and deviations can be beneficial for addressing and closing new cases.
We show that semantic search on vector embeddings of deviation descriptions can be used to identify similar records.
 - License: http://creativecommons.org/licenses/by-nc-nd/4.0/
 - Abstract: General-purpose Large Language Models (LLMs) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models can perform remarkably well in various natural language processing tasks. However, how to leverage them to approach domain-specific use cases and drive value remains an open question. In this work, we focus on a specific use case, pharmaceutical manufacturing investigations, and propose that leveraging an organization's historical records of manufacturing incidents and deviations can be beneficial for addressing and closing new cases, or for de-risking new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the power of three general-purpose LLMs (GPT-3.5, GPT-4, and Claude-2) in performing tasks related to the above goal. In particular, we examine (1) the ability of LLMs to automate the extraction of specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on a database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases of complex interplay between the apparent reasoning and hallucination behavior of LLMs as a risk factor. Furthermore, we show that semantic search on vector embeddings of deviation descriptions can identify similar records, such as those with a similar type of defect, with a high level of accuracy. We discuss further improvements to enhance the accuracy of similar record identification.
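For concreteness, the two tasks the abstract describes can be sketched as follows: prompt-based extraction of a root cause from an unstructured deviation record, and semantic search over vector embeddings of historical deviation descriptions. The paper does not publish its pipeline, so the embedding model (an open-source sentence-transformers model rather than whatever the authors used), the prompt wording, the `call_llm` stand-in, and the example records below are all illustrative assumptions.

```python
# A minimal sketch of the paper's two tasks, assuming a generic open-source
# embedding model and a placeholder LLM client. The model name, prompt
# wording, `call_llm` stand-in, and example records are illustrative
# assumptions, not the authors' actual pipeline.
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy stand-ins for an organization's historical deviation records.
historical = [
    "Tablet hardness out of specification during compression of lot A.",
    "Foreign particulate observed in vials during visual inspection.",
    "Temperature excursion in cold-storage unit during drug-substance hold.",
]

# Task 2: semantic search over vector embeddings of deviation descriptions.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
db = model.encode(historical, normalize_embeddings=True)  # unit row vectors

def similar_records(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Return the k historical deviations most similar to `query`."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = db @ q  # cosine similarity, since all rows are unit-normalized
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), historical[i]) for i in top]

# Task 1: prompt-based extraction of a root cause from unstructured text.
EXTRACTION_PROMPT = """From the deviation record below, extract the stated \
root cause verbatim. If no root cause is stated, answer "not stated".

Deviation record:
{record}
"""

def extract_root_cause(record: str, call_llm) -> str:
    # `call_llm` is a hypothetical callable wrapping any chat-completion
    # API (the paper evaluated GPT-3.5, GPT-4, and Claude-2).
    return call_llm(EXTRACTION_PROMPT.format(record=record))

if __name__ == "__main__":
    for score, text in similar_records("Particles found in filled vials."):
        print(f"{score:.2f}  {text}")
```

Instructing the model to answer "not stated" when no root cause appears is one simple guard against the hallucination risk the abstract raises; at scale, the normalized embeddings would live in a vector index rather than an in-memory matrix.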
 
       
      
        Related papers
        - Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
We evaluate 12 pre-trained LLMs and one specialized fact-verifier using a collection of examples from 14 fact-checking benchmarks. We highlight the importance of addressing annotation errors and ambiguity in datasets. Frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance.
arXiv Detail & Related papers (2025-06-16T10:32:10Z) - CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports
We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports, focusing on IEMs. We assess various models and prompting strategies, introducing novel approaches such as category-specific prompting and subheading-filtered data integration. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management.
arXiv Detail & Related papers (2025-05-22T20:21:32Z) - Fact-checking with Generative AI: A Systematic Cross-Topic Examination of LLMs Capacity to Detect Veracity of Political Information
The purpose of this study is to assess how large language models (LLMs) can be used for fact-checking.
We use an AI auditing methodology that systematically evaluates the performance of five LLMs.
The results indicate that models are better at identifying false statements, especially on sensitive topics.
arXiv Detail & Related papers (2025-03-11T13:06:40Z) - AD-LLM: Benchmarking Large Language Models for Anomaly Detection
This paper introduces AD-LLM, the first benchmark that evaluates how large language models can help with anomaly detection.
We examine three key tasks: zero-shot detection, using LLMs' pre-trained knowledge to perform AD without task-specific training; data augmentation, generating synthetic data and category descriptions to improve AD models; and model selection, using LLMs to suggest unsupervised AD models.
arXiv Detail & Related papers (2024-12-15T10:22:14Z) - AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models.
We show these priors are informative and can be refined using natural language.
We find that AutoElicit yields priors that substantially reduce error compared with uninformative priors while using fewer labels, and that it consistently outperforms in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Disentangling Memory and Reasoning Ability in Large Language Models
We propose a new inference paradigm that decomposes the complex inference process into two distinct actions: memory recall and reasoning.
Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z) - Combining Domain-Specific Models and LLMs for Automated Disease Phenotyping from Survey Data
This pilot study investigated the potential of combining a domain-specific model, BERN2, with large language models (LLMs) to enhance automated phenotyping from research survey data.
We employed BERN2, a named entity recognition and normalization model, to extract information from the ORIGINS survey data.
BERN2 demonstrated high performance in extracting and normalizing disease mentions, and the integration of LLMs, particularly with few-shot inference and RAG orchestration, further improved accuracy.
arXiv Detail & Related papers (2024-10-28T02:55:03Z) - Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
This paper introduces a novel framework, GenDFIR, which combines Rule-Based Artificial Intelligence (R-BAI) algorithms with Large Language Models (LLMs) to enhance and automate the Timeline Analysis process.
arXiv Detail & Related papers (2024-09-04T09:46:33Z) - Using LLMs for Explaining Sets of Counterfactual Examples to Final Users
In automated decision-making scenarios, causal inference methods can analyze the underlying data-generation process.
Counterfactual examples explore hypothetical scenarios where a minimal number of factors are altered.
We propose a novel multi-step pipeline that uses counterfactuals to generate natural language explanations of actions that will lead to a change in outcome.
arXiv Detail & Related papers (2024-08-27T15:13:06Z) - Investigating Annotator Bias in Large Language Models for Hate Speech Detection
This paper delves into the biases present in Large Language Models (LLMs) when annotating hate speech data.
Specifically targeting highly vulnerable groups within these categories, we analyze annotator biases.
We introduce our custom hate speech detection dataset, HateBiasNet, to conduct this research.
arXiv Detail & Related papers (2024-06-17T00:18:31Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models
We examine whether large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - Can Large Language Models Infer Causation from Correlation?
We test the pure causal inference skills of large language models (LLMs).
We formulate a novel task, Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables.
We show that these models achieve close to random performance on the task.
arXiv Detail & Related papers (2023-06-09T12:09:15Z) - Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Large language models (LLMs) can generate causal arguments with high probability.
LLMs may be used by human domain experts to save effort in setting up a causal analysis.
arXiv Detail & Related papers (2023-04-28T19:00:43Z)
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     