A Reliable Knowledge Processing Framework for Combustion Science using
Foundation Models
- URL: http://arxiv.org/abs/2401.00544v2
- Date: Tue, 2 Jan 2024 03:03:18 GMT
- Title: A Reliable Knowledge Processing Framework for Combustion Science using
Foundation Models
- Authors: Vansh Sharma and Venkat Raman
- Abstract summary: The study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature.
The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy.
The framework consistently delivers accurate domain-specific responses with minimal human oversight.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This research explores the integration of large language models (LLMs) into
scientific data assimilation, focusing on combustion science as a case study.
Leveraging foundational models integrated with Retrieval-Augmented Generation
(RAG) framework, the study introduces an approach to process diverse combustion
research data, spanning experimental studies, simulations, and literature. The
multifaceted nature of combustion research emphasizes the critical role of
knowledge processing in navigating and extracting valuable information from a
vast and diverse pool of sources. The developed approach minimizes
computational and economic expenses while optimizing data privacy and accuracy.
It incorporates prompt engineering and offline open-source LLMs, offering user
autonomy in selecting base models. The study provides a thorough examination of
text segmentation strategies, conducts comparative studies between LLMs, and
explores various optimized prompts to demonstrate the effectiveness of the
framework. By incorporating an external database, the framework outperforms a
conventional LLM in generating accurate responses and constructing robust
arguments. Additionally, the study delves into the investigation of optimized
prompt templates for the purpose of efficient extraction of scientific
literature. The research addresses concerns related to hallucinations and false
research articles by introducing a custom workflow developed with a detection
algorithm to filter out inaccuracies. Despite identified areas for improvement,
the framework consistently delivers accurate domain-specific responses with
minimal human oversight. The prompt-agnostic approach introduced holds promise
for future deliberations. The study underscores the significance of integrating
LLMs and knowledge processing techniques in scientific research, providing a
foundation for advancements in data assimilation and utilization.
Related papers
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
We comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality.
arXiv Detail & Related papers (2024-06-16T08:03:24Z) - SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z) - Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning [0.9110413356918055]
This research pioneers the use of fine-tuned Large Language Models (LLMs) to automate Systematic Literature Reviews ( SLRs)
Our study employed the latest fine-tuning methodologies together with open-sourced LLMs, and demonstrated a practical and efficient approach to automating the final execution stages of an SLR process.
The results maintained high fidelity in factual accuracy in LLM responses, and were validated through the replication of an existing PRISMA-conforming SLR.
arXiv Detail & Related papers (2024-04-08T00:08:29Z) - The largest EEG-based BCI reproducibility study for open science: the MOABB benchmark [6.116258355914058]
This study conduct an extensive Brain-computer interfaces (BCI) analysis on open electroencephalography datasets.
The need for such benchmark lies in the rapid industrial progress that has given rise to undisclosed proprietary solutions.
arXiv Detail & Related papers (2024-04-03T15:18:50Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Quantitative knowledge retrieval from large language models [4.155711233354597]
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences.
This paper explores the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid data analysis tasks.
arXiv Detail & Related papers (2024-02-12T16:32:37Z) - Expanding Horizons in HCI Research Through LLM-Driven Qualitative
Analysis [3.5253513747455303]
We introduce a new approach to qualitative analysis in HCI using Large Language Models (LLMs)
Our findings indicate that LLMs not only match the efficacy of traditional analysis methods but also offer unique insights.
arXiv Detail & Related papers (2024-01-07T12:39:31Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - A Study on the Implementation of Generative AI Services Using an
Enterprise Data-Based LLM Application Architecture [0.0]
This study presents a method for implementing generative AI services by utilizing the Large Language Models (LLM) application architecture.
The research delves into strategies for mitigating the issue of inadequate data, offering tailored solutions.
A significant contribution of this work is the development of a Retrieval-Augmented Generation (RAG) model.
arXiv Detail & Related papers (2023-09-03T07:03:17Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.