MaScQA: A Question Answering Dataset for Investigating Materials Science
Knowledge of Large Language Models
- URL: http://arxiv.org/abs/2308.09115v1
- Date: Thu, 17 Aug 2023 17:51:05 GMT
- Title: MaScQA: A Question Answering Dataset for Investigating Materials Science
Knowledge of Large Language Models
- Authors: Mohd Zaki, Jayadeva, Mausam, N. M. Anoop Krishnan
- Abstract summary: This work curates a dataset of 650 challenging questions from the materials domain that require the knowledge and skills of a materials science student.
GPT-4 achieves the best performance (~62% accuracy), outperforming GPT-3.5.
- Score: 29.70397245624547
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Information extraction and textual comprehension from materials literature
are vital for developing an exhaustive knowledge base that enables accelerated
materials discovery. Language models have demonstrated their capability to
answer domain-specific questions and retrieve information from knowledge bases.
However, there are no benchmark datasets in the materials domain that can
evaluate the understanding of the key concepts by these language models. In
this work, we curate a dataset of 650 challenging questions from the materials
domain that require the knowledge and skills of a materials science student who
has completed their undergraduate degree. We classify these questions based on their
structure and the materials science domain-based subcategories. Further, we
evaluate the performance of the GPT-3.5 and GPT-4 models on these questions via
zero-shot and chain-of-thought prompting. GPT-4 gives the best performance
(~62% accuracy), outperforming GPT-3.5. Interestingly, and in contrast to the
general observation, chain-of-thought prompting yields no significant
improvement in accuracy. To probe these limitations, we performed an error
analysis, which revealed that conceptual errors (~64%) contribute more than
computational errors (~36%) to the models' reduced performance. We hope that
the dataset and analysis performed in this
work will promote further research in developing better materials science
domain-specific LLMs and strategies for information extraction.
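As a concrete illustration of the evaluation protocol described in the abstract, the sketch below compares zero-shot and chain-of-thought prompting of GPT-3.5 and GPT-4 on the questions. It is a minimal sketch, not the authors' harness: the prompt wording, the dataset loader, its question/answer field names, and the substring-match scorer are all assumptions made for illustration; only the OpenAI chat-completions client calls are standard.

```python
# Minimal sketch of a zero-shot vs. chain-of-thought (CoT) accuracy comparison.
# Dataset schema and scoring are illustrative assumptions, not the paper's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ZERO_SHOT = ("Answer the following materials-science question. "
             "Give only the final answer.\n\n{q}")
COT = ("Answer the following materials-science question. "
       "Think step by step, then state the final answer.\n\n{q}")

def ask(model: str, prompt: str) -> str:
    """Query a chat model once and return its text response."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding for a fair comparison
    )
    return resp.choices[0].message.content

def evaluate(dataset, model: str, template: str) -> float:
    """Fraction of questions whose gold answer appears in the model output."""
    correct = 0
    for item in dataset:  # assumed schema: {"question": str, "answer": str}
        reply = ask(model, template.format(q=item["question"]))
        correct += item["answer"].lower() in reply.lower()
    return correct / len(dataset)

# Example: the four model-prompt configurations compared in the paper.
# dataset = load_mascqa()  # hypothetical loader for the 650-question set
# for model in ("gpt-3.5-turbo", "gpt-4"):
#     for name, tmpl in (("zero-shot", ZERO_SHOT), ("CoT", COT)):
#         print(model, name, evaluate(dataset, model, tmpl))
```

Setting the temperature to 0 keeps the four configurations comparable across runs; a stricter scorer (e.g., exact matching of the chosen option) would be needed for a faithful accuracy measurement.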
Related papers
- From Text to Insight: Large Language Models for Materials Science Data Extraction (arXiv, 2024-07-23)
The vast majority of materials science knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. The advent of large language models (LLMs) represents a significant shift.
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature (arXiv, 2024-06-10)
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
- Exploring the Potential of the Large Language Models (LLMs) in Identifying Misleading News Headlines (arXiv, 2024-05-06)
This study explores the efficacy of Large Language Models (LLMs) in identifying misleading versus non-misleading news headlines. Our analysis reveals significant variance in model performance, with ChatGPT-4 demonstrating superior accuracy.
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data (arXiv, 2024-02-27)
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers. To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
- Mining experimental data from Materials Science literature with Large Language Models: an evaluation study (arXiv, 2024-01-19)
This study assesses the capabilities of large language models (LLMs) in extracting structured information from scientific documents in materials science. We focus on two critical information-extraction tasks: (i) named entity recognition (NER) of studied materials and physical properties and (ii) relation extraction (RE) between these entities. The performance of LLMs on these tasks is benchmarked against traditional models based on the BERT architecture and rule-based approaches (baseline).
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text (arXiv, 2023-10-31)
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying solely on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
- Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM (arXiv, 2023-09-20)
We present a benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. We have developed a benchmark comprising 161 complex questions involving comparison, aggregation, and complicated graph structures.
- Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (arXiv, 2023-05-28)
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks. We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base. We empirically show that KARD significantly improves the performance of small T5 and GPT models on challenging knowledge-intensive reasoning datasets.
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities (arXiv, 2023-05-22)
This work evaluates Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We propose AutoKG, a multi-agent approach employing LLMs and external sources for KG construction and reasoning.
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs (arXiv, 2022-05-21)
Self-supervision based on information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.