Leveraging Large Language Models for Concept Graph Recovery and Question
Answering in NLP Education
- URL: http://arxiv.org/abs/2402.14293v1
- Date: Thu, 22 Feb 2024 05:15:27 GMT
- Title: Leveraging Large Language Models for Concept Graph Recovery and Question
Answering in NLP Education
- Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang
Jiang, Freddy Lecue, Jinghui Lu, Irene Li
- Abstract summary: Large Language Models (LLMs) have demonstrated promise in text-generation tasks.
This study focuses on concept graph recovery and question-answering (QA).
In TutorQA tasks, LLMs achieve up to 26% F1 score enhancement.
- Score: 14.908333207564574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the domain of Natural Language Processing (NLP), Large Language Models
(LLMs) have demonstrated promise in text-generation tasks. However, their
educational applications, particularly for domain-specific queries, remain
underexplored. This study investigates LLMs' capabilities in educational
scenarios, focusing on concept graph recovery and question-answering (QA). We
assess LLMs' zero-shot performance in creating domain-specific concept graphs
and introduce TutorQA, a new expert-verified NLP-focused benchmark for
scientific graph reasoning and QA. TutorQA consists of five tasks with 500 QA
pairs. To tackle TutorQA queries, we present CGLLM, a pipeline integrating
concept graphs with LLMs for answering diverse questions. Our results indicate
that LLMs' zero-shot concept graph recovery is competitive with supervised
methods, showing an average 3% F1 score improvement. In TutorQA tasks, LLMs
achieve up to 26% F1 score enhancement. Moreover, human evaluation and analysis
show that CGLLM generates answers with more fine-grained concepts.
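Concept graph recovery is reported as an F1 score.

A minimal sketch follows. It assumes the recovered graph is compared with an expert-annotated graph as a set of directed prerequisite edges. The paper's exact protocol may differ, for example by scoring link prediction over sampled concept pairs. The helper `edge_f1` and the concept names are hypothetical.

```python
def edge_f1(predicted, gold):
    """Precision, recall and F1 of predicted directed edges against gold edges."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)  # edges present in both graphs
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical prerequisite edges, written as (prerequisite, concept).
gold = {
    ("word embeddings", "transformers"),
    ("attention", "transformers"),
    ("n-grams", "language models"),
}
pred = {
    ("word embeddings", "transformers"),
    ("attention", "transformers"),
    ("transformers", "attention"),  # reversed direction counts as a miss
}

p, r, f = edge_f1(pred, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```

Treating edges as directed means a reversed prerequisite counts as an error. That matches the intuition that "A before B" and "B before A" are different claims.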
Related papers
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to making incorrect predictions, "hallucinating" by fabricating claims, or simply generating low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
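UQ baselines often score a generation from the model's own token probabilities. The sketch below shows two simple information-based scores of this kind: sequence negative log-likelihood (with perplexity) and mean token entropy.

This is a generic illustration under the assumption that per-token log-probabilities and output distributions are available. It is not LM-Polygraph's interface, and all function names are hypothetical.

```python
import math


def sequence_nll(token_logprobs):
    """Negative log-likelihood of the generated tokens (higher = more uncertain)."""
    return -sum(token_logprobs)


def perplexity(token_logprobs):
    """Length-normalised version of the NLL score."""
    return math.exp(sequence_nll(token_logprobs) / len(token_logprobs))


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) of the per-step output distributions."""
    if not token_distributions:
        raise ValueError("need at least one distribution")
    entropies = [-sum(p * math.log(p) for p in dist if p > 0) for dist in token_distributions]
    return sum(entropies) / len(entropies)


# Hypothetical generations: one the model is confident about, one it is not.
confident = [math.log(0.95), math.log(0.9)]
hesitant = [math.log(0.4), math.log(0.3)]
print(sequence_nll(confident) < sequence_nll(hesitant))  # True
print(mean_token_entropy([[0.95, 0.05], [0.9, 0.1]])
      < mean_token_entropy([[0.5, 0.5], [0.6, 0.4]]))  # True
```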
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- LOVA3: Learning to Visual Question Answering, Asking and Assessment [63.41469979867312]
Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge.
Current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of questioning and assessment skills.
In this study, we introduce LOVA3, an innovative framework named "Learning tO Visual Question Answering, Asking and Assessment."
arXiv Detail & Related papers (2024-05-23T18:21:59Z)
- VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding [65.12464615430036]
This paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of Large Language Models (LLMs).
Ours is a novel approach to extend the utility of LLMs in the context of video tasks.
We harness their contextual learning capabilities to generate executable visual programs for video understanding.
arXiv Detail & Related papers (2024-03-21T18:00:00Z)
- Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
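One way to make "accuracy of the generated CoT" concrete is to check a reasoning path against a knowledge graph. Each step should be an existing triple, consecutive steps should chain, and the path should run from the question entity to the answer.

The sketch below assumes the CoT has already been parsed into (head, relation, tail) triples, and leaves out the parsing itself. It is an illustration, not the authors' evaluation code, and `check_reasoning_path` is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def check_reasoning_path(path: List[Triple], kg: Set[Triple], start: str, answer: str) -> dict:
    """Check a parsed reasoning path against a knowledge graph."""
    grounded = [step in kg for step in path]  # is each step a real KG triple?
    # Each step's tail must be the next step's head.
    connected = all(path[i][2] == path[i + 1][0] for i in range(len(path) - 1))
    endpoints_ok = bool(path) and path[0][0] == start and path[-1][2] == answer
    return {
        "grounded_steps": sum(grounded),
        "total_steps": len(path),
        "valid": bool(path) and all(grounded) and connected and endpoints_ok,
    }


kg = {
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
}
# "Where was the director of Inception born?"
good = [("Inception", "directed_by", "Christopher Nolan"),
        ("Christopher Nolan", "born_in", "London")]
bad = [("Inception", "directed_by", "Christopher Nolan"),
       ("Christopher Nolan", "born_in", "Paris")]  # hallucinated step
print(check_reasoning_path(good, kg, "Inception", "London"))
print(check_reasoning_path(bad, kg, "Inception", "Paris"))
```

A path can reach a plausible-looking answer while containing an ungrounded step. Scoring steps separately from the final answer is what exposes that.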
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
- Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models [29.202758753639078]
This study investigates the limitations of Multiple Choice Question Answering (MCQA) as an evaluation method for Large Language Models (LLMs).
We propose a dataset augmenting method for Multiple-Choice Questions (MCQs), MCQA+, that can more accurately reflect the performance of the model.
arXiv Detail & Related papers (2024-02-02T12:07:00Z)
- keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM [27.76205400533089]
Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially for question answering.
We present a novel framework to assist LLMs, such as ChatGPT, to retrieve question-related structured information on the knowledge graph.
The experimental results on KBQA datasets show that Keqing can achieve competitive performance and illustrate the logic of answering each question.
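Keqing retrieves question-related structured information from a knowledge graph for the LLM. A generic way to do this is to collect the triples within k hops of the question's entities and serialise them into the prompt as context.

This sketch illustrates that generic approach; it is not Keqing's own procedure. Entity linking is assumed to have already produced the seed entities. The triples and helper names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def k_hop_triples(kg: List[Triple], seeds: Set[str], k: int = 2) -> List[Triple]:
    """Breadth-first collection of triples within k hops of the seed entities.

    Edges are followed in both directions.
    """
    adjacency: Dict[str, List[Triple]] = {}
    for h, r, t in kg:
        adjacency.setdefault(h, []).append((h, r, t))
        adjacency.setdefault(t, []).append((h, r, t))
    seen_entities = set(seeds)
    seen_triples: Set[Triple] = set()
    collected: List[Triple] = []
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == k:  # do not expand beyond k hops
            continue
        for triple in adjacency.get(entity, []):
            if triple not in seen_triples:
                seen_triples.add(triple)
                collected.append(triple)
            h, _, t = triple
            neighbour = t if h == entity else h
            if neighbour not in seen_entities:
                seen_entities.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return collected


def to_context(triples: List[Triple]) -> str:
    """Serialise triples as lines of text for an LLM prompt."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)


kg = [
    ("BERT", "is_a", "language model"),
    ("BERT", "based_on", "Transformer"),
    ("Transformer", "uses", "self-attention"),
    ("GPT-2", "based_on", "Transformer"),
]
print(to_context(k_hop_triples(kg, {"BERT"}, k=2)))
```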
arXiv Detail & Related papers (2023-12-31T08:39:04Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language.
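When the LLM encodes its reasoning as a Python program, the final answer comes from executing that program rather than from the model's text. The sketch below shows that execution step.

The program string stands in for a model output and is hypothetical, as are the revenue figures. Note that `exec` with empty builtins is not a security sandbox. Untrusted generated code should run in an isolated process or container.

```python
def run_generated_program(program: str, result_name: str = "answer"):
    """Execute model-generated Python and return the value bound to `result_name`.

    Empty builtins only block casual misuse; this is NOT a sandbox.
    """
    namespace: dict = {}
    exec(program, {"__builtins__": {}}, namespace)
    if result_name not in namespace:
        raise ValueError(f"program did not define {result_name!r}")
    return namespace[result_name]


# Hypothetical model output for "What was the percentage change in revenue
# from 2022 to 2023?", using made-up figures from a financial table.
generated = """
revenue_2022 = 1200.0
revenue_2023 = 1350.0
answer = (revenue_2023 - revenue_2022) / revenue_2022 * 100
"""
print(round(run_generated_program(generated), 2))  # 12.5
```

Delegating the arithmetic to an interpreter avoids the calculation slips that free-text multi-hop numerical reasoning is prone to.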
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning [66.85379279041128]
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning.
DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
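The low-rank re-ranking idea can be illustrated with a PCA-style projection: project the query and candidate exemplar embeddings onto the top-r principal directions of the candidates, then sort by cosine similarity.

This is a rough sketch under several assumptions. It uses PCA via SVD as the low-rank step, cosine similarity for scoring, and random vectors in place of real embeddings of question-plus-CoT text. The paper's encoder, first-stage retrieval and rank are not reproduced, and `low_rank_rerank` is hypothetical.

```python
import numpy as np


def low_rank_rerank(query_emb: np.ndarray, cand_embs: np.ndarray, rank: int) -> np.ndarray:
    """Return candidate indices ordered by cosine similarity in a rank-r PCA subspace.

    query_emb has shape (d,); cand_embs has shape (n, d).
    """
    mean = cand_embs.mean(axis=0)
    centered = cand_embs - mean
    # Rows of vt are the principal directions of the candidate set.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                          # (rank, d)
    cand_low = centered @ basis.T              # (n, rank)
    query_low = (query_emb - mean) @ basis.T   # (rank,)
    cand_low /= np.linalg.norm(cand_low, axis=1, keepdims=True) + 1e-12
    query_low /= np.linalg.norm(query_low) + 1e-12
    scores = cand_low @ query_low              # cosine similarities
    return np.argsort(-scores)                 # best first


rng = np.random.default_rng(0)
cands = rng.normal(size=(8, 16))   # 8 retrieved exemplars, 16-dim stand-in embeddings
order = low_rank_rerank(cands[3], cands, rank=4)
print(order)
```

Projecting onto a few principal directions discards dimensions along which the retrieved candidates barely differ. The re-ranking then depends on the directions that actually separate them.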
arXiv Detail & Related papers (2023-10-04T16:44:37Z)
- An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering [28.31377197194905]
Large-scale pre-trained language models (PLMs) have recently achieved great success and become a milestone in natural language processing (NLP).
In recent works on knowledge graph question answering (KGQA), BERT or its variants have become necessary in their KGQA models.
We compare the performance of different PLMs in KGQA and present three benchmarks for larger-scale KGs.
arXiv Detail & Related papers (2023-03-18T08:57:09Z)
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models [111.42052290293965]
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks.
End-to-end training on vision and language data may bridge the disconnections, but is inflexible and computationally expensive.
We propose Img2Prompt, a plug-and-play module that provides prompts that can bridge the aforementioned modality and task disconnections.
arXiv Detail & Related papers (2022-12-21T08:39:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.