Leveraging Large Language Models for Concept Graph Recovery and Question
Answering in NLP Education
- URL: http://arxiv.org/abs/2402.14293v1
- Date: Thu, 22 Feb 2024 05:15:27 GMT
- Authors: Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang
Jiang, Freddy Lecue, Jinghui Lu, Irene Li
- Abstract summary: Large Language Models (LLMs) have demonstrated promise in text-generation tasks.
This study focuses on concept graph recovery and question-answering (QA).
In TutorQA tasks, LLMs achieve up to 26% F1 score enhancement.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the domain of Natural Language Processing (NLP), Large Language Models
(LLMs) have demonstrated promise in text-generation tasks. However, their
educational applications, particularly for domain-specific queries, remain
underexplored. This study investigates LLMs' capabilities in educational
scenarios, focusing on concept graph recovery and question-answering (QA). We
assess LLMs' zero-shot performance in creating domain-specific concept graphs
and introduce TutorQA, a new expert-verified NLP-focused benchmark for
scientific graph reasoning and QA. TutorQA consists of five tasks with 500 QA
pairs. To tackle TutorQA queries, we present CGLLM, a pipeline integrating
concept graphs with LLMs for answering diverse questions. Our results indicate
that LLMs' zero-shot concept graph recovery is competitive with supervised
methods, showing an average 3% F1 score improvement. In TutorQA tasks, LLMs
achieve up to 26% F1 score enhancement. Moreover, human evaluation and analysis
show that CGLLM generates answers with more fine-grained concepts.
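The abstract does not spell out CGLLM's implementation, but the zero-shot concept graph recovery step it evaluates can be illustrated. Below is a minimal sketch assuming a generic chat-style LLM call: the prompt wording, the `query_llm` helper, and the "A -> B" edge format are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of zero-shot concept graph recovery with an LLM.
# `query_llm` stands in for any chat-completion call; the prompt wording and
# the "A -> B" edge format are illustrative assumptions, not CGLLM itself.
from typing import Callable, List, Tuple

PROMPT_TEMPLATE = (
    "You are building a prerequisite concept graph for an NLP course.\n"
    "Concepts: {concepts}\n"
    "List every pair 'A -> B' where understanding A is a prerequisite for "
    "understanding B. One pair per line, nothing else."
)

def recover_concept_graph(
    concepts: List[str],
    query_llm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Ask the LLM for prerequisite edges and parse them into (source, target) pairs."""
    reply = query_llm(PROMPT_TEMPLATE.format(concepts=", ".join(concepts)))
    edges: List[Tuple[str, str]] = []
    for line in reply.splitlines():
        if "->" in line:
            src, dst = (part.strip() for part in line.split("->", 1))
            if src in concepts and dst in concepts:  # drop hallucinated nodes
                edges.append((src, dst))
    return edges

if __name__ == "__main__":
    # Stub "LLM" so the sketch runs without any API key.
    demo = lambda _prompt: (
        "tokenization -> language modeling\n"
        "language modeling -> machine translation"
    )
    print(recover_concept_graph(
        ["tokenization", "language modeling", "machine translation"], demo))
```

In this sketch the model proposes prerequisite edges as plain text and the caller filters them against the known concept list; a real pipeline would also need deduplication, cycle checks, and evaluation against gold edges (the F1 comparison reported above).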
Related papers
- CLR-Bench: Evaluating Large Language Models in College-level Reasoning [17.081788240112417]
Large language models (LLMs) have demonstrated remarkable performance across various language understanding tasks.
We present CLR-Bench to comprehensively evaluate LLMs on complex college-level reasoning.
arXiv Detail & Related papers (2024-10-23T04:55:08Z)
- Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark [53.61633384281524]
PolyMATH is a benchmark aimed at evaluating the general cognitive reasoning abilities of MLLMs.
The best scores achieved on PolyMATH are 41%, 36%, and 27%, obtained by Claude-3.5 Sonnet, GPT-4o, and Gemini-1.5 Pro, respectively.
A further fine-grained error analysis reveals that these models struggle to understand spatial relations and perform drawn-out, high-level reasoning.
arXiv Detail & Related papers (2024-10-06T20:35:41Z)
- LOVA3: Learning to Visual Question Answering, Asking and Assessment [61.51687164769517]
Question answering, asking, and assessment are three innate human traits crucial for understanding the world and acquiring knowledge.
Current Multimodal Large Language Models (MLLMs) primarily focus on question answering, often neglecting the full potential of questioning and assessment skills.
We introduce LOVA3, an innovative framework named "Learning tO Visual question Answering, Asking and Assessment".
arXiv Detail & Related papers (2024-05-23T18:21:59Z)
- VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding [65.12464615430036]
This paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of Large Language Models (LLMs).
It is a novel approach that extends the utility of LLMs to video tasks.
We harness their contextual learning capabilities to generate executable visual programs for video understanding.
arXiv Detail & Related papers (2024-03-21T18:00:00Z)
- Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
- Zero-Shot Question Answering over Financial Documents using Large Language Models [0.18749305679160366]
We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports.
We use novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain-specific language (a hedged sketch of this prompt-to-program idea appears after this list).
arXiv Detail & Related papers (2023-11-19T16:23:34Z)
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning [66.85379279041128]
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning.
DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
arXiv Detail & Related papers (2023-10-04T16:44:37Z)
- An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering [28.31377197194905]
Large-scale pre-trained language models (PLMs) have recently achieved great success and become a milestone in natural language processing (NLP).
In recent work on knowledge graph question answering (KGQA), BERT and its variants have become essential components of KGQA models.
We compare the performance of different PLMs in KGQA and present three benchmarks for larger-scale KGs.
arXiv Detail & Related papers (2023-03-18T08:57:09Z)
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models [111.42052290293965]
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks.
End-to-end training on vision and language data may bridge the disconnections, but is inflexible and computationally expensive.
We propose Img2Prompt, a plug-and-play module that provides prompts to bridge the aforementioned modality and task disconnections.
arXiv Detail & Related papers (2022-12-21T08:39:36Z)
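As noted in the zero-shot financial QA entry above, a recurring technique in this list is prompting the LLM to emit its reasoning as an executable program. The sketch below illustrates that general idea under stated assumptions: the prompt text, the `answer` variable convention, and the sample generated program are invented for illustration and are not that paper's actual prompts or domain-specific language.

```python
# Sketch of a zero-shot "reasoning as a program" prompt (program-of-thought
# style). PROGRAM_PROMPT and the `answer` convention are assumptions made for
# this example, not the cited paper's actual prompts or DSL.
PROGRAM_PROMPT = (
    "Read the excerpt from a financial report, then write a short Python "
    "program that computes the answer. Store the final value in `answer`.\n"
    "Excerpt: {context}\n"
    "Question: {question}\n"
    "Program:"
)

def run_generated_program(program_text: str) -> float:
    """Execute an LLM-generated program and return its `answer` variable.
    WARNING: bare exec is unsafe on untrusted output; sandbox in practice."""
    scope: dict = {}
    exec(program_text, {}, scope)
    return scope["answer"]

if __name__ == "__main__":
    # Stand-in for an actual LLM completion: multi-hop numeric reasoning as code.
    generated = (
        "revenue_2021 = 120.0\n"
        "revenue_2022 = 150.0\n"
        "answer = (revenue_2022 - revenue_2021) / revenue_2021 * 100\n"
    )
    print(f"{run_generated_program(generated):.1f}%")  # -> 25.0%
```

Executing model-generated code is shown here with a bare `exec` for brevity; any real system would need sandboxing and output validation before running untrusted code.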