A Transfer Learning Pipeline for Educational Resource Discovery with
Application in Leading Paragraph Generation
- URL: http://arxiv.org/abs/2201.02312v1
- Date: Fri, 7 Jan 2022 03:35:40 GMT
- Title: A Transfer Learning Pipeline for Educational Resource Discovery with
Application in Leading Paragraph Generation
- Authors: Irene Li, Thomas George, Alexander Fabbri, Tammy Liao, Benjamin Chen,
Rina Kawamura, Richard Zhou, Vanessa Yan, Swapnil Hingmire, Dragomir Radev
- Abstract summary: We propose a pipeline that automates web resource discovery for novel domains.
The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel target domains.
This is the first study that considers various web resources for survey generation.
- Score: 71.92338855383238
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Effective human learning depends on a wide selection of educational materials
that align with the learner's current understanding of the topic. While the
Internet has revolutionized human learning and education, a substantial resource
accessibility barrier still exists. Namely, the excess of online information
can make it challenging to navigate and discover high-quality learning
materials. In this paper, we propose the educational resource discovery (ERD)
pipeline that automates web resource discovery for novel domains. The pipeline
consists of three main steps: data collection, feature extraction, and resource
classification. We start with a known source domain and conduct resource
discovery on two unseen target domains via transfer learning. We first collect
frequent queries from a set of seed documents and search on the web to obtain
candidate resources, such as lecture slides and introductory blog posts. Then
we introduce a novel pretrained information retrieval deep neural network
model, query-document masked language modeling (QD-MLM), to extract deep
features of these candidate resources. We apply a tree-based classifier to
decide whether the candidate is a positive learning resource. The pipeline
achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel
target domains. Finally, we demonstrate how this pipeline can benefit an
application: leading paragraph generation for surveys. This is the first study
that considers various web resources for survey generation, to the best of our
knowledge. We also release a corpus of 39,728 manually labeled web resources
and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS).
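The three pipeline steps in the abstract (query collection from seed documents, feature extraction, and resource classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the QD-MLM deep features and the tree-based classifier are replaced here by loudly labeled shallow placeholders (bigram queries, query-overlap counts, a single decision-stump rule), and all function names are hypothetical.

```python
from collections import Counter
import re

def collect_queries(seed_docs, top_k=3):
    """Step 1 (sketch): mine frequent bigrams from seed documents
    to use as web-search queries."""
    counts = Counter()
    for doc in seed_docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(zip(tokens, tokens[1:]))
    return [" ".join(bigram) for bigram, _ in counts.most_common(top_k)]

def extract_features(resource, queries):
    """Step 2 (placeholder): stand-in for QD-MLM deep features.
    Here we only compute shallow query-overlap statistics."""
    text = resource.lower()
    overlap = sum(q in text for q in queries)
    return [overlap, len(text.split())]

def classify(features, overlap_threshold=1):
    """Step 3 (placeholder): stand-in for the tree-based classifier,
    reduced to one decision-stump rule on query overlap."""
    return features[0] >= overlap_threshold

def erd_pipeline(seed_docs, candidate_resources):
    """End-to-end sketch: query mining -> features -> classification."""
    queries = collect_queries(seed_docs)
    return [r for r in candidate_resources
            if classify(extract_features(r, queries))]
```

In the paper, candidate resources come from live web search on the mined queries, the features come from a pretrained QD-MLM model, and the final decision is made by a trained tree-based classifier; the sketch only mirrors the data flow between those stages.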
Related papers
- A Survey on Deep Active Learning: Recent Advances and New Frontiers [27.07154361976248]
This technique has gained increasing popularity due to its broad applicability, yet survey papers on deep learning-based active learning (DAL) remain scarce.
This work aims to serve as a useful and quick guide for researchers overcoming difficulties in DAL.
arXiv Detail & Related papers (2024-05-01T05:54:33Z)
- A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation [51.31429493814664]
We present a benchmark named multi-source Wizard of Wikipedia for evaluating multi-source dialogue knowledge selection and response generation.
We propose a new challenge, dialogue knowledge plug-and-play, which aims to test an already trained dialogue model on using new support knowledge from previously unseen sources.
arXiv Detail & Related papers (2024-03-06T06:54:02Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources [4.528957284486784]
Large language models (LLMs) appear to offer a promising solution to the rapid creation of learning materials at scale.
We compare the quality of resources generated by an LLM with those created by students as part of a learnersourcing activity.
Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to the quality of resources generated by their peers.
arXiv Detail & Related papers (2023-06-18T09:49:21Z)
- Learning To Rank Resources with GNN [7.337247167823921]
We propose a graph neural network (GNN) based approach to learning-to-rank that is capable of modeling resource-query and resource-resource relationships.
Our method outperforms the state-of-the-art by 6.4% to 42% on various performance metrics.
arXiv Detail & Related papers (2023-04-17T02:01:45Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension [41.6087902739702]
This study is the largest survey of the field to date.
We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work.
We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources.
arXiv Detail & Related papers (2021-07-27T10:09:13Z)
- Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing [85.35582118010608]
Task-oriented semantic parsing is a critical component of virtual assistants.
Recent advances in deep learning have enabled several approaches to successfully parse more complex queries.
We propose a novel method that outperforms a supervised neural model at a 10-fold data reduction.
arXiv Detail & Related papers (2020-10-07T17:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.