Quality Prediction of Open Educational Resources: A Metadata-based Approach
- URL: http://arxiv.org/abs/2005.10542v3
- Date: Fri, 29 May 2020 15:26:30 GMT
- Title: Quality Prediction of Open Educational Resources: A Metadata-based Approach
- Authors: Mohammadreza Tavakoli, Mirette Elias, Gábor Kismihók, Sören Auer
- Abstract summary: Metadata play a key role in offering high quality services such as recommendation and search.
We propose an OER metadata scoring model, and build a metadata-based prediction model to anticipate the quality of OERs.
Based on our data and model, we were able to detect high-quality OERs with an F1 score of 94.6%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the recent decade, online learning environments have accumulated millions
of Open Educational Resources (OERs). However, for learners, finding relevant
and high quality OERs is a complicated and time-consuming activity.
Furthermore, metadata play a key role in offering high quality services such as
recommendation and search. Metadata can also be used for automatic OER quality
control as, in the light of the continuously increasing number of OERs, manual
quality control is getting more and more difficult. In this work, we collected
the metadata of 8,887 OERs to perform an exploratory data analysis to observe
the effect of quality control on metadata quality. Subsequently, we propose an
OER metadata scoring model, and build a metadata-based prediction model to
anticipate the quality of OERs. Based on our data and model, we were able to
detect high-quality OERs with an F1 score of 94.6%.
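The abstract's pipeline (score OER metadata, then predict quality and evaluate with F1) can be sketched in miniature. This is an illustrative assumption, not the authors' actual model: the metadata fields, their weights, and the 0.5 threshold are hypothetical, and the F1 computation follows the standard definition.

```python
# Hypothetical sketch: a weighted metadata-completeness score, a threshold
# classifier over it, and F1 evaluation. Fields, weights, and threshold are
# illustrative assumptions, not the paper's actual scoring model.

WEIGHTS = {"title": 0.3, "description": 0.3, "keywords": 0.2, "license": 0.2}

def metadata_score(record):
    """Sum the weights of metadata fields that are present and non-empty."""
    return sum(w for field, w in WEIGHTS.items() if record.get(field))

def predict_high_quality(record, threshold=0.5):
    """Label a record high-quality when its metadata score passes the threshold."""
    return metadata_score(record) >= threshold

def f1(y_true, y_pred):
    """Standard F1: harmonic mean of precision and recall over boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy records (made up for illustration).
records = [
    {"title": "Intro to Python", "description": "A short course",
     "keywords": ["python"], "license": "CC-BY"},
    {"title": "Untitled", "description": "", "keywords": [], "license": ""},
]
labels = [True, False]
preds = [predict_high_quality(r) for r in records]
print(f1(labels, preds))  # 1.0 on this toy pair
```

In the paper the prediction model is learned from 8,887 real metadata records rather than thresholded by hand, but the evaluation step is the same: compare predicted quality labels against ground truth and report F1.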
Related papers
- What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices [91.71951459594074]
Large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios.
Existing methods typically utilize the Self-Instruct framework to generate instruction tuning data for better long context capability improvement.
We propose the Multi-agent Interactive Multi-hop Generation framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent.
Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human-annotated data.
arXiv Detail & Related papers (2024-09-03T13:30:00Z)
- Towards augmented data quality management: Automation of Data Quality Rule Definition in Data Warehouses [0.0]
This study explores the potential for automating data quality management within data warehouses, the data repositories commonly used by large organizations.
The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses.
Only a limited number of tools, specifically ten, demonstrated the capability to detect data quality (DQ) rules, let alone implement them in data warehouses.
arXiv Detail & Related papers (2024-06-16T13:43:04Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark [0.13764085113103217]
We show that relatively minor modifications to the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) affect model performance significantly more than the specific Machine Learning technique considered.
Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- RLBoost: Boosting Supervised Models using Deep Reinforcement Learning [0.0]
We present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data.
The article shows that this model obtains better and more stable results than state-of-the-art algorithms such as LOO, DataShapley, or DVRL.
arXiv Detail & Related papers (2023-05-23T14:38:33Z)
- Core-set Selection Using Metrics-based Explanations (CSUME) for multiclass ECG [2.0520503083305073]
We show how a selection of good quality data improves deep learning model performance.
Our experimental results show a 9.67% precision improvement and an 8.69% recall improvement, alongside a significant 50% reduction in training data volume.
arXiv Detail & Related papers (2022-05-28T19:36:28Z)
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to use a pre-trained language model (PLM) alone to generate data and train a tiny model, without relying on task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
arXiv Detail & Related papers (2022-05-25T11:38:48Z)
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
- Metadata Analysis of Open Educational Resources [0.0]
Open Educational Resources (OERs) are openly licensed educational materials that are widely used for learning.
This work uses the metadata of 8,887 OERs to perform an exploratory data analysis on OER metadata.
Our analysis demonstrates that OER metadata quality and OER content quality are closely related, as we could detect high-quality OERs with an accuracy of 94.6%.
arXiv Detail & Related papers (2021-01-19T17:16:44Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.