Quality Prediction of Open Educational Resources: A Metadata-based Approach
- URL: http://arxiv.org/abs/2005.10542v3
- Date: Fri, 29 May 2020 15:26:30 GMT
- Title: Quality Prediction of Open Educational Resources: A Metadata-based Approach
- Authors: Mohammadreza Tavakoli, Mirette Elias, Gábor Kismihók, Sören Auer
- Abstract summary: Metadata play a key role in offering high quality services such as recommendation and search.
We propose an OER metadata scoring model, and build a metadata-based prediction model to anticipate the quality of OERs.
Based on our data and model, we were able to detect high-quality OERs with an F1 score of 94.6%.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the recent decade, online learning environments have accumulated millions
of Open Educational Resources (OERs). However, for learners, finding relevant
and high quality OERs is a complicated and time-consuming activity.
Furthermore, metadata play a key role in offering high quality services such as
recommendation and search. Metadata can also be used for automatic OER quality
control as, in the light of the continuously increasing number of OERs, manual
quality control is getting more and more difficult. In this work, we collected
the metadata of 8,887 OERs to perform an exploratory data analysis to observe
the effect of quality control on metadata quality. Subsequently, we propose an
OER metadata scoring model, and build a metadata-based prediction model to
anticipate the quality of OERs. Based on our data and model, we were able to
detect high-quality OERs with an F1 score of 94.6%.
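The approach the abstract describes, training a classifier on metadata features to predict OER quality, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names and the synthetic labels are assumptions, since the paper's actual 8,887-OER dataset and scoring model are not reproduced here.

```python
# Hypothetical sketch of a metadata-based OER quality classifier.
# The binary completeness features and the synthetic labels below are
# illustrative assumptions, not the paper's actual features or data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Illustrative metadata-completeness features: 1 if the field is filled.
has_title = rng.integers(0, 2, n)
has_description = rng.integers(0, 2, n)
has_keywords = rng.integers(0, 2, n)
has_license = rng.integers(0, 2, n)
X = np.column_stack([has_title, has_description, has_keywords, has_license])

# Synthetic label: call an OER "high quality" when most fields are present.
y = (X.sum(axis=1) >= 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"F1 on held-out split: {f1_score(y_test, clf.predict(X_test)):.3f}")
```

In practice the feature set would come from the OER repository's metadata schema (e.g. presence and length of title, description, keywords, license), and the labels from the repository's quality-control process rather than a rule on the features themselves.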
Related papers
- Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework (arXiv, 2025-02-18)
  Poor data quality limits the advantageous power of Machine Learning (ML). We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
- Evaluating Language Models as Synthetic Data Generators (arXiv, 2024-12-04)
  AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning (arXiv, 2024-11-21)
  We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
- Towards augmented data quality management: Automation of Data Quality Rule Definition in Data Warehouses (arXiv, 2024-06-16)
  This study explores the potential for automating data quality management within data warehouses, a data repository commonly used by large organizations. The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses. Only ten tools demonstrated the capability to detect DQ rules, let alone implement them in data warehouses.
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines (arXiv, 2023-11-29)
  Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models (arXiv, 2023-05-24)
  STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks.
- RLBoost: Boosting Supervised Models using Deep Reinforcement Learning (arXiv, 2023-05-23)
  We present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data. The results show that this model obtains better and more stable results than other state-of-the-art algorithms such as LOO, DataShapley, or DVRL.
- Core-set Selection Using Metrics-based Explanations (CSUME) for Multiclass ECG (arXiv, 2022-05-28)
  We show how a selection of good-quality data improves deep learning model performance. Our experimental results show a 9.67% precision and 8.69% recall improvement with a significant training data volume reduction of 50%.
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning (arXiv, 2022-05-25)
  ZeroGen attempts to purely use a PLM to generate data and train a tiny model without relying on task-specific annotation. We propose a noise-robust bi-level re-weighting framework that is able to learn per-sample weights measuring data quality without requiring any gold data.
- Metadata Analysis of Open Educational Resources (arXiv, 2021-01-19)
  Open Educational Resources (OERs) are openly licensed educational materials that are widely used for learning. This work uses the metadata of 8,887 OERs to perform an exploratory data analysis on OER metadata. Our analysis demonstrated that OER metadata and OER content quality are closely related, as we could detect high-quality OERs with an accuracy of 94.6%.
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks (arXiv, 2020-11-03)
  We propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.