Quality Prediction of Open Educational Resources: A Metadata-based Approach
- URL: http://arxiv.org/abs/2005.10542v3
- Date: Fri, 29 May 2020 15:26:30 GMT
- Title: Quality Prediction of Open Educational Resources: A Metadata-based Approach
- Authors: Mohammadreza Tavakoli, Mirette Elias, Gábor Kismihók, Sören Auer
- Abstract summary: Metadata play a key role in offering high quality services such as recommendation and search.
We propose an OER metadata scoring model, and build a metadata-based prediction model to anticipate the quality of OERs.
Based on our data and model, we were able to detect high-quality OERs with an F1 score of 94.6%.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the recent decade, online learning environments have accumulated millions
of Open Educational Resources (OERs). However, for learners, finding relevant
and high quality OERs is a complicated and time-consuming activity.
Furthermore, metadata play a key role in offering high quality services such as
recommendation and search. Metadata can also be used for automatic OER quality
control as, in the light of the continuously increasing number of OERs, manual
quality control is getting more and more difficult. In this work, we collected
the metadata of 8,887 OERs to perform an exploratory data analysis to observe
the effect of quality control on metadata quality. Subsequently, we propose an
OER metadata scoring model, and build a metadata-based prediction model to
anticipate the quality of OERs. Based on our data and model, we were able to
detect high-quality OERs with an F1 score of 94.6%.
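The approach the abstract describes, training a classifier on metadata features to predict OER quality, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names and the synthetic labels are assumptions, since the paper's actual 8,887-OER dataset and scoring model are not reproduced here.

```python
# Hypothetical sketch of a metadata-based OER quality classifier.
# The binary completeness features and the synthetic labels below are
# illustrative assumptions, not the paper's actual features or data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Illustrative metadata-completeness features: 1 if the field is filled.
has_title = rng.integers(0, 2, n)
has_description = rng.integers(0, 2, n)
has_keywords = rng.integers(0, 2, n)
has_license = rng.integers(0, 2, n)
X = np.column_stack([has_title, has_description, has_keywords, has_license])

# Synthetic label: call an OER "high quality" when most fields are present.
y = (X.sum(axis=1) >= 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"F1 on held-out split: {f1_score(y_test, clf.predict(X_test)):.3f}")
```

In practice the feature set would come from the OER repository's metadata schema (e.g. presence and length of title, description, keywords, license), and the labels from the repository's quality-control process rather than a rule on the features themselves.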
Related papers
- Enhancing Machine Learning Performance through Intelligent Data Quality Assessment: An Unsupervised Data-centric Framework (arXiv, 2025-02-18)
  Poor data quality limits the advantageous power of Machine Learning (ML). We propose an intelligent data-centric evaluation framework that can identify high-quality data and improve the performance of an ML system.
- Evaluating Language Models as Synthetic Data Generators (arXiv, 2024-12-04)
  AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning (arXiv, 2024-11-21)
  We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
- Towards augmented data quality management: Automation of Data Quality Rule Definition in Data Warehouses (arXiv, 2024-06-16)
  This study explores the potential for automating data quality management within data warehouses, a data repository commonly used by large organizations. The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses. Only ten tools demonstrated the capability to detect DQ rules, let alone implement them in data warehouses.
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines (arXiv, 2023-11-29)
  Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box. This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models (arXiv, 2023-05-24)
  STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks.
- RLBoost: Boosting Supervised Models using Deep Reinforcement Learning (arXiv, 2023-05-23)
  We present RLBoost, an algorithm that uses deep reinforcement learning strategies to evaluate a particular dataset and obtain a model capable of estimating the quality of any new data. The results show that this model obtains better and more stable results than other state-of-the-art algorithms such as LOO, DataShapley, or DVRL.
- Core-set Selection Using Metrics-based Explanations (CSUME) for Multiclass ECG (arXiv, 2022-05-28)
  We show how a selection of good-quality data improves deep learning model performance. Our experimental results show a 9.67% precision and 8.69% recall improvement with a significant training data volume reduction of 50%.
- ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning (arXiv, 2022-05-25)
  ZeroGen attempts to purely use a PLM to generate data and train a tiny model without relying on task-specific annotation. We propose a noise-robust bi-level re-weighting framework that is able to learn per-sample weights measuring data quality without requiring any gold data.
- Metadata Analysis of Open Educational Resources (arXiv, 2021-01-19)
  Open Educational Resources (OERs) are openly licensed educational materials that are widely used for learning. This work uses the metadata of 8,887 OERs to perform an exploratory data analysis on OER metadata. Our analysis demonstrated that OER metadata and OER content quality are closely related, as we could detect high-quality OERs with an accuracy of 94.6%.
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks (arXiv, 2020-11-03)
  We propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.