Can Population-based Engagement Improve Personalisation? A Novel Dataset
and Experiments
- URL: http://arxiv.org/abs/2207.01504v1
- Date: Wed, 22 Jun 2022 15:53:24 GMT
- Title: Can Population-based Engagement Improve Personalisation? A Novel Dataset
and Experiments
- Authors: Sahan Bulathwela, Meghana Verma, Maria Perez-Ortiz, Emine Yilmaz and
John Shawe-Taylor
- Abstract summary: VLE is a novel dataset that consists of content-based and video-based features extracted from publicly available scientific video lectures.
Our experimental results indicate that the newly proposed VLE dataset leads to context-agnostic engagement prediction models that outperform ones built on previous datasets.
Experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders.
- Score: 21.12546768556595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work explores how population-based engagement prediction can address
cold-start at scale in large learning resource collections. The paper
introduces i) VLE, a novel dataset that consists of content-based and video-based
features extracted from publicly available scientific video lectures coupled
with implicit and explicit signals related to learner engagement, ii) two
standard tasks related to predicting and ranking context-agnostic engagement in
video lectures with preliminary baselines and iii) a set of experiments that
validate the usefulness of the proposed dataset. Our experimental results
indicate that the newly proposed VLE dataset leads to context-agnostic
engagement prediction models that are significantly more performant than ones
based on previous datasets, an improvement mainly attributable to the larger
number of training examples. The VLE dataset's suitability for building models
for Computer Science / Artificial Intelligence education focused on
e-learning / MOOC use cases is also evidenced. Further experiments in combining the built model with a
personalising algorithm show promising improvements in addressing the
cold-start problem encountered in educational recommenders. This is the largest
and most diverse publicly available dataset to our knowledge that deals with
learner engagement prediction tasks. The dataset, helper tools, descriptive
statistics and example code snippets are available publicly.
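The two benchmark tasks above (predicting and ranking context-agnostic engagement) can be sketched in a few lines. The snippet below is an illustrative sketch, not the published baseline code: the feature and label column names are hypothetical placeholders, and a gradient-boosted regressor stands in for whichever baselines ship with the dataset.

```python
# Minimal sketch of the two VLE tasks: (i) predicting context-agnostic
# engagement from content/video features and (ii) ranking lectures by
# predicted engagement. Column names below are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

np.random.seed(0)

# Hypothetical VLE-style table: one row per lecture with content/video
# features and an aggregate engagement label (e.g. median normalised watch time).
lectures = pd.DataFrame({
    "word_count": np.random.randint(500, 5000, 400),
    "duration_sec": np.random.randint(300, 3600, 400),
    "speech_rate": np.random.uniform(1.5, 4.0, 400),
    "title_word_count": np.random.randint(3, 20, 400),
    "median_engagement": np.random.uniform(0.0, 1.0, 400),  # target in [0, 1]
})

features = ["word_count", "duration_sec", "speech_rate", "title_word_count"]
X_train, X_test, y_train, y_test = train_test_split(
    lectures[features], lectures["median_engagement"], test_size=0.2, random_state=0
)

# Task (i): regress population-based (context-agnostic) engagement.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
rmse = np.sqrt(np.mean((pred - y_test.to_numpy()) ** 2))

# Task (ii): rank lectures by predicted engagement, scored with Spearman
# rank correlation against the observed engagement ordering.
rho, _ = spearmanr(pred, y_test)
print(f"RMSE={rmse:.3f}  Spearman rho={rho:.3f}")
```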
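The cold-start experiment can likewise be illustrated. The blending rule below is an assumption for illustration only, not the paper's algorithm: it treats the population-based prediction as a prior that dominates for new learners and fades as learner-specific evidence accumulates.

```python
# Hedged illustration (assumed formula, not the paper's method) of using a
# population-based engagement score to warm up a personalised recommender.

def blended_engagement(population_score: float,
                       personal_score: float,
                       n_interactions: int,
                       k: float = 5.0) -> float:
    """Shrink the personal estimate toward the population prior.

    population_score: context-agnostic engagement predicted for the lecture.
    personal_score:   engagement predicted by the personalising model.
    n_interactions:   how many events this learner has generated so far.
    k:                pseudo-count controlling how fast we trust the learner.
    """
    weight = n_interactions / (n_interactions + k)
    return weight * personal_score + (1.0 - weight) * population_score

# Cold-start learner (no history): the ranking falls back to population scores.
print(blended_engagement(0.72, 0.10, n_interactions=0))   # -> 0.72
# Experienced learner: the personalised estimate dominates.
print(blended_engagement(0.72, 0.10, n_interactions=50))  # -> ~0.156
```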
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Forecasting Early with Meta Learning [4.750521042508541]
We devise a Meta learning method that exploits samples from additional datasets and learns to augment time series through adversarial learning as an auxiliary task for the target dataset.
Our model (FEML) is equipped with a shared Convolutional backbone that learns features for varying length inputs from different datasets and has dataset specific heads to forecast for different output lengths.
arXiv Detail & Related papers (2023-07-19T07:30:01Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Ensemble Machine Learning Model Trained on a New Synthesized Dataset
Generalizes Well for Stress Prediction Using Wearable Devices [3.006016887654771]
We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols.
We propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data.
arXiv Detail & Related papers (2022-09-30T00:20:57Z) - Quality Not Quantity: On the Interaction between Dataset Design and
Robustness of CLIP [43.7219097444333]
We introduce a testbed of six publicly available data sources to investigate how pre-training distributions induce robustness in CLIP.
We find that the performance of the pre-training data varies substantially across distribution shifts.
We find that combining multiple sources does not necessarily yield better models, but rather dilutes the robustness of the best individual data source.
arXiv Detail & Related papers (2022-08-10T18:24:23Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability
Prediction with Multi-task Learning on Self-Supervised Annotations [0.0]
This work describes a self-supervised data augmentation approach used to improve learning models' performances when only a moderate amount of labeled data is available.
Neural language models are fine-tuned using this procedure in the context of the AcCompl-it shared task at EVALITA 2020.
arXiv Detail & Related papers (2020-11-10T15:50:37Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Predicting Engagement in Video Lectures [24.415345855402624]
We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement.
We propose both cross-modal and modality-specific feature sets to achieve this task.
We demonstrate the use of our approach in the case of data scarcity.
arXiv Detail & Related papers (2020-05-31T19:28:16Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on the enlarged dataset tractable, we further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.