Related papers: pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models

URL: http://arxiv.org/abs/2105.00385v1
Date: Sun, 2 May 2021 03:08:53 GMT
Title: pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models
Authors: Anirudhan Badrinath, Frederic Wang, Zachary Pardos
Abstract summary: We introduce pyBKT, a library of model extensions for knowledge tracing. The library provides data generation, fitting, prediction, and cross-validation routines. pyBKT is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bayesian Knowledge Tracing, a model used for cognitive mastery estimation, has been a hallmark of adaptive learning research and an integral component of deployed intelligent tutoring systems (ITS). In this paper, we provide a brief history of knowledge tracing model research and introduce pyBKT, an accessible and computationally efficient library of model extensions from the literature. The library provides data generation, fitting, prediction, and cross-validation routines, as well as a simple-to-use data helper interface to ingest typical tutor log dataset formats. We evaluate the runtime with various dataset sizes and compare to past implementations. Additionally, we conduct sanity checks of the model using experiments with simulated data to evaluate the accuracy of its EM parameter learning and use real-world data to validate its predictions, comparing pyBKT's supported model variants with results from the papers in which they were originally introduced. The library is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice and to facilitate progress in the field through easier replication of past approaches.
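
For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of the standard Bayesian Knowledge Tracing update in plain Python. It is not pyBKT's API or implementation. The function name `bkt_trace` and the default parameter values (prior, learn, guess, slip) are hypothetical, chosen only for illustration.

```python
def bkt_trace(responses, prior=0.2, learn=0.1, guess=0.2, slip=0.1):
    """Trace mastery over a sequence of responses for a single skill.

    responses: iterable of 0/1 (incorrect/correct) observations.
    All parameter defaults are illustrative assumptions, not fitted values.
    Returns a list of (predicted P(correct), P(mastery) after the update).
    """
    p_know = prior
    trace = []
    for correct in responses:
        # Predicted probability of a correct answer before seeing the response.
        p_correct = p_know * (1 - slip) + (1 - p_know) * guess
        # Bayes' rule: posterior probability of mastery given the observation.
        if correct:
            posterior = p_know * (1 - slip) / p_correct
        else:
            posterior = p_know * slip / (1 - p_correct)
        # Learning transition: an unmastered skill may become mastered.
        p_know = posterior + (1 - posterior) * learn
        trace.append((p_correct, p_know))
    return trace


if __name__ == "__main__":
    for step, (p_correct, p_know) in enumerate(bkt_trace([1, 1, 0, 1]), start=1):
        print(f"step {step}: P(correct)={p_correct:.3f}, P(mastery)={p_know:.3f}")
```

The sketch assumes the classic four-parameter model with no forgetting. The model extensions mentioned in the abstract would change how these parameters are defined or shared.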

Related papers

Zero-shot data citation function classification using transformer-based large language models (LLMs) [0.0]
We apply an open-source large language model to generate structured data use case labels for publications known to incorporate specific genomic datasets. Our results demonstrate that the stock model can achieve an F1 score of 0.674 on a zero-shot data citation classification task with no previously defined categories.
arXiv Detail & Related papers (2025-11-04T19:33:30Z)
Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior [58.58249548116766]
We present an experimental recipe for studying the relationship between training data and language model (LM) behavior. We outline steps for intervening on data batches and then retraining model checkpoints over that data to test hypotheses relating data to behavior.
arXiv Detail & Related papers (2025-10-16T03:22:48Z)
Automated Research Article Classification and Recommendation Using NLP and ML [0.5486463492959637]
This paper presents an automated framework for research article classification and recommendation. We use a large-scale arXiv.org dataset spanning more than three decades. To complement classification, we incorporate a recommendation module based on the cosine similarity of vectorized articles.
arXiv Detail & Related papers (2025-10-07T01:24:35Z)
WHAR Datasets: An Open Source Library for Wearable Human Activity Recognition [5.46517570496579]
We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling. The library currently supports 9 widely used datasets, integrates with PyTorch, and is easily extensible to new datasets.
arXiv Detail & Related papers (2025-08-12T08:43:30Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials in an instant manner. We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases. Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS). The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z)
VertiBayes: Learning Bayesian network parameters from vertically partitioned data with missing values [2.9707233220536313]
Federated learning makes it possible to train a machine learning model on decentralized data. We propose a novel method called VertiBayes to train Bayesian networks on vertically partitioned data. We experimentally show our approach produces models comparable to those learnt using traditional algorithms.
arXiv Detail & Related papers (2022-10-31T11:13:35Z)
pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models [46.05383477261115]
Knowledge tracing (KT) is the task of using students' historical learning interaction data to model their knowledge mastery over time. DLKT approaches are still left somewhat unknown and proper measurement and analysis of these approaches remain a challenge. We introduce a comprehensive Python-based benchmark platform, pyKT, to guarantee valid comparisons across DLKT methods.
arXiv Detail & Related papers (2022-06-23T02:42:47Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms [1.7188280334580197]
Probabilistic graphical models are one common approach to modelling the data generating mechanism. We present a novel Snakemake workflow called Benchpress for producing scalable, reproducible, and platform-independent benchmarks. We demonstrate the applicability of this workflow for learning Bayesian networks in five typical data scenarios.
arXiv Detail & Related papers (2021-07-08T14:19:28Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques. We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process. We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.