Unification of HDP and LDA Models for Optimal Topic Clustering of
Subject Specific Question Banks
- URL: http://arxiv.org/abs/2011.01035v1
- Date: Sun, 4 Oct 2020 18:21:20 GMT
- Title: Unification of HDP and LDA Models for Optimal Topic Clustering of
Subject Specific Question Banks
- Authors: Nikhil Fernandes, Alexandra Gkolia, Nicolas Pizzo, James Davenport,
Akshar Nair
- Abstract summary: An increase in the popularity of online courses would result in an increase in the number of course-related queries for academics.
In order to reduce the time spent answering each individual question, clustering the questions is an ideal choice.
We use the Hierarchical Dirichlet Process (HDP) to determine an optimal topic-number input for our LDA model runs.
- Score: 55.41644538483948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been an increasingly popular trend in Universities for
curriculum transformation to make teaching more interactive and better suited
to online courses. An increase in the popularity of online courses would
result in an increase in the number of course-related queries for academics.
This is compounded by the fact that, if lectures were delivered in a
video-on-demand format, there would be no fixed time at which the majority of
students could ask questions. When questions are asked live in a lecture,
there is a negligible chance of similar questions being asked repeatedly, but
asynchronously this becomes far more likely. In order to reduce the time
spent answering each individual question, clustering them is an ideal choice.
There are different unsupervised models suited to text clustering, of which
the Latent Dirichlet Allocation (LDA) model is the most commonly used. We use
the Hierarchical Dirichlet Process (HDP) to determine an optimal topic-number
input for our LDA model runs. Due to the probabilistic nature of these topic
models, their outputs vary across runs. The general trend we found is that
not all of the topics are used for clustering on the first run of the LDA
model, which results in less effective clustering. To tackle this
probabilistic output, we recursively rerun the LDA model on only the topics
actually being used until we obtain an efficiency ratio of 1. Through our
experimental results we also establish why Zeno's paradox is avoided.
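As a concrete sketch of the pipeline described above, the following Python
snippet chains gensim's HdpModel and LdaModel. The toy question corpus, the
1% weight cut-off used to count the HDP topics, and the definition of the
efficiency ratio as (topics actually used) / (topics requested) are
illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch, assuming gensim and the heuristics noted above.
from gensim.corpora import Dictionary
from gensim.models import HdpModel, LdaModel

# Toy stand-in for a bank of course-related questions (hypothetical data).
questions = [
    "how do we prove the limit of this sequence".split(),
    "what is the derivative of the given function".split(),
    "when is the coursework deadline for this module".split(),
    "can you explain the proof of convergence again".split(),
]
dictionary = Dictionary(questions)
corpus = [dictionary.doc2bow(q) for q in questions]

# Step 1: fit an HDP model and count the topics that carry noticeable
# weight; the 1% share cut-off is an assumed heuristic, not from the paper.
hdp = HdpModel(corpus, id2word=dictionary)
alpha, _ = hdp.hdp_to_lda()
num_topics = max(1, int(sum(w > 0.01 * alpha.sum() for w in alpha)))

# Step 2: rerun LDA on the topics actually used until the efficiency
# ratio (topics used / topics requested) reaches 1.
ratio = 0.0
while ratio < 1.0:
    lda = LdaModel(corpus, id2word=dictionary,
                   num_topics=num_topics, passes=10, random_state=0)
    # Dominant topic per question.
    used = {max(lda.get_document_topics(bow, minimum_probability=0.0),
                key=lambda t: t[1])[0]
            for bow in corpus}
    ratio = len(used) / num_topics
    num_topics = len(used)  # shrink to the topics actually used

print(f"converged on {num_topics} topics, efficiency ratio {ratio:.2f}")
```

Informally, the loop cannot recurse forever: whenever the ratio is below 1,
the requested topic count strictly decreases, and it is bounded below by 1,
so a ratio of 1 is reached in finitely many steps. This is the intuition
behind the abstract's remark that Zeno's paradox is avoided.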
Related papers
- Iterative Improvement of an Additively Regularized Topic Model [0.0]
We present a method for iterative training of a topic model.
Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models.
arXiv Detail & Related papers (2024-08-11T18:22:12Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical
Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Resources for Brewing BEIR: Reproducible Reference Models and an Official
Leaderboard [47.73060223236792]
BEIR is a benchmark dataset for evaluation of information retrieval models across 18 different domain/task combinations.
Our work addresses two shortcomings that prevent the benchmark from achieving its full potential.
arXiv Detail & Related papers (2023-06-13T00:26:18Z)
- BUCA: A Binary Classification Approach to Unsupervised Commonsense Question
Answering [11.99004747630325]
Unsupervised commonsense reasoning (UCR) is becoming increasingly popular as the construction of commonsense reasoning datasets is expensive.
We propose to transform the downstream multiple choice question answering task into a simpler binary classification task by ranking all candidate answers according to their reasonableness.
arXiv Detail & Related papers (2023-05-25T10:59:47Z)
- Limits of Model Selection under Transfer Learning [18.53111473571927]
We study the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class.
In particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those with no distributional information, can be arbitrarily slower than oracle rates.
arXiv Detail & Related papers (2023-04-29T02:27:42Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in
MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper explores for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference.
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
- Generative Context Pair Selection for Multi-hop Question Answering
[60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
- The Influence of Domain-Based Preprocessing on Subject-Specific Clustering
[55.41644538483948]
The sudden shift to teaching mostly online at Universities has caused an increased workload for academics.
One way to deal with this problem is to cluster these questions depending on their topic.
In this paper, we explore the realms of tagging data sets, focusing on identifying code excerpts and providing empirical results.
arXiv Detail & Related papers (2020-11-16T17:47:19Z)