Decoding the Alphabet Soup of Degrees in the United States Postsecondary
Education System Through Hybrid Method: Database and Text Mining
- URL: http://arxiv.org/abs/2309.13050v1
- Date: Wed, 6 Sep 2023 16:03:14 GMT
- Title: Decoding the Alphabet Soup of Degrees in the United States Postsecondary
Education System Through Hybrid Method: Database and Text Mining
- Authors: Sahar Voghoei, James Byars, John A Miller, Khaled Rasheed, and Hamid A
Arabnia
- Abstract summary: This paper proposes a model to predict the levels (e.g., Bachelor, Master, etc.) of postsecondary degree awards that have been ambiguously expressed in the student tracking reports of the National Student Clearinghouse (NSC).
The model was trained with four multi-label datasets of different grades of resolution and returned 97.83% accuracy with the most sophisticated dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a model to predict the levels (e.g., Bachelor,
Master, etc.) of postsecondary degree awards that have been ambiguously
expressed in the student tracking reports of the National Student
Clearinghouse (NSC). The model is a hybrid of two modules. The first module
interprets the relevant abbreviatory elements embedded in NSC reports by
referring to a comprehensive database we have compiled of nearly 950
abbreviations for degree titles used by American postsecondary educators. The
second module combines feature classification and text mining, modeled with a
CNN-BiLSTM preceded by several steps of heavy pre-processing. The model was
trained with four multi-label datasets of different grades of resolution and
returned 97.83% accuracy with the most sophisticated dataset. Such a thorough
classification of degree levels will provide insights into the modeling
patterns of student success and mobility. To date, such a classification
strategy has not been attempted except with manual methods and simple
text-parsing logic.
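The abstract describes the hybrid only at a high level and the listing carries no code, so the following is a minimal sketch under stated assumptions: a lookup table stands in for the 950-entry abbreviation database (module 1), feeding a PyTorch CNN-BiLSTM classifier (module 2). All names, dimensions, and hyperparameters (DEGREE_ABBREVIATIONS, DegreeLevelClassifier, emb_dim, hidden) are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the paper's two-module hybrid; all names and
# hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

# Module 1: expand abbreviatory degree titles via a lookup table.
# The paper's database holds nearly 950 entries; three are shown here.
DEGREE_ABBREVIATIONS = {
    "BSBA": "Bachelor of Science in Business Administration",
    "MSW": "Master of Social Work",
    "AAS": "Associate of Applied Science",
}

def expand_abbreviations(award_text: str) -> str:
    """Replace known abbreviations with full degree titles."""
    return " ".join(
        DEGREE_ABBREVIATIONS.get(tok, tok) for tok in award_text.split()
    )

# Module 2: CNN-BiLSTM classifier over the expanded, pre-processed text.
class DegreeLevelClassifier(nn.Module):
    def __init__(self, vocab_size: int, n_labels: int,
                 emb_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                 # (batch, seq, emb)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        x, _ = self.bilstm(x)                     # (batch, seq, 2*hidden)
        return self.out(x.mean(dim=1))            # per-label logits
```

In the multi-label setting described above, such logits would typically be trained with a per-label binary cross-entropy loss (e.g., nn.BCEWithLogitsLoss); the paper's actual pre-processing and tuning are not reproduced here.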
Related papers
- Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning [12.377363857246602]
We propose a novel model named MI-DELIGHT for short text classification.
It first performs multi-source information exploration to alleviate sparsity issues.
A graph learning approach is then adopted to learn representations of the short texts.
arXiv Detail & Related papers (2025-01-16T00:26:15Z)
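The entry above names dual-level contrastive learning only in passing; as a point of reference, here is a minimal InfoNCE-style contrastive loss of the kind such methods build on. It is a sketch, not the authors' implementation; the function name and temperature are illustrative.

```python
# Minimal InfoNCE-style contrastive loss, the kind of objective that
# dual-level contrastive methods build on; not the authors' code.
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """anchor, positive: (batch, dim) embeddings of two views of the same texts."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                    # pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```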
- LLM-based feature generation from text for interpretable machine learning [0.0]
Existing text representations such as embeddings and bag-of-words are not suitable for rule learning due to their high dimensionality and absent or questionable feature-level interpretability.
This article explores whether large language models (LLMs) could address this by extracting a small number of interpretable features from text.
arXiv Detail & Related papers (2024-09-11T09:29:28Z)
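The entry above describes extracting a small number of interpretable features from text with an LLM. A minimal sketch of that pattern follows; the `llm` callable, feature names, and prompt are all hypothetical placeholders, not the article's actual setup.

```python
# Hypothetical sketch: prompting an LLM for a small set of named,
# interpretable features. The `llm` callable, feature names, and prompt
# are placeholders, not the article's actual setup.
import json

FEATURE_NAMES = ["mentions_deadline", "is_question", "positive_tone"]

def extract_features(text: str, llm) -> dict:
    """Ask the LLM to emit one boolean per interpretable feature."""
    prompt = (
        "Return a JSON object with boolean values for the keys "
        f"{FEATURE_NAMES}, describing the text below.\n\nText: {text}"
    )
    # Real code would validate the response before trusting it.
    return json.loads(llm(prompt))
```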
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core module in personalized online services.
Traditional ID-based models for CTR prediction take one-hot encoded ID features of the tabular modality as inputs.
Pretrained Language Models (PLMs) have given rise to another paradigm that takes sentences of the textual modality as inputs.
We propose Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
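FLIP aligns an ID-based tabular model with a PLM at the feature level; its masked-modality objectives are beyond a short sketch. The two-tower skeleton below only illustrates the paired inputs such alignment operates on; all names and dimensions (TwoTowerCTR, the 768-dim text vector) are assumptions.

```python
# Assumed two-tower skeleton showing the paired inputs FLIP-style
# alignment operates on; the actual masked-modality objectives and the
# dimensions used here are not from the paper.
import torch
import torch.nn as nn

class TwoTowerCTR(nn.Module):
    def __init__(self, n_ids: int, dim: int = 64):
        super().__init__()
        self.id_tower = nn.EmbeddingBag(n_ids, dim)  # one-hot ID features
        self.text_proj = nn.Linear(768, dim)         # PLM sentence vector
        self.head = nn.Linear(2 * dim, 1)

    def forward(self, id_feats: torch.Tensor, text_vec: torch.Tensor):
        u = self.id_tower(id_feats)                  # (batch, dim)
        v = self.text_proj(text_vec)                 # (batch, dim)
        return torch.sigmoid(self.head(torch.cat([u, v], dim=1)))
```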
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Multilevel Sentence Embeddings for Personality Prediction [0.0]
We propose a two-step approach that enables us to map sentences according to their hierarchical memberships and polarity.
We show that our single model approach performs better than multiple class-specific classification models.
arXiv Detail & Related papers (2023-05-09T20:02:18Z)
- Enhancing Pashto Text Classification using Language Processing Techniques for Single And Multi-Label Analysis [0.0]
This study aims to establish an automated classification system for Pashto text.
The study achieved an average testing accuracy of 94%.
The use of pre-trained language representation models, such as DistilBERT, showed promising results.
arXiv Detail & Related papers (2023-05-04T23:11:31Z)
- A Machine Learning Approach to Classifying Construction Cost Documents into the International Construction Measurement Standard [0.0]
We introduce the first automated models for classifying the natural language descriptions provided in cost documents called "Bills of Quantities".
We learn from a dataset of more than 50 thousand descriptions of items retrieved from 24 large infrastructure construction projects across the United Kingdom.
arXiv Detail & Related papers (2022-10-24T11:35:53Z)
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
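TCM, above, treats classification as matching text against representations of the labels themselves; the authors' framework is richer, but a minimal cosine-matching sketch conveys the idea. Encoders are omitted and all names are illustrative.

```python
# Sketch of classification-as-matching: score a text embedding against
# embeddings of the label names themselves. Encoders are omitted and
# all names are illustrative, not TCM's implementation.
import torch
import torch.nn.functional as F

def match_label(text_emb: torch.Tensor, label_embs: torch.Tensor) -> int:
    """text_emb: (dim,); label_embs: (n_labels, dim)."""
    sims = F.cosine_similarity(text_emb.unsqueeze(0), label_embs, dim=1)
    return int(sims.argmax())   # index of the best-matching label
```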
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
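Self-training, revisited in the entry above, fine-tunes a model on its own confident predictions over unlabeled data. The loop below is a generic sketch of that step, not SFLM's prompt-based variant; `model` and the data loader are placeholders, and batches are assumed to be plain tensors.

```python
# Generic self-training step: pseudo-label unlabeled texts, keep only
# confident predictions, and return them for further fine-tuning.
# `model` and the loader are placeholders; SFLM's prompt-based details
# are omitted.
import torch

def pseudo_label(model, unlabeled_loader, threshold: float = 0.9):
    model.eval()
    kept = []
    with torch.no_grad():
        for batch in unlabeled_loader:        # batch: (batch, ...) tensor
            probs = torch.softmax(model(batch), dim=1)
            conf, labels = probs.max(dim=1)
            mask = conf >= threshold          # keep confident predictions
            kept.append((batch[mask], labels[mask]))
    return kept                               # feed back into fine-tuning
```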
- Hierarchical Text Classification of Urdu News using Deep Neural Network [0.0]
This paper proposes a deep learning model for hierarchical text classification of news in the Urdu language.
The dataset consists of 51,325 sentences from 8 online news websites covering the following genres: Sports, Technology, and Entertainment.
arXiv Detail & Related papers (2021-07-07T11:06:11Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
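Dynamic Blocking, named in the entry above, is a decoding-time constraint. A best-effort sketch of the core rule follows: when the last generated token matches a source token, the token that immediately follows it in the source is blocked at the next step. The paper's full algorithm applies such blocks stochastically; this deterministic version is an assumption-laden simplification.

```python
# Simplified, deterministic reading of Dynamic Blocking: if the last
# generated token copies a source token, forbid the token that follows
# it in the source at the next step. The paper applies such blocks
# stochastically; this version is an assumption-laden sketch.
def blocked_next_tokens(generated: list, source: list) -> set:
    """Return source tokens to suppress at the next decoding step."""
    blocked = set()
    if not generated:
        return blocked
    for i, tok in enumerate(source[:-1]):
        if generated[-1] == tok:
            blocked.add(source[i + 1])  # block the immediate successor
    return blocked
```

A decoder would then subtract a large constant from the logits of the returned tokens before sampling the next one.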
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.