UniSkill: A Dataset for Matching University Curricula to Professional Competencies
- URL: http://arxiv.org/abs/2603.03134v1
- Date: Tue, 03 Mar 2026 16:05:57 GMT
- Title: UniSkill: A Dataset for Matching University Curricula to Professional Competencies
- Authors: Nurlan Musazade, Joszef Mezei, Mike Zhang
- Abstract summary: We release both manually annotated and synthetic datasets of skills from the European Skills, Competences, Qualifications and Occupations taxonomy. We match graduate-level university courses with skills from the Systems Analysts and Management and Organization Analyst ESCO occupation groups at two granularities. We train language models on this dataset to serve as a baseline for retrieval and recommendation systems for course-to-skill and skill-to-course matching.
- Score: 3.9445288162247483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Skill extraction and recommendation systems have been studied from recruiter, applicant, and education perspectives. While AI applications in job advertisements have received broad attention, deficiencies in the instructed skills side remain a challenge. In this work, we address the scarcity of publicly available datasets by releasing both manually annotated and synthetic datasets of skills from the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy and university course pairs and publishing corresponding annotation guidelines. Specifically, we match graduate-level university courses with skills from the Systems Analysts and Management and Organization Analyst ESCO occupation groups at two granularities: course title with a skill, and course sentence with a skill. We train language models on this dataset to serve as a baseline for retrieval and recommendation systems for course-to-skill and skill-to-course matching. We evaluate the models on a portion of the annotated data. Our BERT model achieves 87% F1-score, showing that course and skill matching is a feasible task.
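The abstract describes two matching directions (course-to-skill and skill-to-course) and an F1-based evaluation. Both can be illustrated with a minimal sketch. The sketch assumes that a trained pair classifier, such as a fine-tuned BERT cross-encoder, has already assigned a match score to every course-skill pair. The course titles, skill labels, scores, gold labels, the 0.5 threshold, and all function names below are hypothetical. The sketch does not reproduce the paper's actual model, data splits, or F1 averaging.

```python
from typing import List, Sequence

# Hypothetical toy data (not from UniSkill): course titles, ESCO-style skill
# labels, and a course-by-skill matrix standing in for a pair classifier's
# match probabilities, plus hypothetical gold match labels.
COURSES = ["Business Process Management", "Leading Organisational Change"]
SKILLS = ["analyse business processes", "manage organisational change",
          "apply change management"]
SCORES = [
    [0.91, 0.55, 0.12],
    [0.20, 0.85, 0.45],
]
GOLD = [
    [1, 0, 0],
    [0, 1, 1],
]


def predict(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn match scores into binary course-skill labels (1 = match)."""
    return [[int(s >= threshold) for s in row] for row in scores]


def binary_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 of the positive ("match") class over flattened pair labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_skills_for_course(scores, course_idx: int, k: int) -> List[int]:
    """Course-to-skill retrieval: rank one row of the score matrix."""
    row = scores[course_idx]
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]


def top_courses_for_skill(scores, skill_idx: int, k: int) -> List[int]:
    """Skill-to-course retrieval: rank one column of the score matrix."""
    col = [row[skill_idx] for row in scores]
    return sorted(range(len(col)), key=lambda i: col[i], reverse=True)[:k]


def flatten(matrix):
    return [x for row in matrix for x in row]


pred = predict(SCORES)
f1 = binary_f1(flatten(GOLD), flatten(pred))
print(f"F1 = {f1:.3f}")  # 2 TP, 1 FP, 1 FN -> F1 = 0.667
print([SKILLS[j] for j in top_skills_for_course(SCORES, 0, 2)])
# ['analyse business processes', 'manage organisational change']
print([COURSES[i] for i in top_courses_for_skill(SCORES, 1, 2)])
# ['Leading Organisational Change', 'Business Process Management']
```

Ranking a row answers "which skills does this course teach?", and ranking a column answers "which courses teach this skill?". Only the F1 computation needs a threshold. The toy example uses course titles as rows. At the paper's other granularity, course sentences would take their place.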
Related papers
- Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESCO and EQF taxonomies [0.0]
This study investigates the potential of language models to improve the classification of labor market information.
We examine and compare two prominent methodologies from the literature: Sentence Linking and Entity Linking.
In support of ongoing research, we release an open-source tool, incorporating these two methodologies.
Date: 2025-12-02T19:49:43Z
- Tec-Habilidad: Skill Classification for Bridging Education and Employment [0.7373617024876725]
This paper develops a Spanish-language dataset for skill extraction and classification.
It provides an annotation methodology to distinguish between knowledge, skills, and abilities.
It also provides deep learning baselines to advance robust solutions for skill classification.
Date: 2025-03-05T22:05:42Z
- Dynamic Skill Adaptation for Large Language Models [78.31322532135272]
We present Dynamic Skill Adaptation (DSA), an adaptive and dynamic framework to adapt novel and complex skills to Large Language Models (LLMs).
For every skill, we use LLMs to generate two kinds of data: textbook-like data, which contains detailed descriptions of the skill, for pre-training, and exercise-like data, which explicitly applies the skill to solve problems, for instruction-tuning.
Experiments on large language models such as LLaMA and Mistral demonstrate the effectiveness of our proposed methods in adapting math reasoning skills and social study skills.
Date: 2024-12-26T22:04:23Z
- Joint Extraction and Classification of Danish Competences for Job Matching [13.364545674944825]
This work presents the first model that jointly extracts and classifies competences from Danish job postings.
As a single BERT-like architecture for joint extraction and classification, our model is lightweight and efficient at inference.
Date: 2024-10-29T15:00:40Z
- Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking [59.87055275344965]
Job-SDF is a dataset designed to train and benchmark job-skill demand forecasting models.
It is based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023.
Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels.
Date: 2024-06-17T07:22:51Z
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple pieces of knowledge to infer new knowledge.
Date: 2024-06-11T15:58:59Z
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
Date: 2024-03-11T17:49:18Z
- NNOSE: Nearest Neighbor Occupational Skill Extraction [55.22292957778972]
We tackle the complexity in occupational skill datasets.
We employ an external datastore for retrieving similar skills in a dataset-unifying manner.
We observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.
Date: 2024-01-30T15:18:29Z
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
Date: 2023-07-16T00:45:42Z
- Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers [0.0]
We propose an end-to-end zero-shot system for skills extraction from job descriptions based on large language models (LLMs).
We generate synthetic training data for the entirety of ESCO skills and train a classifier to extract skill mentions from job posts.
We also employ a similarity retriever to generate skill candidates which are then re-ranked using a second LLM.
Date: 2023-07-07T12:04:12Z
- Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction [19.43668931500507]
We propose an end-to-end system for skill extraction, based on distant supervision through literal matching.
We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements.
We release the benchmark dataset for research purposes to stimulate further research on the task.
Date: 2022-09-13T13:37:06Z
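The entry "Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction" reports that selecting negative examples from related skills in the ESCO taxonomy gives the biggest improvements. A minimal sketch of that idea follows, contrasted with uniform random negatives. It assumes "related" means skills that share the same parent group. The group names, skill labels, and function names are hypothetical and are not taken from ESCO or from that paper's code.

```python
import random
from typing import Dict, List

# Hypothetical toy taxonomy (not ESCO data): each group maps to member skills.
TAXONOMY: Dict[str, List[str]] = {
    "data analysis": ["analyse big data", "perform data mining",
                      "interpret statistical data"],
    "process design": ["model business processes", "analyse business processes"],
    "people management": ["manage staff", "coach employees"],
}

# Reverse index: skill -> the group it belongs to.
PARENT: Dict[str, str] = {
    skill: group for group, skills in TAXONOMY.items() for skill in skills
}


def related_negatives(positive: str, n: int, rng: random.Random) -> List[str]:
    """Sample up to n 'hard' negatives from the positive skill's own group."""
    siblings = [s for s in TAXONOMY[PARENT[positive]] if s != positive]
    return rng.sample(siblings, min(n, len(siblings)))


def random_negatives(positive: str, n: int, rng: random.Random) -> List[str]:
    """Sample up to n negatives uniformly from every other skill."""
    others = [s for s in PARENT if s != positive]
    return rng.sample(others, min(n, len(others)))


rng = random.Random(0)
print(related_negatives("perform data mining", 2, rng))
print(random_negatives("perform data mining", 2, rng))
```

Sibling negatives are harder to tell apart from the positive than random ones, which is one plausible reason they help. In a distantly supervised setting, some siblings may actually be valid matches, so this choice trades difficulty against label noise.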
This list is automatically generated from the titles and abstracts of the papers on this site.