Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings
- URL: http://arxiv.org/abs/2401.05073v1
- Date: Wed, 10 Jan 2024 11:07:32 GMT
- Title: Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings
- Authors: Florin Leon, Marius Gavrilescu, Sabina-Adriana Floria, Alina-Adriana Minea
- Abstract summary: This paper aims to identify correlations between job ad requirements and skill sets using a deep learning model.
The approach involves data collection, preprocessing, and labeling using ESCO (European Skills, Competences, and Occupations) taxonomy.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper proposes a classification framework aimed at identifying
correlations between job ad requirements and transversal skill sets, with a
focus on predicting the necessary skills for individual job descriptions using
a deep learning model. The approach involves data collection, preprocessing,
and labeling using ESCO (European Skills, Competences, and Occupations)
taxonomy. Hierarchical classification and multi-label strategies are used for
skill identification, while augmentation techniques address data imbalance,
enhancing model robustness. A comparison between results obtained with
English-specific and multi-language sentence embedding models reveals close
accuracy. The experimental case studies detail neural network configurations,
hyperparameters, and cross-validation results, highlighting the efficacy of the
hierarchical approach and the suitability of the multi-language model for the
diverse European job market. Thus, a new approach is proposed for the
hierarchical classification of transversal skills from job ads.
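As an illustration of the general setup described in the abstract (not the authors' exact architecture), the sketch below routes pre-computed sentence embeddings through a top-level skill-group classifier and a per-group multi-label head. All dimensions, group counts, and leaf counts are placeholders.

```python
# A minimal sketch of hierarchical multi-label skill classification over
# pre-computed sentence embeddings. Dimensions and class counts are
# illustrative placeholders, not the paper's configuration.
import torch
import torch.nn as nn

EMB_DIM = 384          # e.g. a MiniLM-style sentence embedding (assumption)
N_GROUPS = 6           # top-level ESCO-like skill groups (placeholder)
LEAVES_PER_GROUP = 10  # leaf skills under each group (placeholder)

class HierarchicalSkillClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Level 1: single-label classifier over broad skill groups.
        self.group_head = nn.Sequential(
            nn.Linear(EMB_DIM, 256), nn.ReLU(), nn.Linear(256, N_GROUPS)
        )
        # Level 2: one multi-label head per group.
        self.leaf_heads = nn.ModuleList(
            nn.Linear(EMB_DIM, LEAVES_PER_GROUP) for _ in range(N_GROUPS)
        )

    def forward(self, emb):
        group_logits = self.group_head(emb)               # (B, N_GROUPS)
        group = group_logits.argmax(dim=-1)               # predicted branch
        # Route each embedding to the multi-label head of its predicted group.
        leaf_logits = torch.stack(
            [self.leaf_heads[g](e) for g, e in zip(group.tolist(), emb)]
        )                                                  # (B, LEAVES_PER_GROUP)
        return group_logits, leaf_logits

model = HierarchicalSkillClassifier()
sentences = torch.randn(4, EMB_DIM)      # stand-in for real sentence embeddings
group_logits, leaf_logits = model(sentences)
leaf_probs = torch.sigmoid(leaf_logits)  # multi-label: independent sigmoids
```

The sigmoid head (rather than softmax) is what makes the second level multi-label: each leaf skill is an independent yes/no decision within the chosen group.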
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
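A generic sketch of how a contrastive (CLIP-style) objective and a generative captioning objective can share one loss; the weighting factor and temperature are assumptions, not the paper's formulation.

```python
# A generic sketch (not the paper's actual method) of jointly optimizing a
# CLIP-style contrastive loss and a generative captioning loss.
import torch
import torch.nn.functional as F

def unified_loss(img_emb, txt_emb, caption_logits, caption_ids, alpha=0.5):
    # img_emb, txt_emb: (B, D) paired embeddings; caption_logits: (B, T, V)
    # decoder outputs for the paired caption token ids caption_ids: (B, T).
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / 0.07         # temperature is an assumption
    targets = torch.arange(logits.size(0))
    # Discriminative part: symmetric InfoNCE over the in-batch pairs.
    contrastive = (F.cross_entropy(logits, targets)
                   + F.cross_entropy(logits.t(), targets)) / 2
    # Generative part: next-token prediction on the paired caption.
    generative = F.cross_entropy(caption_logits.flatten(0, 1),
                                 caption_ids.flatten())
    return alpha * contrastive + (1 - alpha) * generative
```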
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
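A minimal sketch of a segment-ordering objective under assumed shapes: segments are shuffled and a pointer-style scorer is trained to select each segment's true successor. This illustrates the general idea, not the paper's exact architecture.

```python
# Segment-ordering sketch: given shuffled segment embeddings, train a
# bilinear scorer to point at each segment's true successor.
import torch
import torch.nn as nn
import torch.nn.functional as F

def segment_ordering_loss(seg_emb, true_order, scorer):
    # seg_emb: (S, D) embeddings of shuffled segments; true_order[i] is the
    # original position of the segment now at shuffled position i.
    S = seg_emb.size(0)
    scores = scorer(seg_emb) @ seg_emb.t()               # (S, S) pointer scores
    scores = scores.masked_fill(torch.eye(S, dtype=torch.bool), float("-inf"))
    pos_of = {orig: shuf for shuf, orig in enumerate(true_order)}
    loss, count = 0.0, 0
    for shuf, orig in enumerate(true_order):
        if orig + 1 in pos_of:                           # last segment: no successor
            target = torch.tensor(pos_of[orig + 1]).unsqueeze(0)
            loss = loss + F.cross_entropy(scores[shuf].unsqueeze(0), target)
            count += 1
    return loss / count

scorer = nn.Linear(384, 384, bias=False)  # bilinear pointer scorer (assumption)
seg_emb = torch.randn(5, 384)
true_order = [2, 0, 4, 1, 3]              # example shuffle
loss = segment_ordering_loss(seg_emb, true_order, scorer)
```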
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification [4.498100922387482]
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient.
Previous results demonstrated that these methods can even improve performance on some classification tasks.
This paper investigates how these techniques influence the classification performance and computation costs compared to full fine-tuning.
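For concreteness, a minimal sketch of a LoRA linear layer: the pretrained weight is frozen and only the low-rank factors A and B are trained. The rank and scaling below are common defaults, not settings from the paper.

```python
# LoRA sketch: the frozen weight W is augmented with a trainable low-rank
# update B @ A, so only r * (d_in + d_out) parameters are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# Only A and B train: 8*768 + 768*8 = 12,288 parameters vs. 590,592 in the base.
```

Zero-initializing B makes the layer behave exactly like the frozen base model at the start of fine-tuning, which is the usual LoRA convention.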
arXiv Detail & Related papers (2023-08-14T17:12:43Z)
- Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
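A hedged sketch of the prototype-based half of this recipe: class prototypes are mean embeddings of source-labeled tokens, and target-language tokens are relabeled by their nearest prototype. This illustrates the idea only; ContProto's actual update rules differ.

```python
# Prototype-based pseudo-label refinement sketch (illustrative, not
# ContProto's exact procedure).
import numpy as np

def refine_pseudo_labels(src_emb, src_labels, tgt_emb, n_classes):
    # Prototype per class: mean of source embeddings with that gold label.
    protos = np.stack([src_emb[src_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    # Cosine similarity of each target embedding to every prototype.
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = tgt @ protos.T                        # (n_target, n_classes)
    return sims.argmax(axis=1)                   # refined pseudo-labels

rng = np.random.default_rng(0)
src_emb = rng.normal(size=(100, 64))             # source-language embeddings
src_labels = rng.integers(0, 5, size=100)        # gold entity-type labels
tgt_emb = rng.normal(size=(20, 64))              # target-language embeddings
pseudo = refine_pseudo_labels(src_emb, src_labels, tgt_emb, n_classes=5)
```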
arXiv Detail & Related papers (2023-05-23T02:52:16Z)
- DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text.
Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks.
We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
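A small sketch of the text-to-structure framing: triples are serialized into a flat string that any seq2seq model can generate, then parsed back. The serialization format here is an assumption, not DeepStruct's exact one.

```python
# Structure prediction as text generation: serialize and parse triples.
def serialize_triples(triples):
    # Each triple becomes "( head ; relation ; tail )"; triples concatenate.
    return " ".join(f"( {h} ; {r} ; {t} )" for h, r, t in triples)

def parse_triples(text):
    # Inverse of serialize_triples, tolerant of extra whitespace.
    out = []
    for chunk in text.split(")"):
        chunk = chunk.replace("(", "").strip()
        if chunk:
            parts = [p.strip() for p in chunk.split(";")]
            if len(parts) == 3:
                out.append(tuple(parts))
    return out

# Example: an entity-typing instance rendered as a generation target.
target = serialize_triples([("Marie Curie", "instance of", "person")])
assert parse_triples(target) == [("Marie Curie", "instance of", "person")]
```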
arXiv Detail & Related papers (2022-05-21T00:58:22Z)
- A Top-down Supervised Learning Approach to Hierarchical Multi-label Classification in Networks [0.21485350418225244]
This paper presents a general prediction model for hierarchical multi-label classification (HMC), where the attributes to be inferred can be specified as a strict poset.
It is based on a top-down approach that addresses HMC with supervised learning by building a local classifier per class.
The proposed model is showcased with a case study on the prediction of gene functions for Oryza sativa Japonica, a variety of rice.
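A minimal sketch of the top-down scheme with one local binary classifier per class: a child class is tested only if its parent was accepted, which keeps predictions hierarchy-consistent. The toy hierarchy and constant "classifiers" are placeholders.

```python
# Top-down hierarchical multi-label prediction with a local classifier per class.
def predict_topdown(x, hierarchy, classifiers, threshold=0.5):
    # hierarchy: class -> list of child classes (a tree here, though the
    # paper allows any strict poset); classifiers: class -> scorer in [0, 1].
    predicted, frontier = set(), list(hierarchy["ROOT"])
    while frontier:
        cls = frontier.pop()
        if classifiers[cls](x) >= threshold:          # local decision
            predicted.add(cls)
            frontier.extend(hierarchy.get(cls, []))   # descend only on accept
    return predicted

hierarchy = {"ROOT": ["A", "B"], "A": ["A1", "A2"]}
classifiers = {"A": lambda x: 0.9, "B": lambda x: 0.2,
               "A1": lambda x: 0.7, "A2": lambda x: 0.1}
print(predict_topdown(None, hierarchy, classifiers))  # {'A', 'A1'}
```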
arXiv Detail & Related papers (2022-03-23T17:29:17Z)
- Multi-Task and Multi-Corpora Training Strategies to Enhance Argumentative Sentence Linking Performance [4.374417345150659]
We improve a state-of-the-art linking model by using multi-task and multi-corpora training strategies.
Our auxiliary tasks help the model to learn the role of each sentence in the argumentative structure.
Experiments on essays written by English-as-a-foreign-language learners show that both strategies significantly improve the model's performance.
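A generic sketch of the multi-task pattern: the main linking loss is combined with a weighted auxiliary loss (e.g., classifying each sentence's argumentative role) computed from a shared encoder. The heads and the weight are assumptions.

```python
# Multi-task step sketch: main linking loss plus weighted auxiliary loss.
import torch
import torch.nn.functional as F

def multitask_step(shared_repr, link_head, aux_head, link_tgt, aux_tgt, w=0.3):
    # shared_repr: (B, D) sentence representations from a shared encoder.
    link_loss = F.cross_entropy(link_head(shared_repr), link_tgt)  # main task
    aux_loss = F.cross_entropy(aux_head(shared_repr), aux_tgt)     # sentence role
    return link_loss + w * aux_loss  # auxiliary signal regularizes the encoder

link_head = torch.nn.Linear(384, 10)  # link-target classes (placeholder)
aux_head = torch.nn.Linear(384, 4)    # argumentative roles (placeholder)
repr_ = torch.randn(8, 384)
loss = multitask_step(repr_, link_head, aux_head,
                      torch.randint(0, 10, (8,)), torch.randint(0, 4, (8,)))
```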
arXiv Detail & Related papers (2021-09-27T14:17:40Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve the Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education [1.215785021723604]
Developing a conversational agent for a specific educational scenario is time-consuming.
Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes.
We propose using a multi-task weak supervision method combined with active learning to address these concerns.
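A hedged sketch of the general recipe: noisy labeling functions vote on unlabeled examples (weak supervision), and the lowest-agreement examples are routed to a human annotator (active learning). The labeling functions below are toy keyword rules, not the paper's.

```python
# Weak supervision + active learning sketch.
import numpy as np

ABSTAIN = -1
labeling_functions = [
    lambda text: 1 if "good job" in text else ABSTAIN,
    lambda text: 0 if "incorrect" in text else ABSTAIN,
    lambda text: 1 if "well done" in text else ABSTAIN,
]

def weak_label(texts):
    votes = np.array([[lf(t) for lf in labeling_functions] for t in texts])
    labels, margins = [], []
    for row in votes:
        valid = row[row != ABSTAIN]
        if len(valid) == 0:
            labels.append(ABSTAIN)       # no rule fired
            margins.append(0.0)          # maximally uncertain
        else:
            counts = np.bincount(valid, minlength=2)
            labels.append(int(counts.argmax()))          # majority vote
            margins.append(abs(counts[1] - counts[0]) / len(valid))
    return np.array(labels), np.array(margins)

texts = ["good job, well done", "that is incorrect", "hmm"]
labels, margins = weak_label(texts)
to_annotate = np.argsort(margins)[:1]    # lowest-margin items go to a human
```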
arXiv Detail & Related papers (2020-10-23T23:39:40Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
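A hedged sketch of this loop: a controller holds Bernoulli logits over candidate embedding types, samples a binary concatenation mask, and updates the logits with a REINFORCE-style signal from the task model's accuracy. `evaluate_task_model` is a hypothetical placeholder for training and scoring the downstream model.

```python
# REINFORCE-style controller over embedding concatenations (sketch).
import numpy as np

rng = np.random.default_rng(0)
candidates = ["word2vec", "fasttext", "bert", "xlm-r", "char"]
logits = np.zeros(len(candidates))               # controller parameters
baseline = 0.0

def evaluate_task_model(mask):
    # Placeholder reward: in reality, train a tagger on the concatenation of
    # the selected embeddings and return its dev accuracy.
    return 0.5 + 0.1 * mask[2] + 0.05 * mask[3] - 0.02 * mask.sum()

for step in range(50):
    probs = 1 / (1 + np.exp(-logits))                 # sigmoid
    mask = (rng.random(len(candidates)) < probs).astype(float)
    reward = evaluate_task_model(mask)
    baseline = 0.9 * baseline + 0.1 * reward          # moving-average baseline
    # REINFORCE gradient for independent Bernoulli choices: mask - probs.
    logits += 0.5 * (reward - baseline) * (mask - probs)

best = [c for c, p in zip(candidates, 1 / (1 + np.exp(-logits))) if p > 0.5]
```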
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
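A minimal sketch of this style of training applied to input embeddings: an FGSM-like perturbation approximates the maximal loss within an epsilon-ball, and the model is then trained on the perturbed point. Epsilon and the model are illustrative.

```python
# Adversarial training sketch: minimize the (approximate) maximal loss under
# small label-preserving perturbations of the input embeddings.
import torch
import torch.nn.functional as F

def adversarial_step(model, emb, labels, optimizer, eps=0.01):
    emb = emb.clone().requires_grad_(True)
    loss = F.cross_entropy(model(emb), labels)
    grad, = torch.autograd.grad(loss, emb)        # direction of steepest ascent
    perturbed = emb.detach() + eps * grad.sign()  # worst-case point in the ball
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(perturbed), labels)
    adv_loss.backward()                           # minimize the maximal loss
    optimizer.step()
    return adv_loss.item()

model = torch.nn.Linear(384, 3)                   # toy classifier (placeholder)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
emb = torch.randn(8, 384)
labels = torch.randint(0, 3, (8,))
adversarial_step(model, emb, labels, optimizer)
```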
arXiv Detail & Related papers (2020-07-29T19:38:35Z)