Standard Occupation Classifier -- A Natural Language Processing Approach
- URL: http://arxiv.org/abs/2511.23057v1
- Date: Fri, 28 Nov 2025 10:30:37 GMT
- Title: Standard Occupation Classifier -- A Natural Language Processing Approach
- Authors: Sidharth Rony, Jack Patman
- Abstract summary: This project investigates the use of recent developments in natural language processing to construct a classifier capable of assigning an occupation code to a given job advertisement. We develop various classifiers for both UK ONS SOC and US O*NET SOC, using different Language Models. We find that an ensemble model, which combines Google BERT and a Neural Network classifier while considering job title, description, and skills, achieved the highest prediction accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Standard Occupational Classifiers (SOC) are systems used to categorize and classify different types of jobs and occupations based on their similarities in terms of job duties, skills, and qualifications. Integrating these facets with Big Data from job advertisements offers the prospect of investigating labour demand that is specific to various occupations. This project investigates the use of recent developments in natural language processing to construct a classifier capable of assigning an occupation code to a given job advertisement. We develop various classifiers for both UK ONS SOC and US O*NET SOC, using different Language Models. We find that an ensemble model, which combines Google BERT and a Neural Network classifier while considering job title, description, and skills, achieved the highest prediction accuracy. Specifically, the ensemble model exhibited a classification accuracy of up to 61% for the lower (or fourth) tier of SOC, and 72% for the third tier of SOC. This model could provide up-to-date, accurate information on the evolution of the labour market using job advertisements.
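The abstract reports accuracy separately for the fourth and third SOC tiers and describes an ensemble of two classifiers. The following is a minimal sketch of how such numbers could be computed, not the authors' implementation. It assumes UK-style four-digit codes in which the tier-k code is the first k digits, and it assumes the ensemble blends class probabilities by a weighted average (soft voting), which the abstract does not specify. The function names, the weight, the codes, and the probabilities are all hypothetical placeholders.

```python
from collections import defaultdict


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Blend two {soc_code: probability} dicts and return the top code.

    Assumption: a weighted average of class probabilities; the paper's
    actual combination rule is not stated in the abstract.
    """
    blended = defaultdict(float)
    for code, p in prob_a.items():
        blended[code] += weight_a * p
    for code, p in prob_b.items():
        blended[code] += (1.0 - weight_a) * p
    return max(blended, key=blended.get)


def tier_accuracy(true_codes, predicted_codes, tier):
    """Share of predictions whose first `tier` digits match the true code."""
    if len(true_codes) != len(predicted_codes):
        raise ValueError("true and predicted code lists differ in length")
    if not true_codes:
        raise ValueError("no codes to score")
    hits = sum(t[:tier] == p[:tier] for t, p in zip(true_codes, predicted_codes))
    return hits / len(true_codes)


# Hypothetical outputs of two models for three adverts (made-up numbers;
# the four-digit codes are illustrative placeholders).
model_a = [
    {"2134": 0.6, "2133": 0.4},
    {"3544": 0.7, "1132": 0.3},
    {"6131": 0.2, "6135": 0.8},
]
model_b = [
    {"2134": 0.3, "2133": 0.7},
    {"3544": 0.4, "1132": 0.6},
    {"6131": 0.3, "6135": 0.7},
]
true_codes = ["2134", "3544", "6131"]

predictions = [soft_vote(a, b) for a, b in zip(model_a, model_b)]

if __name__ == "__main__":
    print("predictions:", predictions)
    print("tier-4 accuracy:", tier_accuracy(true_codes, predictions, 4))
    print("tier-3 accuracy:", tier_accuracy(true_codes, predictions, 3))
```

In this toy data two predictions miss the exact four-digit code but still share its first three digits, which illustrates why accuracy at a coarser tier can never be lower than accuracy at a finer one.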
Related papers
- Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESCO and EQF taxonomies [0.0]
This study investigates the potential of language models to improve the classification of labor market information. We examine and compare two prominent methodologies from the literature: Sentence Linking and Entity Linking. In support of ongoing research, we release an open-source tool, incorporating these two methodologies.
arXiv Detail & Related papers (2025-12-02T19:49:43Z)
- Ontology-Aligned Embeddings for Data-Driven Labour Market Analytics [0.0]
We present an embedding-based alignment process that links any free-form German job title to two vocabularies - the German Klassifikation der Berufe and the International Standard Classification of Education.
arXiv Detail & Related papers (2025-09-05T09:08:19Z)
- Large Language Models in the Task of Automatic Validation of Text Classifier Predictions [45.88028371034407]
Machine learning models for text classification are trained to predict a class for a given text. To do this, training and validation samples must be prepared, and each text is assigned a class. These classes are usually assigned by human annotators with different expertise levels, depending on the specific classification task. This paper proposes several approaches to replace human annotators with Large Language Models.
arXiv Detail & Related papers (2025-05-24T13:19:03Z)
- On The Landscape of Spoken Language Models: A Comprehensive Survey [144.11278973534203]
Spoken language models (SLMs) act as universal speech processing systems. Work in this area is very diverse, with a range of terminology and evaluation settings.
arXiv Detail & Related papers (2025-04-11T13:40:53Z)
- Multilingual hierarchical classification of job advertisements for job vacancy statistics [1.6874375111244329]
The goal of this paper is to develop a multilingual classifier for online job advertisements. We show that incorporation of the hierarchical structure of occupations improves prediction accuracy by 1-2 percentage points. A bilingual (Polish and English) and multilingual (24 languages) model is developed based on data translated using closed and open-source software.
arXiv Detail & Related papers (2024-11-06T09:16:15Z)
- Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings [0.0]
This paper aims to identify correlations between job ad requirements and skill sets using a deep learning model.
The approach involves data collection, preprocessing, and labeling using ESCO (European Skills, Competences, and Occupations) taxonomy.
arXiv Detail & Related papers (2024-01-10T11:07:32Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Predicting Job Titles from Job Descriptions with Multi-label Text Classification [0.0]
We propose a multi-label classification approach for predicting relevant job titles from job description texts.
We implement Bi-GRU-LSTM-CNN with different pre-trained language models for the job title prediction problem.
arXiv Detail & Related papers (2021-12-21T09:31:03Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Because unlabeled data carry rich task-relevant information, they have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning [51.34011135329063]
Job Title Benchmarking (JTB) aims at matching job titles with similar expertise levels across various companies.
Traditional JTB approaches mainly rely on manual market surveys, which are expensive and labor-intensive.
We reformulate JTB as a link prediction task over the Job-Graph, in which matched job titles should be linked.
arXiv Detail & Related papers (2020-09-16T02:33:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.