A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
- URL: http://arxiv.org/abs/2503.12989v1
- Date: Mon, 17 Mar 2025 09:44:50 GMT
- Title: A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
- Authors: Palakorn Achananuparp, Ee-Peng Lim
- Abstract summary: Large language models (LLMs) hold promise due to their extensive world knowledge and in-context learning capabilities. We propose a multi-stage framework consisting of inference, retrieval, and reranking stages. Our results indicate that the framework outperforms existing LLM-based methods.
- Score: 13.350477885980512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically annotating job data with standardized occupations from taxonomies, known as occupation classification, is crucial for labor market analysis. However, this task is often hindered by data scarcity and the challenges of manual annotations. While large language models (LLMs) hold promise due to their extensive world knowledge and in-context learning capabilities, their effectiveness depends on their knowledge of occupational taxonomies, which remains unclear. In this study, we assess the ability of LLMs to generate precise taxonomic entities from a taxonomy, highlighting their limitations. To address these challenges, we propose a multi-stage framework consisting of inference, retrieval, and reranking stages, which integrates taxonomy-guided reasoning examples to enhance performance by aligning outputs with taxonomic knowledge. Evaluations on a large-scale dataset show significant improvements in classification accuracy. Furthermore, we demonstrate the framework's adaptability for multi-label skill classification. Our results indicate that the framework outperforms existing LLM-based methods, offering a practical and scalable solution for occupation classification and related tasks across LLMs.
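The abstract names the three stages but not their wiring; below is a minimal sketch of how such a pipeline could fit together, assuming a sentence-transformers encoder and an injected `llm` callable. The prompt wording, helper names, and choice of encoder are our assumptions, not the paper's exact method.

```python
# Three-stage pipeline sketch: inference -> retrieval -> reranking.
from typing import Callable, List, Tuple
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

def classify_occupation(job_posting: str,
                        taxonomy: List[Tuple[str, str]],   # (code, title) pairs
                        llm: Callable[[str], str],
                        k: int = 10) -> str:
    # Stage 1: inference -- the LLM describes the likely occupation;
    # taxonomy-guided reasoning examples would be prepended to this prompt.
    guess = llm(f"Job posting:\n{job_posting}\n\n"
                "Describe the most likely standardized occupation title.")

    # Stage 2: retrieval -- embed the guess and fetch the k nearest taxonomy
    # entries, grounding the free-form output in real taxonomy codes.
    titles = [title for _, title in taxonomy]
    sims = util.cos_sim(encoder.encode(guess), encoder.encode(titles))[0]
    candidates = [taxonomy[i] for i in sims.argsort(descending=True)[:k].tolist()]

    # Stage 3: reranking -- the LLM picks the best code from the short list.
    menu = "\n".join(f"{code}: {title}" for code, title in candidates)
    return llm(f"Job posting:\n{job_posting}\n\nCandidate occupations:\n{menu}\n\n"
               "Answer with the single best occupation code.")
```

Injecting `llm` as a plain callable keeps the sketch runnable against any chat API or a local stub.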
Related papers
- From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z)
- Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning [0.0]
We develop a robust pipeline to identify variables such as remote work availability, remuneration structures, educational requirements, and work experience preferences. Our methodology combines semantic chunking, retrieval-augmented generation (RAG), and fine-tuned DistilBERT models to overcome the limitations of traditional parsing tools. We present a comprehensive evaluation of our fine-tuned models and analyze their strengths, limitations, and potential for scaling.
arXiv Detail & Related papers (2025-01-13T19:49:49Z)
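The entry above combines semantic chunking, RAG, and DistilBERT fine-tuning; the sketch below shows only the DistilBERT field-classifier piece, assuming a binary remote-work label. The label set and example text are illustrative, and a real system would fine-tune the classification head before inference.

```python
# Flag remote-work availability in a job-ad chunk with a DistilBERT classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)   # 0 = on-site, 1 = remote (assumed)

chunk = "This role is fully remote; quarterly on-site meetups are optional."
inputs = tok(chunk, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # head is untrained in this sketch
print("remote" if logits.argmax(-1).item() == 1 else "on-site")
```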
- Code LLMs: A Taxonomy-based Survey [7.3481279783709805]
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. LLMs have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-12-11T11:07:50Z)
- Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? [1.0562108865927007]
Large Language Models (LLMs) have demonstrated great potential in complex tasks such as multi-label classification. We present methods that combine the strengths of LLMs with dense retrieval techniques to overcome these challenges. We evaluate the effectiveness of our methods on SSRN, a large repository of preprints spanning multiple disciplines.
arXiv Detail & Related papers (2024-12-06T15:51:22Z)
- Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications.
Retrieval-Augmented Generation (RAG) tackles the limits of LLMs' internal knowledge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
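A minimal sketch of the retrieval-judgment idea in the entry above: issue a retrieval request only when the model does not answer confidently on its own. The self-confidence probe and the 0.7 threshold are illustrative assumptions, not the paper's actual judgment mechanism.

```python
# Gate retrieval on the model's self-assessed confidence.
from typing import Callable

def answer_with_gate(question: str,
                     llm: Callable[[str], str],
                     retrieve: Callable[[str], str],
                     threshold: float = 0.7) -> str:
    probe = llm(f"{question}\nOn a 0-1 scale, how confident are you that you "
                "can answer correctly without external documents? "
                "Reply with a number only.")
    try:
        confident = float(probe.strip()) >= threshold
    except ValueError:
        confident = False            # unparseable reply -> retrieve to be safe

    if confident:
        return llm(question)         # no retrieval request issued
    context = retrieve(question)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```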
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
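The entry above contrasts generation- and encoding-based fine-tuning routes; the toy sketch below shows both on one example, with GPT-2 as a stand-in model and an assumed three-way edit-intent label set. Neither route is fine-tuned here, so the outputs are illustrative only.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

LABELS = ["grammar", "clarity", "fact-update"]        # assumed label set
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
text = "Old: The resluts are clear. New: The results are clear."

# Generation-based: fine-tune the LM so it continues with the label string.
out = lm.generate(**tok(f"{text}\nEdit intent:", return_tensors="pt"),
                  max_new_tokens=3, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))

# Encoding-based: mean-pool final hidden states into a linear classifier head.
head = torch.nn.Linear(lm.config.n_embd, len(LABELS))
hidden = lm(**tok(text, return_tensors="pt"),
            output_hidden_states=True).hidden_states[-1]
print(LABELS[head(hidden.mean(dim=1)).argmax(-1).item()])
```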
- Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey [18.570066068280212]
Large Language Models (LLMs) have demonstrated their effectiveness not only in natural language processing but also in broader applications. This survey focuses on the problem of anomaly and OOD detection in the context of LLMs. We propose a new taxonomy to categorize existing approaches into two classes based on the role played by LLMs.
arXiv Detail & Related papers (2024-09-03T15:22:41Z)
- Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance of LLMs on math-question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z)
- Rethinking Skill Extraction in the Job Market Domain using Large Language Models [20.256353240384133]
Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes.
The reliance on manually annotated data limits the generalizability of such approaches.
In this paper, we explore the use of in-context learning to overcome these challenges.
arXiv Detail & Related papers (2024-02-06T09:23:26Z)
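A minimal sketch of the in-context-learning setup this entry describes: a few annotated postings serve as demonstrations, followed by the posting to label. The demonstrations, delimiter, and prompt phrasing are our assumptions.

```python
# Few-shot skill extraction via in-context learning.
from typing import Callable, List, Tuple

DEMOS: List[Tuple[str, str]] = [
    ("Seeking a data analyst skilled in SQL and Tableau.", "SQL; Tableau"),
    ("Backend engineer: Go, PostgreSQL, and Kubernetes required.",
     "Go; PostgreSQL; Kubernetes"),
]

def extract_skills(posting: str, llm: Callable[[str], str]) -> List[str]:
    shots = "\n\n".join(f"Posting: {p}\nSkills: {s}" for p, s in DEMOS)
    reply = llm(f"{shots}\n\nPosting: {posting}\nSkills:")
    return [s.strip() for s in reply.split(";") if s.strip()]
```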
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
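The three-step DOKE recipe above maps naturally onto a small function. The sketch below assumes the prepared knowledge is a flat {entity: fact} dictionary and uses naive substring matching for selection, both simplifications of whatever extractor the paper actually builds.

```python
# DOKE-style augmentation: prepare -> select -> express.
from typing import Callable, Dict, List

def doke_answer(sample: str,
                llm: Callable[[str], str],
                knowledge_base: Dict[str, str]) -> str:
    # 1) Prepare: knowledge arrives as precomputed {entity: fact} pairs.
    # 2) Select: keep only facts about entities mentioned in this sample.
    facts: List[str] = [fact for entity, fact in knowledge_base.items()
                        if entity.lower() in sample.lower()]
    # 3) Express: verbalize the selected facts inside the prompt.
    notes = "\n".join(f"- {f}" for f in facts) or "- (no matching facts)"
    return llm(f"Domain notes:\n{notes}\n\nTask input: {sample}")
```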
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)