A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
- URL: http://arxiv.org/abs/2503.12989v2
- Date: Wed, 16 Jul 2025 21:47:50 GMT
- Title: A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models
- Authors: Palakorn Achananuparp, Ee-Peng Lim, Yao Lu
- Abstract summary: Large language models (LLMs) hold promise due to their extensive world knowledge and in-context learning capabilities. Our framework integrates taxonomy-guided reasoning examples to enhance performance by aligning outputs with taxonomic knowledge. Evaluations on a large-scale dataset show that our framework not only improves occupation and skill classification, but also provides a cost-effective alternative to frontier models.
- Score: 15.361247598837002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically annotating job data with standardized occupations from taxonomies, known as occupation classification, is crucial for labor market analysis. However, this task is often hindered by data scarcity and the challenges of manual annotations. While large language models (LLMs) hold promise due to their extensive world knowledge and in-context learning capabilities, their effectiveness depends on their knowledge of occupational taxonomies, which remains unclear. In this study, we assess the ability of LLMs to generate precise taxonomic entities from the taxonomy, highlighting their limitations, especially for smaller models. To address these challenges, we propose a multi-stage framework consisting of inference, retrieval, and reranking stages, which integrates taxonomy-guided reasoning examples to enhance performance by aligning outputs with taxonomic knowledge. Evaluations on a large-scale dataset show that our framework not only improves occupation and skill classification, but also provides a cost-effective alternative to frontier models like GPT-4o, significantly reducing computational costs while maintaining strong performance. This makes it a practical and scalable solution for occupation classification and related tasks across LLMs.
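To make the abstract's pipeline concrete, the sketch below shows one way the three stages (inference, retrieval, reranking) could be wired together. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, prompt wording, cosine-similarity retrieval, and the `llm`/`embed` callables are all placeholders.

```python
# Hypothetical sketch of the three-stage occupation classification pipeline
# described in the abstract: (1) inference - the LLM freely proposes a likely
# occupation title, (2) retrieval - the proposal is matched against taxonomy
# entries by embedding similarity, (3) reranking - the LLM picks the best
# candidate. `llm` and `embed` stand in for any completion/embedding backends.

from typing import Callable, Sequence
import numpy as np

def infer_occupation(job_posting: str, llm: Callable[[str], str]) -> str:
    """Stage 1 (inference): ask the LLM for a free-text occupation guess."""
    prompt = (
        "Read the job posting below and state the most likely occupation "
        f"title in a few words.\n\nJob posting:\n{job_posting}\n\nOccupation:"
    )
    return llm(prompt).strip()

def retrieve_candidates(
    query: str,
    taxonomy_titles: Sequence[str],
    embed: Callable[[Sequence[str]], np.ndarray],
    top_k: int = 10,
) -> list[str]:
    """Stage 2 (retrieval): rank taxonomy entries by cosine similarity to the guess."""
    vectors = embed(list(taxonomy_titles) + [query])  # shape: (n_titles + 1, dim)
    entries, query_vec = vectors[:-1], vectors[-1]
    sims = entries @ query_vec / (
        np.linalg.norm(entries, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    order = np.argsort(-sims)[:top_k]
    return [taxonomy_titles[i] for i in order]

def rerank(job_posting: str, candidates: list[str], llm: Callable[[str], str]) -> str:
    """Stage 3 (reranking): let the LLM choose among the retrieved taxonomy entries."""
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = (
        "Choose the single taxonomy occupation that best matches the job posting. "
        f"Answer with the option number only.\n\nJob posting:\n{job_posting}\n\n"
        f"Options:\n{options}\n\nAnswer:"
    )
    # A production version would parse the reply more defensively.
    choice = int(llm(prompt).strip().split()[0].rstrip(".")) - 1
    return candidates[max(0, min(choice, len(candidates) - 1))]
```

In the paper's framing, the taxonomy-guided reasoning examples would additionally be supplied as few-shot demonstrations inside the stage prompts; they are omitted here for brevity.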
Related papers
- From Selection to Generation: A Survey of LLM-based Active Learning [153.8110509961261]
Large Language Models (LLMs) have been employed for generating entirely new data instances and providing more cost-effective annotations. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques.
arXiv Detail & Related papers (2025-02-17T12:58:17Z) - Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning [0.0]
We develop a robust pipeline to identify variables such as remote work availability, remuneration structures, educational requirements, and work experience preferences. Our methodology combines semantic chunking, retrieval-augmented generation (RAG), and fine-tuning DistilBERT models to overcome the limitations of traditional parsing tools. We present a comprehensive evaluation of our fine-tuned models and analyze their strengths, limitations, and potential for scaling.
arXiv Detail & Related papers (2025-01-13T19:49:49Z) - Code LLMs: A Taxonomy-based Survey [7.3481279783709805]
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. LLMs have recently expanded their impact to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-12-11T11:07:50Z) - Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? [1.0562108865927007]
Large Language Models (LLMs) have demonstrated great potential in complex tasks such as multi-label classification. We present methods that combine the strengths of LLMs with dense retrieval techniques to overcome these challenges. We evaluate the effectiveness of our methods on SSRN, a large repository of preprints spanning multiple disciplines.
arXiv Detail & Related papers (2024-12-06T15:51:22Z) - Unified Parameter-Efficient Unlearning for LLMs [25.195126838721492]
Large Language Models (LLMs) have revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks. This raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information. We introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise adjustments using influence functions.
arXiv Detail & Related papers (2024-11-30T07:21:02Z) - Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications, but their internal knowledge is static and can be incomplete or outdated.
Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey [18.570066068280212]
Large Language Models (LLMs) have demonstrated their effectiveness not only in natural language processing but also in broader applications. This survey focuses on the problem of anomaly and OOD detection in the context of LLMs. We propose a new taxonomy to categorize existing approaches into two classes based on the role played by LLMs.
arXiv Detail & Related papers (2024-09-03T15:22:41Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts.
Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models.
The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models [19.863010475923414]
Taxonomies find utility in various real-world applications, such as e-commerce search engines and recommendation systems.
Traditional supervised taxonomy expansion approaches encounter difficulties stemming from limited resources.
We propose FLAME, a novel approach for taxonomy expansion in low-resource environments by harnessing the capabilities of large language models.
arXiv Detail & Related papers (2024-02-21T08:50:40Z) - Rethinking Skill Extraction in the Job Market Domain using Large Language Models [20.256353240384133]
Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes.
The reliance on manually annotated data limits the generalizability of such approaches.
In this paper, we explore the use of in-context learning to overcome these challenges.
arXiv Detail & Related papers (2024-02-06T09:23:26Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities of modern large language models.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way (a schematic sketch of these steps appears after this list).
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction [0.8670827427401335]
We present a general framework for taxonomy construction that takes into account structural constraints.
We compare the prompting and fine-tuning approaches performed on a hypernym taxonomy and a novel computer science taxonomy dataset.
arXiv Detail & Related papers (2023-09-04T16:53:17Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
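Of the entries above, the Knowledge Plugins (DOKE) summary is the most procedural, so a schematic sketch of its three-step extractor interface is given below. Everything here is an illustrative assumption based only on the summary text: the class name, method signatures, and the naive keyword-overlap selection are not taken from the DOKE paper.

```python
# Schematic sketch of a three-step domain-knowledge extractor as summarized
# above (prepare task knowledge, select per-sample knowledge, express it as
# LLM-readable text). All names and the selection heuristic are illustrative.

from dataclasses import dataclass

@dataclass
class DomainKnowledgeExtractor:
    """Hypothetical extractor mirroring the three steps in the DOKE summary."""
    knowledge_base: dict[str, str]  # e.g. entity descriptions keyed by id (illustrative)

    def prepare(self, task: str) -> dict[str, str]:
        """Step 1: gather knowledge judged useful for the task as a whole."""
        # A real extractor might filter or construct task-specific knowledge here.
        return self.knowledge_base

    def select(self, sample: str, prepared: dict[str, str], k: int = 3) -> list[str]:
        """Step 2: pick the entries most relevant to this sample (naive keyword overlap)."""
        scored = sorted(
            prepared.items(),
            key=lambda kv: -sum(word in sample.lower() for word in kv[1].lower().split()),
        )
        return [value for _, value in scored[:k]]

    def express(self, sample: str, selected: list[str]) -> str:
        """Step 3: serialize the selected knowledge into an LLM-readable prompt fragment."""
        facts = "\n".join(f"- {s}" for s in selected)
        return f"Relevant domain knowledge:\n{facts}\n\nTask input:\n{sample}"

# Hypothetical usage:
# extractor = DomainKnowledgeExtractor({"i1": "waterproof hiking boots", "i2": "wireless earbuds"})
# prompt = extractor.express(sample, extractor.select(sample, extractor.prepare("recommendation")))
```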