Related papers: Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning

Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning

URL: http://arxiv.org/abs/2501.07663v1
Date: Mon, 13 Jan 2025 19:49:49 GMT
Title: Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning
Authors: Karishma Thakrar, Nick Young,
Abstract summary: We develop a robust pipeline to identify variables such as remote work availability, remuneration structures, educational requirements, and work experience preferences.<n>Our methodology combines semantic chunking, retrieval-augmented generation (RAG), and fine-tuning DistilBERT models to overcome the limitations of traditional parsing tools.<n>We present a comprehensive evaluation of our fine-tuned models and analyze their strengths, limitations, and potential for scaling.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores the application of large language models (LLMs) to extract nuanced and complex job features from unstructured job postings. Using a dataset of 1.2 million job postings provided by AdeptID, we developed a robust pipeline to identify and classify variables such as remote work availability, remuneration structures, educational requirements, and work experience preferences. Our methodology combines semantic chunking, retrieval-augmented generation (RAG), and fine-tuning DistilBERT models to overcome the limitations of traditional parsing tools. By leveraging these techniques, we achieved significant improvements in identifying variables often mislabeled or overlooked, such as non-salary-based compensation and inferred remote work categories. We present a comprehensive evaluation of our fine-tuned models and analyze their strengths, limitations, and potential for scaling. This work highlights the promise of LLMs in labor market analytics, providing a foundation for more accurate and actionable insights into job data.

Related papers

Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them.<n>This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks.<n>We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning, standardization, error processing, imputation, data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z)
LANTERN: Scalable Distillation of Large Language Models for Job-Person Fit and Explanation [16.960316035628008]
At LinkedIn, job person fit task requires analyzing a candidate's public profile against job requirements to produce both a fit assessment and a detailed explanation.<n>We introduce LANTERN, a novel LLM knowledge distillation framework tailored specifically for job person fit tasks.<n>We show that LANTERN significantly improves task specific metrics for both job person fit and explanation.
arXiv Detail & Related papers (2025-10-07T01:10:02Z)
Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation [0.061446808540639365]
We use Knowledge Graphs to enhance Large Language Models (LLMs) for zero-shot Entity Disambiguation (ED)<n>We leverage the hierarchical representation of the entities' classes in a KG to prune the candidate space as well as the entities' descriptions to enrich the input prompt with additional factual knowledge.<n>Our evaluation on popular ED datasets shows that the proposed method outperforms non-enhanced and description-only enhanced LLMs, and has a higher degree of adaptability than task-specific models.
arXiv Detail & Related papers (2025-05-05T15:40:24Z)
A Multi-Stage Framework with Taxonomy-Guided Reasoning for Occupation Classification Using Large Language Models [13.350477885980512]
Large language models (LLMs) hold promise due to their extensive world knowledge and in-context learning capabilities. We propose a multi-stage framework consisting of inference, retrieval, and reranking stages. Our results indicate that the framework outperforms existing LLM-based methods.
arXiv Detail & Related papers (2025-03-17T09:44:50Z)
Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs [0.464982780843177]
This work investigates the ability of open Large Language Models (LLMs) to predict citation intent through in-context learning and fine-tuning. We evaluate twelve model variations across five prominent open LLM families using zero, one, few, and many-shot prompting to assess performance across scenarios. The results highlight the strengths and limitations of LLMs in recognizing citation intents, providing valuable insights for model selection and prompt engineering.
arXiv Detail & Related papers (2025-02-20T13:45:42Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation [15.254775341371364]
We explore the possibility of leveraging large language models for zero-shot counterfactual generation. We propose a structured pipeline to facilitate this generation, and we hypothesize that the instruction-following and textual understanding capabilities of recent LLMs can be effectively leveraged.
arXiv Detail & Related papers (2024-05-08T03:57:45Z)
Evolving Knowledge Distillation with Large Language Models and Active Learning [46.85430680828938]
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks. Previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data. We propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance the process of data generation using large language models.
arXiv Detail & Related papers (2024-03-11T03:55:24Z)
LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition [67.96794382040547]
$LLM-DA$ is a novel data augmentation technique based on large language models (LLMs) for the few-shot NER task. Our approach involves employing 14 contextual rewriting strategies, designing entity replacements of the same type, and incorporating noise injection to enhance robustness.
arXiv Detail & Related papers (2024-02-22T14:19:56Z)
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains. In this paper, we introduce how to fine-tune a LLM model that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of supervised fine-tuning (SFT)<n>We also review the potential pitfalls of SFT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
"FIJO": a French Insurance Soft Skill Detection Dataset [0.0]
This article proposes a new public dataset, FIJO, containing insurance job offers, including many soft skill annotations. We present the results of skill detection algorithms using a named entity recognition approach and show that transformers-based models have good token-wise performances on this dataset.
arXiv Detail & Related papers (2022-04-11T15:54:22Z)
DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.