Automated Personnel Selection for Software Engineers Using LLM-Based Profile Evaluation
- URL: http://arxiv.org/abs/2410.23365v2
- Date: Sun, 03 Nov 2024 18:35:25 GMT
- Title: Automated Personnel Selection for Software Engineers Using LLM-Based Profile Evaluation
- Authors: Ahmed Akib Jawad Karim, Shahria Hoque, Md. Golam Rabiul Alam, Md. Zia Uddin,
- Abstract summary: This work evaluates software engineer profiles using an automated staff selection method based on advanced natural language processing (NLP) techniques.
A fresh dataset was generated by collecting LinkedIn profiles with important attributes like education, experience, skills, and self-introduction.
With 85% accuracy and an F1 score of 0.85, RoBERTa performed the best; DistilBERT provided comparable results at less computing expense.
- Score: 1.0718353079920009
- License:
- Abstract: Organizational success in todays competitive employment market depends on choosing the right staff. This work evaluates software engineer profiles using an automated staff selection method based on advanced natural language processing (NLP) techniques. A fresh dataset was generated by collecting LinkedIn profiles with important attributes like education, experience, skills, and self-introduction. Expert feedback helped transformer models including RoBERTa, DistilBERT, and a customized BERT variation, LastBERT, to be adjusted. The models were meant to forecast if a candidate's profile fit the selection criteria, therefore allowing automated ranking and assessment. With 85% accuracy and an F1 score of 0.85, RoBERTa performed the best; DistilBERT provided comparable results at less computing expense. Though light, LastBERT proved to be less effective, with 75% accuracy. The reusable models provide a scalable answer for further categorization challenges. This work presents a fresh dataset and technique as well as shows how transformer models could improve recruiting procedures. Expanding the dataset, enhancing model interpretability, and implementing the system in actual environments will be part of future activities.
Related papers
- User Profile with Large Language Models: Construction, Updating, and Benchmarking [0.3350491650545292]
We present two high-quality open-source user profile datasets.
These datasets offer a strong basis for evaluating user profile modeling techniques.
We show a methodology that uses large language models to tackle both profile construction and updating.
arXiv Detail & Related papers (2025-02-15T03:57:52Z) - Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning [59.11519451499754]
Direct Preference Optimization (DPO) has emerged as a de-facto approach for aligning language models with human preferences.
Recent work has shown DPO's effectiveness relies on training data quality.
We discover that reference model probability space naturally detects high-quality training samples.
arXiv Detail & Related papers (2025-01-25T07:21:50Z) - BERT-Based Approach for Automating Course Articulation Matrix Construction with Explainable AI [1.4214002697449326]
Course Outcome (CO) and Program Outcome (PO)/Program-Specific Outcome (PSO) alignment is a crucial task for ensuring curriculum coherence and assessing educational effectiveness.
This work demonstrates the potential of utilizing transfer learning with BERT-based models for the automated generation of Course Articulation Matrix (CAM)
Our system achieves accuracy, precision, recall, and F1-score values of 98.66%, 98.67%, 98.66%, and 98.66%, respectively.
arXiv Detail & Related papers (2024-11-21T16:02:39Z) - Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation [56.13803674092712]
We propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR)
CaR employs a two-step process: first, it ranks instruction pairs using a high-accuracy (84.25%) scoring model aligned with expert preferences; second, it preserves dataset diversity through clustering.
In our experiment, CaR efficiently selected a mere 1.96% of Alpaca's IT data, yet the resulting AlpaCaR model surpassed Alpaca's performance by an average of 32.1% in GPT-4 evaluations.
arXiv Detail & Related papers (2024-02-28T09:27:29Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z) - What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning [43.708781995814675]
We present deita (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models.
Deita performs better or on par with the state-of-the-art open-source alignment models with only 6K SFT training data samples.
arXiv Detail & Related papers (2023-12-25T10:29:28Z) - Zephyr: Direct Distillation of LM Alignment [59.03530095974505]
We aim to produce a smaller language model that is aligned to user intent.
Previous research has shown that applying supervised fine-tuning (dSFT) on larger models significantly improves task accuracy.
We apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
arXiv Detail & Related papers (2023-10-25T19:25:16Z) - Skill-it! A Data-Driven Skills Framework for Understanding and Training
Language Models [29.17711426767209]
We study how to best select data that leads to good downstream model performance across tasks.
We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data.
arXiv Detail & Related papers (2023-07-26T18:01:49Z) - DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with
Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.