Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching
- URL: http://arxiv.org/abs/2511.02537v1
- Date: Tue, 04 Nov 2025 12:44:54 GMT
- Title: Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching
- Authors: Kenza Khelkhal, Dihia Lanasri
- Abstract summary: This paper presents Smart-Hiring, an end-to-end Natural Language Processing pipeline designed to automatically extract structured information from unstructured resumes. The proposed system combines document parsing, named-entity recognition, and contextual text embedding techniques to capture skills, experience, and qualifications. The system achieves competitive matching accuracy while preserving a high degree of interpretability and transparency in its decision process.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hiring processes often involve the manual screening of hundreds of resumes for each job, a task that is time-consuming, labor-intensive, error-prone, and subject to human bias. This paper presents Smart-Hiring, an end-to-end Natural Language Processing (NLP) pipeline designed to automatically extract structured information from unstructured resumes and to semantically match candidates with job descriptions. The proposed system combines document parsing, named-entity recognition, and contextual text embedding techniques to capture skills, experience, and qualifications. Using advanced NLP techniques, Smart-Hiring encodes both resumes and job descriptions in a shared vector space to compute similarity scores between candidates and job postings. The pipeline is modular and explainable, allowing users to inspect extracted entities and matching rationales. Experiments were conducted on a real-world dataset of resumes and job descriptions spanning multiple professional domains, demonstrating the robustness and feasibility of the proposed approach. The system achieves competitive matching accuracy while preserving a high degree of interpretability and transparency in its decision process. This work introduces a scalable and practical NLP framework for recruitment analytics and outlines promising directions for bias mitigation, fairness-aware modeling, and large-scale deployment of data-driven hiring solutions.
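The matching step described in the abstract (encode resumes and job descriptions in a shared vector space, score them by similarity, and expose a rationale) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a deliberately simple stand-in for the contextual embeddings the paper mentions, and the names `embed`, `cosine`, `rank_candidates`, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase term counts.

    Assumption: stands in for a contextual text encoder, which the
    abstract mentions but does not specify.
    """
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(job, resumes):
    """Score each resume against the job text, highest score first.

    Each result carries the shared terms as a simple, inspectable
    rationale for the score (a hypothetical form of explainability).
    """
    job_vec = embed(job)
    results = []
    for name, text in resumes.items():
        vec = embed(text)
        shared = sorted(job_vec.keys() & vec.keys())
        results.append((name, cosine(job_vec, vec), shared))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical sample data, for illustration only.
job = "python nlp engineer with spacy experience"
resumes = {
    "cand_a": "nlp engineer python spacy transformers",
    "cand_b": "accountant excel bookkeeping audit",
}

ranking = rank_candidates(job, resumes)
for name, score, shared in ranking:
    print(f"{name}: score={score:.2f}, shared terms={shared}")
```

In a realistic setting the `embed` function would likely be replaced by a pretrained sentence encoder, and the rationale would draw on extracted entities (skills, experience, qualifications) rather than raw shared tokens.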
Related papers
- Semantic Synergy: Unlocking Policy Insights and Learning Pathways Through Advanced Skill Mapping [0.0]
This research introduces a comprehensive system based on state-of-the-art natural language processing, semantic embedding, and efficient search techniques. The system automatically extracts and aggregates normalized competencies from multiple documents. It creates strong relationships between recognized competencies, occupation profiles, and related learning courses.
arXiv Detail & Related papers (2025-03-13T06:41:26Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [70.65910069412944]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs). We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit [60.31175803899285]
We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings.
TAROT targets semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks to constrain the acquired semantic information at each level.
arXiv Detail & Related papers (2024-01-15T07:57:58Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks that have few or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - Leveraging Knowledge Graphs for Orphan Entity Allocation in Resume Processing [1.3654846342364308]
This research presents a novel approach for orphan entity allocation in resume processing using knowledge graphs.
The aim is to automate and enhance the efficiency of the job screening process by successfully bucketing orphan entities within resumes.
arXiv Detail & Related papers (2023-10-21T19:10:30Z) - Resume Evaluation through Latent Dirichlet Allocation and Natural Language Processing for Effective Candidate Selection [2.580765958706854]
We propose a method for resume rating using Latent Dirichlet Allocation (LDA) and entity detection with SpaCy.
Aiming for a resume score that is content-driven rather than driven by structure and keyword matching, our model has achieved 77% accuracy when only skills are considered.
arXiv Detail & Related papers (2023-07-28T18:11:17Z) - CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z) - Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
arXiv Detail & Related papers (2021-09-23T06:54:19Z) - Pretext Tasks selection for multitask self-supervised speech representation learning [23.39079406674442]
This paper introduces a method to select a group of pretext tasks among a set of candidates.
Experiments conducted on speaker recognition and automatic speech recognition validate our approach.
arXiv Detail & Related papers (2021-07-01T16:36:29Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z) - Learning Effective Representations for Person-Job Fit by Feature Fusion [4.884826427985207]
Person-job fit aims to match candidates and job posts on online recruitment platforms using machine learning algorithms.
In this paper, we propose to learn comprehensive and effective representations of the candidates and job posts via feature fusion.
Experiments on 10 months of real data show that our solution outperforms existing methods by a large margin.
arXiv Detail & Related papers (2020-06-12T09:02:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.