AI Hiring with LLMs: A Context-Aware and Explainable Multi-Agent Framework for Resume Screening
- URL: http://arxiv.org/abs/2504.02870v1
- Date: Tue, 01 Apr 2025 12:56:39 GMT
- Title: AI Hiring with LLMs: A Context-Aware and Explainable Multi-Agent Framework for Resume Screening
- Authors: Frank P.-W. Lo, Jianing Qiu, Zeyu Wang, Haibao Yu, Yeming Chen, Gao Zhang, Benny Lo
- Abstract summary: We propose a multi-agent framework for resume screening using Large Language Models (LLMs). The framework consists of four core agents: a resume extractor, an evaluator, a summarizer, and a score formatter. Retrieval-Augmented Generation within the evaluator grounds assessments in external knowledge, and this dynamic adaptation enables personalized recruitment, bridging the gap between AI automation and talent acquisition.
- Score: 12.845918958645676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Resume screening is a critical yet time-intensive process in talent acquisition, requiring recruiters to analyze vast volumes of job applications while remaining objective, accurate, and fair. With recent advances in Large Language Models (LLMs), their reasoning capabilities and extensive knowledge bases offer new opportunities to streamline and automate recruitment workflows. In this work, we propose a multi-agent framework for resume screening that uses LLMs to systematically process and evaluate resumes. The framework consists of four core agents: a resume extractor, an evaluator, a summarizer, and a score formatter. To enhance the contextual relevance of candidate assessments, we integrate Retrieval-Augmented Generation (RAG) into the resume evaluator, allowing the incorporation of external knowledge sources such as industry-specific expertise, professional certifications, university rankings, and company-specific hiring criteria. This dynamic adaptation enables personalized recruitment, bridging the gap between AI automation and talent acquisition. We assess the effectiveness of our approach by comparing AI-generated scores with ratings provided by HR professionals on a dataset of anonymized online resumes. The findings highlight the potential of multi-agent RAG-LLM systems in automating resume screening, enabling more efficient and scalable hiring workflows.
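For illustration, here is a minimal sketch of how such a four-agent pipeline with a RAG step might be wired together; the `llm` callable, the `retrieve_context` helper, and all prompts are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of a four-agent resume-screening pipeline with a RAG step.
# The `llm` callable and `retrieve_context` knowledge-base lookup are
# hypothetical placeholders, not the paper's actual implementation.
from typing import Callable

def screen_resume(resume_text: str, job_description: str,
                  llm: Callable[[str], str],
                  retrieve_context: Callable[[str], str]) -> dict:
    # Agent 1: resume extractor. Pull structured fields out of raw text.
    extracted = llm(
        "Extract education, work experience, skills, and certifications "
        f"as bullet points from this resume:\n{resume_text}"
    )

    # RAG step: fetch external knowledge (e.g. university rankings,
    # certification details) relevant to the extracted profile.
    context = retrieve_context(extracted)

    # Agent 2: evaluator. Judge the candidate against the job, grounded
    # in the retrieved context.
    evaluation = llm(
        f"Job description:\n{job_description}\n\n"
        f"Candidate profile:\n{extracted}\n\n"
        f"Background knowledge:\n{context}\n\n"
        "Assess the candidate's fit and justify each point."
    )

    # Agent 3: summarizer. Condense the evaluation for recruiters.
    summary = llm(f"Summarize this assessment in three sentences:\n{evaluation}")

    # Agent 4: score formatter. Normalize the output to a fixed schema.
    score = llm(
        f"Based on this assessment, output only a 0-100 fit score:\n{evaluation}"
    )
    return {"summary": summary, "score": score, "evaluation": evaluation}
```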
Related papers
- From Text to Talent: A Pipeline for Extracting Insights from Candidate Profiles [44.38380596387969]
This paper proposes a novel pipeline that leverages Large Language Models and graph similarity measures to suggest ideal candidates for specific job openings. Our approach represents candidate profiles as multimodal embeddings, enabling the capture of nuanced relationships between job requirements and candidate attributes.
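As a rough sketch of embedding-based candidate ranking (the paper's multimodal embeddings and graph similarity measures are richer), assuming a generic `embed` text-embedding function:

```python
# Sketch of ranking candidates by embedding similarity to a job posting.
# `embed` is a placeholder for any text-embedding model; cosine similarity
# stands in for the paper's graph similarity measures.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(job_text: str, profiles: dict[str, str],
                    embed) -> list[tuple[str, float]]:
    job_vec = embed(job_text)
    scores = {name: cosine(job_vec, embed(text))
              for name, text in profiles.items()}
    # Highest-similarity candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```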
arXiv Detail & Related papers (2025-03-21T16:18:44Z)
- Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents [65.36060818857109]
We present a novel framework for extracting and evaluating dialog workflows from historical interactions.
Our extraction process consists of two key stages: (1) a retrieval step to select relevant conversations based on key procedural elements, and (2) a structured workflow generation process using question-answer-based chain-of-thought (QA-CoT) prompting.
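A minimal sketch of these two stages, with an illustrative keyword retriever and a hypothetical QA-CoT-style prompt:

```python
# Two-stage sketch loosely following the described pipeline: (1) retrieve
# conversations containing key procedural elements, then (2) prompt an LLM
# with question-answer steps to draft a structured workflow. The keyword
# retrieval and prompt wording are illustrative placeholders.
def retrieve_conversations(conversations: list[str],
                           keywords: list[str]) -> list[str]:
    # Stage 1: keep only conversations mentioning a procedural keyword.
    return [c for c in conversations
            if any(k.lower() in c.lower() for k in keywords)]

def generate_workflow(conversations: list[str], llm) -> str:
    # Stage 2: QA-style chain-of-thought over the retrieved dialogs.
    transcript = "\n---\n".join(conversations)
    prompt = (
        "For each dialog below, answer step by step: What did the user ask? "
        "What did the agent do next? What condition triggered each action?\n"
        f"{transcript}\n"
        "Then merge your answers into one numbered workflow of agent steps."
    )
    return llm(prompt)
```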
arXiv Detail & Related papers (2025-02-24T16:55:15Z)
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling agents to achieve superior performance with less inference time.
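For the linear-workflow case, a subsequence match in the spirit of WorfEval can be sketched with a longest-common-subsequence score; the scoring formula here is an assumption, and subgraph matching for graph-shaped workflows is omitted:

```python
# Sketch of a subsequence-based workflow score: compare a predicted step
# sequence to a gold sequence via longest common subsequence (LCS).
def lcs_len(pred: list[str], gold: list[str]) -> int:
    dp = [[0] * (len(gold) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred):
        for j, g in enumerate(gold):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if p == g
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1]

def subsequence_score(pred: list[str], gold: list[str]) -> float:
    # Fraction of gold steps recovered, in order, by the prediction.
    return lcs_len(pred, gold) / len(gold) if gold else 0.0

print(subsequence_score(["search", "filter", "rank"], ["search", "rank"]))  # 1.0
```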
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job? [0.0]
SelfScore is a benchmark designed to assess the performance of automated Large Language Model (LLM) agents on help desk and professional consultation tasks.
The benchmark evaluates agents on problem complexity and response helpfulness, ensuring transparency and simplicity in its scoring system.
The study raises concerns about the potential displacement of human workers, especially in areas where AI technologies excel.
arXiv Detail & Related papers (2024-10-05T14:37:35Z)
- Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting [51.54907796704785]
Existing methods rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them.
Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates.
We propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol.
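A loose sketch of a mock-interview loop with two-sided acceptance; the prompts and the handshake rule (both sides must accept) are assumptions here, not MockLLM's exact protocol:

```python
# Sketch of a mock interview between two LLM-played roles, followed by a
# two-sided evaluation: a match requires acceptance from both sides.
def mock_interview(job: str, resume: str, llm, rounds: int = 3) -> bool:
    transcript = []
    for _ in range(rounds):
        q = llm(f"As the interviewer for this job:\n{job}\n"
                f"Transcript so far: {transcript}\nAsk one question.")
        a = llm(f"As the candidate with this resume:\n{resume}\nAnswer: {q}")
        transcript.append((q, a))
    # Handshake: both the employer side and the candidate side must agree.
    employer_ok = "yes" in llm(
        f"Interviewer: given {transcript}, hire this candidate? Answer yes or no."
    ).lower()
    candidate_ok = "yes" in llm(
        f"Candidate: given {transcript}, accept this job? Answer yes or no."
    ).lower()
    return employer_ok and candidate_ok
```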
arXiv Detail & Related papers (2024-05-28T12:23:16Z)
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
- Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening [0.0]
This paper introduces a novel Large Language Model (LLM)-based agent framework for resume screening.
Our framework is distinct in its ability to efficiently summarize and grade each resume from a large dataset.
The results demonstrate that our automated resume screening framework is 11 times faster than traditional manual methods.
arXiv Detail & Related papers (2024-01-16T12:30:56Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
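The per-dimension scoring might look like the following sketch, which compares predicted tool calls against gold annotations; the field names and F1 formulation are illustrative assumptions, not TaskBench's actual metrics:

```python
# Sketch of per-dimension scoring: F1 of predicted vs. gold tool selections
# and parameter assignments for one decomposed task step.
def f1(pred: set, gold: set) -> float:
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

def score_prediction(pred: dict, gold: dict) -> dict:
    return {
        "tool_selection": f1(set(pred["tools"]), set(gold["tools"])),
        "parameters": f1(set(pred["params"].items()), set(gold["params"].items())),
    }

print(score_prediction(
    {"tools": ["search", "summarize"], "params": {"query": "llm agents"}},
    {"tools": ["search"], "params": {"query": "llm agents"}},
))  # tool_selection ~0.67, parameters 1.0
```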
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
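A minimal sketch of such a multi-agent debate loop; the personas, prompts, and majority vote are illustrative assumptions rather than ChatEval's exact protocol:

```python
# Sketch of a multi-agent debate evaluator: several referee personas discuss
# two candidate responses over a few rounds, then vote on the winner.
def debate_evaluate(question: str, resp_a: str, resp_b: str, llm,
                    personas=("strict grader", "domain expert", "general reader"),
                    rounds: int = 2) -> str:
    history = []
    for _ in range(rounds):
        for persona in personas:
            turn = llm(
                f"You are a {persona}. Question: {question}\n"
                f"Response A: {resp_a}\nResponse B: {resp_b}\n"
                f"Discussion so far: {history}\n"
                "Argue briefly which response is better."
            )
            history.append((persona, turn))
    # Each persona casts a final vote; the majority decides.
    votes = [llm(f"As a {p}, given {history}, answer only 'A' or 'B'.").strip()
             for p in personas]
    return max(set(votes), key=votes.count)
```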
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction [19.43668931500507]
We propose an end-to-end system for skill extraction, based on distant supervision through literal matching.
We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements.
We release the benchmark dataset for research purposes to stimulate further research on the task.
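A toy sketch of taxonomy-guided negative sampling as described, where negatives for a distantly labeled skill are drawn from related skills rather than at random (the miniature taxonomy below stands in for ESCO):

```python
# Sketch of taxonomy-guided negative sampling: for a positively labeled
# skill, draw hard negatives from its related skills in the taxonomy.
import random

def sample_negatives(positive_skill: str, related: dict[str, list[str]],
                     k: int = 2, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    candidates = [s for s in related.get(positive_skill, [])
                  if s != positive_skill]
    return rng.sample(candidates, min(k, len(candidates)))

# Toy stand-in for the ESCO taxonomy's related-skill links.
taxonomy = {"python programming": ["java programming", "data analysis", "sql"]}
print(sample_negatives("python programming", taxonomy))  # two related-skill negatives
```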
arXiv Detail & Related papers (2022-09-13T13:37:06Z)
- Toward a traceable, explainable, and fair JD/Resume recommendation system [10.820022470618234]
The development of an automatic recruitment system remains one of the main challenges.
Our aim is to explore how modern language models can be combined with knowledge bases and datasets to enhance the JD/Resume matching process.
arXiv Detail & Related papers (2022-02-02T18:17:05Z)