ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining
- URL: http://arxiv.org/abs/2502.12361v2
- Date: Wed, 19 Feb 2025 19:18:31 GMT
- Title: ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining
- Authors: Xiao Yu, Ruize Xu, Chengyuan Xue, Jinzhong Zhang, Zhou Yu
- Abstract summary: We introduce ConFit v2, an improvement over ConFit to tackle the sparsity problem.
We propose two techniques to enhance the encoder's contrastive training process.
We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods.
- Score: 15.63898598630676
- Abstract: A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit to tackle this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model; and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
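As a rough illustration only (not the authors' released code), the sketch below shows the first idea: a job post is augmented with an LLM-generated hypothetical reference resume before encoding, so the job-side embedding lands closer to the resume embedding space. The prompt wording, the encoder checkpoint, and the `generate` callable are assumptions made for this sketch.

```python
# Minimal sketch of the "hypothetical reference resume" augmentation idea,
# NOT the authors' implementation. Prompt, encoder checkpoint, and the
# `generate` callable (any LLM completion function) are illustrative choices.
from typing import Callable

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in bi-encoder

PROMPT = (
    "Write a short resume for an ideal candidate for the following job post.\n\n"
    "Job post:\n{job}\n\nResume:"
)

def augment_job(job_text: str, generate: Callable[[str], str]) -> str:
    """Append an LLM-written hypothetical resume to the job post so the job-side
    text (and thus its embedding) looks more like the resumes it should match."""
    hypothetical_resume = generate(PROMPT.format(job=job_text))
    return job_text + "\n\n[HYPOTHETICAL RESUME]\n" + hypothetical_resume

def match_score(resume_text: str, job_text: str, generate: Callable[[str], str]) -> float:
    """Cosine similarity between the resume and the augmented job post."""
    r, j = encoder.encode([resume_text, augment_job(job_text, generate)],
                          normalize_embeddings=True)
    return float(np.dot(r, j))
```

The second idea is orthogonal: loosely speaking, unlabeled resume/job pairs that the current model scores highly, yet just below the known positives, are mined and reused as hard negatives in the contrastive loss.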
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning [20.599962663046007]
We tackle the sparsity problem using data augmentations and a simple contrastive learning approach.
ConFit first creates an augmented resume-job dataset by paraphrasing specific sections in a resume or a job post.
We evaluate ConFit on two real-world datasets and find that it outperforms prior methods by up to 31% absolute in nDCG@10 for ranking jobs and ranking resumes.
arXiv Detail & Related papers (2024-01-29T17:55:18Z)
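Both ConFit papers train a bi-encoder contrastively on resume-job pairs. For readers unfamiliar with that setup, here is a generic in-batch InfoNCE loss; it is a textbook sketch rather than ConFit's exact objective, and the temperature value is an arbitrary choice.

```python
# Generic in-batch contrastive (InfoNCE) loss over resume/job embeddings.
# A textbook sketch of this training signal, not ConFit's exact objective.
import torch
import torch.nn.functional as F

def info_nce(resume_emb: torch.Tensor, job_emb: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """resume_emb, job_emb: (B, d) tensors where row i of each side is a labeled match."""
    r = F.normalize(resume_emb, dim=-1)
    j = F.normalize(job_emb, dim=-1)
    logits = r @ j.T / tau                              # (B, B) cosine similarities
    targets = torch.arange(r.size(0), device=r.device)  # positives sit on the diagonal
    # Each resume must score its own job highest; the other in-batch jobs
    # (plus any mined hard negatives appended as extra columns) act as negatives.
    return F.cross_entropy(logits, targets)
```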
- Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks [8.235367170516769]
We introduce the Resume-Job Description Benchmark (RJDB) to cater to a wide array of HR tasks.
Our benchmark includes over 50 thousand triples of job descriptions, matched resumes and unmatched resumes.
Our experiments reveal that the student models achieve performance near or better than that of the teacher model (GPT-4), affirming the effectiveness of the benchmark.
arXiv Detail & Related papers (2023-11-10T20:25:42Z)
- JobHam-place with smart recommend job options and candidate filtering options [0.0]
Job recommendation and CV ranking start from automatic keyword extraction and end with the Job/CV ranking algorithm.
Job2Skill consists of two components, a text encoder and GRU-based layers, while CV2Skill is mainly based on BERT.
Job/CV ranking algorithms are provided to compute the occurrence ratio of skill words based on TF-IDF scores and the match ratio over the total number of skills.
arXiv Detail & Related papers (2023-03-31T09:54:47Z)
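The skill-overlap ranking described above can be approximated with standard TF-IDF machinery. The sketch below is an illustration of the general idea, not JobHam's actual pipeline; skill extraction is assumed to have been done already.

```python
# Illustrative skill-overlap ranking in the spirit of the TF-IDF + match-ratio
# scoring described above; not JobHam's actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_cvs_by_match_ratio(job_skills: set[str], cvs: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank CVs by the fraction of required skills they cover."""
    scores = {
        name: len(job_skills & skills) / max(len(job_skills), 1)
        for name, skills in cvs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def tfidf_similarity(job_text: str, cv_texts: list[str]) -> list[float]:
    """Term-weighted cosine similarity between the job post and each CV."""
    mat = TfidfVectorizer().fit_transform([job_text] + cv_texts)  # rows are L2-normalized
    return (mat[1:] @ mat[0].T).toarray().ravel().tolist()        # row 0 is the job
```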
- Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
arXiv Detail & Related papers (2023-02-10T07:37:49Z)
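Pointwise V-information has a simple closed form: PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y), where g is a model fine-tuned with inputs and g' a model fine-tuned without them. The snippet below only evaluates that difference; how the two models are obtained is out of scope here.

```python
# Pointwise V-information of a datapoint (x, y), in bits:
#   PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y)
# where g is fine-tuned with inputs and g' without them.
import math

def pvi(logprob_y_given_x: float, logprob_y_given_null: float) -> float:
    """Both arguments are natural-log probabilities of the gold label y."""
    return (logprob_y_given_x - logprob_y_given_null) / math.log(2)  # nats -> bits
```

High PVI means the input makes the label much easier to predict; the selective-augmentation idea is, roughly, to keep synthesized datapoints whose PVI indicates they are informative.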
- Construction of English Resume Corpus and Test with Pre-trained Language Models [0.0]
This study aims to transform the information extraction task of resumes into a simple sentence classification task.
The classification rules are improved to create a larger and more fine-grained classification dataset of resumes.
The corpus is also used to test the performance of several current mainstream pre-trained language models (PLMs).
arXiv Detail & Related papers (2022-08-05T15:07:23Z)
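Recasting resume information extraction as sentence classification means assigning each resume sentence a section label. The minimal sketch below uses a generic Hugging Face classification head; the label set and the (not yet fine-tuned) checkpoint are assumptions, since the paper's exact categories are not listed here.

```python
# Minimal sketch of resume extraction recast as per-sentence classification.
# The label set and the checkpoint are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["personal_info", "education", "experience", "skills", "other"]  # assumed labels
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # would be fine-tuned on the resume corpus before real use

def label_sentences(sentences: list[str]) -> list[str]:
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**batch).logits.argmax(dim=-1)
    return [LABELS[i] for i in pred.tolist()]
```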
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP [68.43279384561352]
Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods encode little task-specific knowledge and are limited to yielding low-quality synthetic data for weak baselines on simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
arXiv Detail & Related papers (2022-06-21T11:34:02Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume matching algorithms.
We propose a novel multi-view co-teaching network that learns from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning [51.34011135329063]
Job Title Benchmarking (JTB) aims at matching job titles with similar expertise levels across various companies.
Traditional JTB approaches mainly rely on manual market surveys, which are expensive and labor-intensive.
We reformulate JTB as a link-prediction task over a Job-Graph, in which matched job titles should be linked.
arXiv Detail & Related papers (2020-09-16T02:33:32Z)
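Casting job-title benchmarking as link prediction means learning title representations such that titles at comparable expertise levels score highly as a pair. Below is a generic dot-product link predictor, a much-simplified stand-in for Job2Vec's collective multi-view model, shown only to make the reformulation concrete.

```python
# Generic dot-product link predictor over job-title embeddings; a simplified
# stand-in for the multi-view Job-Graph model described above.
import torch
import torch.nn as nn

class TitleLinkPredictor(nn.Module):
    def __init__(self, num_titles: int, dim: int = 128):
        super().__init__()
        self.emb = nn.Embedding(num_titles, dim)

    def forward(self, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
        # Probability that two job titles should be linked (i.e. comparable expertise).
        return torch.sigmoid((self.emb(src) * self.emb(dst)).sum(-1))

# Training would minimize binary cross-entropy on observed Job-Graph links (label 1)
# and sampled title pairs without links (label 0).
```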
This list is automatically generated from the titles and abstracts of the papers on this site.