ConFit: Improving Resume-Job Matching using Data Augmentation and
Contrastive Learning
- URL: http://arxiv.org/abs/2401.16349v1
- Date: Mon, 29 Jan 2024 17:55:18 GMT
- Title: ConFit: Improving Resume-Job Matching using Data Augmentation and
Contrastive Learning
- Authors: Xiao Yu, Jinzhong Zhang, Zhou Yu
- Abstract summary: We tackle the sparsity problem using data augmentations and a simple contrastive learning approach.
ConFit first creates an augmented resume-job dataset by paraphrasing specific sections in a resume or a job post.
We evaluate ConFit on two real-world datasets and find it outperforms prior methods by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively.
- Score: 20.599962663046007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A reliable resume-job matching system helps a company find suitable
candidates from a pool of resumes, and helps a job seeker find relevant jobs
from a list of job posts. However, since job seekers apply only to a few jobs,
interaction records in resume-job datasets are sparse. Unlike many
prior works that use complex modeling techniques, we tackle this sparsity
problem using data augmentations and a simple contrastive learning approach.
ConFit first creates an augmented resume-job dataset by paraphrasing specific
sections in a resume or a job post. Then, ConFit uses contrastive learning to
further increase training samples from $B$ pairs per batch to $O(B^2)$ per
batch. We evaluate ConFit on two real-world datasets and find it outperforms
prior methods (including BM25 and OpenAI text-ada-002) by up to 19% and 31%
absolute in nDCG@10 for ranking jobs and ranking resumes, respectively.
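The step from $B$ pairs to $O(B^2)$ pairs per batch can be illustrated with a minimal sketch of in-batch contrastive learning. This is not the authors' implementation: the InfoNCE-style loss, the dot-product similarity, the `temperature` parameter, and the function names are all assumptions made for illustration. Only the resume-to-job direction is shown.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(resume_vecs, job_vecs, temperature=1.0):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's exact loss).

    resume_vecs[i] and job_vecs[i] are embeddings of a matched pair.
    Every other job in the batch serves as a negative for resume i.
    """
    batch = len(resume_vecs)
    # B x B similarity matrix: entry [i][j] scores resume i against job j.
    sims = [[dot(r, j) / temperature for j in job_vecs] for r in resume_vecs]
    loss = 0.0
    for i in range(batch):
        row = sims[i]
        # Numerically stable log-sum-exp over all B jobs for resume i.
        m = max(row)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in row))
        # Cross-entropy with the matched job (the diagonal) as the target.
        loss += log_denominator - row[i]
    return loss / batch


def count_scored_pairs(batch):
    """Number of resume-job pairs scored in one batch of B matched pairs."""
    return batch * batch


# Toy batch of two matched pairs with 2-dimensional embeddings.
resumes = [[1.0, 0.0], [0.0, 1.0]]
jobs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_contrastive_loss(resumes, jobs)
swapped_loss = in_batch_contrastive_loss(resumes, jobs[::-1])
```

In this toy batch the loss is lower when each resume is paired with its own job than when the jobs are swapped, and a batch of 2 matched pairs yields 4 scored pairs (2 positives on the diagonal and 2 in-batch negatives).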
Related papers
- List-aware Reranking-Truncation Joint Model for Search and
Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via generative paradigm based on encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- NNOSE: Nearest Neighbor Occupational Skill Extraction [55.22292957778972]
We tackle the complexity in occupational skill datasets.
We employ an external datastore for retrieving similar skills in a dataset-unifying manner.
We observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.
arXiv Detail & Related papers (2024-01-30T15:18:29Z)
- Divide and Conquer: Hybrid Pre-training for Person Search [40.13016375392472]
We propose a hybrid pre-training framework specifically designed for person search using sub-task data only.
Our model can achieve significant improvements across diverse protocols, such as person search method, fine-tuning data, pre-training data and model backbone.
Our code and pre-trained models are released for plug-and-play usage to the person search community.
arXiv Detail & Related papers (2023-12-13T08:33:50Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Distilling Large Language Models using Skill-Occupation Graph Context
for HR-Related Tasks [8.235367170516769]
We introduce the Resume-Job Description Benchmark (RJDB) to cater to a wide array of HR tasks.
Our benchmark includes over 50 thousand triples of job descriptions, matched resumes and unmatched resumes.
Our experiments reveal that the student models achieve near/better performance than the teacher model (GPT-4), affirming the effectiveness of the benchmark.
arXiv Detail & Related papers (2023-11-10T20:25:42Z)
- JobHam-place with smart recommend job options and candidate filtering
options [0.0]
Job recommendation and CV ranking start with automatic keyword extraction and end with the Job/CV ranking algorithm.
Job2Skill consists of two components, a text encoder and GRU-based layers, while CV2Skill is mainly based on BERT.
Job/CV ranking algorithms are provided to compute the occurrence ratio of skill words based on the TF-IDF score and the match ratio of the total number of skills.
arXiv Detail & Related papers (2023-03-31T09:54:47Z)
- Construction of English Resume Corpus and Test with Pre-trained Language
Models [0.0]
This study aims to transform the information extraction task of resumes into a simple sentence classification task.
The classification rules are improved to create a larger and more fine-grained classification dataset of resumes.
This corpus is also used to test the performance of some current mainstream pre-trained language models (PLMs).
arXiv Detail & Related papers (2022-08-05T15:07:23Z)
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in
Few-Shot NLP [68.43279384561352]
Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods have trivial task-specific knowledge and are limited to yielding low-quality synthetic data for weak baselines in simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
arXiv Detail & Related papers (2022-06-21T11:34:02Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using
Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Job2Vec: Job Title Benchmarking with Collective Multi-View
Representation Learning [51.34011135329063]
Job Title Benchmarking (JTB) aims at matching job titles with similar expertise levels across various companies.
Traditional JTB approaches mainly rely on manual market surveys, which are expensive and labor-intensive.
We reformulate JTB as a link prediction task over the Job-Graph, in which matched job titles should have links.
arXiv Detail & Related papers (2020-09-16T02:33:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.