VacancySBERT: the approach for representation of titles and skills for
semantic similarity search in the recruitment domain
- URL: http://arxiv.org/abs/2307.16638v1
- Date: Mon, 31 Jul 2023 13:21:15 GMT
- Title: VacancySBERT: the approach for representation of titles and skills for
semantic similarity search in the recruitment domain
- Authors: Maiia Bocharova, Eugene Malakhov, Vitaliy Mezhuyev
- Abstract summary: The paper focuses on deep learning semantic search algorithms applied in the HR domain. The aim of the article is to develop a novel approach to training a Siamese network to link the skills mentioned in a job ad with its title.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper focuses on deep learning semantic search algorithms applied in the
HR domain. The aim of the article is to develop a novel approach to training a
Siamese network to link the skills mentioned in a job ad with its title. It
has been shown that the title normalization process can be based either on
classification or on similarity comparison. While classification
algorithms strive to assign a sample to a predefined set of categories,
similarity search algorithms take a more flexible approach, since they are
designed to find samples similar to a given query sample without
requiring predefined classes and labels. In this article, semantic similarity
search is used to find candidates for title normalization. A pre-trained
language model is adapted by teaching it to match titles and skills
based on co-occurrence information. For the purpose of this research, fifty
billion title-description pairs were collected for training the model, along with
thirty-three thousand title-description-normalized-title triplets, where the
normalized job title was selected manually by the job ad creator, for testing
purposes. FastText, BERT, SentenceBERT, and JobBERT were used as baselines.
The accuracy of the designed algorithm is measured as Recall among the model's
top one, five, and ten suggestions. It has been shown that the novel training
objective achieves a significant improvement over other
generic and domain-specific text encoders. Two settings have been compared in this
article: treating titles as standalone strings, and including skills as additional
features during inference.
Improvements of 10% and 21.5% have been achieved using VacancySBERT and
VacancySBERT (with skills), respectively. The benchmark has been released as
open source to foster further research in the area.
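The Recall@k evaluation the abstract describes can be sketched with plain embedding arithmetic: a query title is matched against a corpus of normalized titles by cosine similarity, and a query counts as a hit if its correct normalized title appears among the top-k nearest neighbors. This is a minimal illustration with random vectors, not the paper's actual model or data; `recall_at_k` and the toy embeddings are hypothetical names introduced here.

```python
import numpy as np

def recall_at_k(query_embs, corpus_embs, true_idx, k):
    """Fraction of queries whose correct corpus entry appears among the
    top-k most cosine-similar corpus entries."""
    # L2-normalize rows so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = q @ c.T                           # (n_queries, n_corpus)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k nearest entries
    hits = [true_idx[i] in topk[i] for i in range(len(true_idx))]
    return float(np.mean(hits))

# Toy check: queries are slightly perturbed copies of corpus rows,
# so each query's nearest neighbor should be its own original row.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 16))
queries = corpus[:10] + 0.01 * rng.normal(size=(10, 16))
print(recall_at_k(queries, corpus, list(range(10)), k=1))  # 1.0
```

Recall@5 and Recall@10, as reported in the paper, are the same computation with `k=5` and `k=10`.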
Related papers
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We incorporate Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
We use external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Learning Job Titles Similarity from Noisy Skill Labels [0.11498015270151059]
Measuring semantic similarity between job titles is an essential functionality for automatic job recommendations.
In this paper, we propose an unsupervised representation learning method for training a job title similarity model using noisy skill labels.
arXiv Detail & Related papers (2022-07-01T15:30:10Z)
- Predicting Job Titles from Job Descriptions with Multi-label Text Classification [0.0]
We propose the multi-label classification approach for predicting relevant job titles from job description texts.
We implement the Bi-GRU-LSTM-CNN with different pre-trained language models to apply for the job titles prediction problem.
arXiv Detail & Related papers (2021-12-21T09:31:03Z)
- Scalable Approach for Normalizing E-commerce Text Attributes (SANTA) [0.25782420501870296]
We present SANTA, a framework to automatically normalize E-commerce attribute values.
We first perform an extensive study of nine syntactic matching algorithms.
We argue that string similarity alone is not sufficient for attribute normalization.
arXiv Detail & Related papers (2021-06-12T08:45:56Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
- Few-shot Intent Classification and Slot Filling with Retrieved Examples [30.45269507626138]
We propose a span-level retrieval method that learns similar contextualized representations for spans with the same label via a novel batch-softmax objective.
Our method outperforms previous systems in various few-shot settings on the CLINC and SNIPS benchmarks.
arXiv Detail & Related papers (2021-04-12T18:50:34Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency, category-exclusive words higher when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
arXiv Detail & Related papers (2020-12-11T19:23:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.