Related papers: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection

URL: http://arxiv.org/abs/2406.06257v1
Date: Mon, 10 Jun 2024 13:38:15 GMT
Title: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection
Authors: Matthias Engelbach, Dennis Klau, Maximilien Kintz, Alexander Ulrich,
Abstract summary: Job descriptions are posted on many online channels, including company websites, job boards or social media platforms. It is helpful to aggregate job postings across platforms and thus detect duplicate descriptions that refer to the same job. We show that combining overlap-based character similarity with text embedding and keyword matching methods leads to convincing results.
Score: 42.49221181099313
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Job descriptions are posted on many online channels, including company websites, job boards or social media platforms. These descriptions are usually published with varying text for the same job, due to the requirements of each platform or to target different audiences. However, for the purpose of automated recruitment and assistance of people working with these texts, it is helpful to aggregate job postings across platforms and thus detect duplicate descriptions that refer to the same job. In this work, we propose an approach for detecting duplicates in job descriptions. We show that combining overlap-based character similarity with text embedding and keyword matching methods leads to convincing results. In particular, we show that although no approach individually achieves satisfying performance, a combination of string comparison, deep textual embeddings, and the use of curated weighted lookup lists for specific skills leads to a significant boost in overall performance. A tool based on our approach is being used in production and feedback from real-life use confirms our evaluation.
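The abstract combines three signal families: character overlap, text embeddings, and weighted skill keyword lists. It does not specify the features, embedding model, weights, or decision threshold. The following is only a minimal sketch of that kind of score fusion, built on several assumptions. Character-trigram Jaccard similarity stands in for the overlap measure. A bag-of-words cosine is a placeholder for a deep embedding. SKILL_WEIGHTS is a hypothetical curated lookup list. The fusion weights and the 0.5 threshold are illustrative values, not the authors' settings.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercased, whitespace-normalized character n-grams (overlap signal)."""
    t = re.sub(r"\s+", " ", text.lower()).strip()
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def embed(text):
    # Placeholder for a deep text embedding: a bag-of-words count vector.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical curated lookup list: skill keyword -> importance weight.
SKILL_WEIGHTS = {"python": 2.0, "sql": 1.5, "kubernetes": 2.0, "java": 2.0}


def skill_score(a, b, weights=SKILL_WEIGHTS):
    """Weighted Jaccard over the listed skills that appear in each posting."""
    ta = {w for w in re.findall(r"\w+", a.lower()) if w in weights}
    tb = {w for w in re.findall(r"\w+", b.lower()) if w in weights}
    union = ta | tb
    if not union:
        return 0.0
    return sum(weights[k] for k in ta & tb) / sum(weights[k] for k in union)


def duplicate_score(a, b, w=(0.3, 0.4, 0.3)):
    """Weighted sum of overlap, embedding and skill similarities (weights assumed)."""
    scores = (
        jaccard(char_ngrams(a), char_ngrams(b)),
        cosine(embed(a), embed(b)),
        skill_score(a, b),
    )
    return sum(wi * si for wi, si in zip(w, scores))


def is_duplicate(a, b, threshold=0.5):
    """Flag a pair as duplicate postings if the fused score reaches the threshold."""
    return duplicate_score(a, b) >= threshold


a = "Senior Python Developer (m/w/d) - Python, SQL, Kubernetes"
b = "Senior Python Developer - Python, SQL and Kubernetes experience"
c = "Java Backend Engineer - Java, Spring, Oracle"
print(is_duplicate(a, b), is_duplicate(a, c))
```

Here the rewritten Python posting scores above the threshold, while the Java posting does not. A production system would replace embed() with a sentence-embedding model and tune the weights and threshold on labeled duplicate pairs.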

Related papers

Smart-Hiring: An Explainable end-to-end Pipeline for CV Information Extraction and Job Matching [0.0]
This paper presents Smart-Hiring, an end-to-end Natural Language Processing pipeline designed to automatically extract structured information from unstructured resumes. The proposed system combines document parsing, named-entity recognition, and contextual text embedding techniques to capture skills, experience, and qualifications. The system achieves competitive matching accuracy while preserving a high degree of interpretability and transparency in its decision process.
arXiv (2025-11-04T12:44:54Z)
BookWorm: A Dataset for Character Description and Analysis [59.186325346763184]
We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks.
arXiv (2024-10-14T10:55:58Z)
Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation. Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv (2024-08-01T19:11:08Z)
Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval [0.0]
We propose a set of submodular functions for opinion summarization. Opinion summarization combines two tasks: summarization and sentiment detection. Our functions generate summaries such that there is a good correlation between document sentiment and summary sentiment, along with a good ROUGE score.
arXiv (2024-05-20T21:27:18Z)
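The blurb does not define the thesis's specific submodular functions or their sentiment terms. The sketch below is therefore only a generic illustration of the underlying idea. It uses plain word coverage, a standard monotone submodular function. It maximizes that function with the usual greedy algorithm under a budget of k sentences. The helper names words, coverage and greedy_summary are hypothetical, and no sentiment component is modeled.

```python
import re


def words(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def coverage(selected, sentences):
    """Number of distinct words covered by the selected sentences.

    Coverage is monotone submodular: adding a sentence never lowers it,
    and its marginal gain shrinks as more words are already covered.
    """
    covered = set()
    for i in selected:
        covered |= words(sentences[i])
    return len(covered)


def greedy_summary(sentences, k):
    """Greedily add the sentence with the largest marginal coverage gain."""
    selected = []
    for _ in range(min(k, len(sentences))):
        base = coverage(selected, sentences)
        best, best_gain = None, 0
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sentences) - base
            if gain > best_gain:  # strict '>' keeps the earliest sentence on ties
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new words
            break
        selected.append(best)
    return [sentences[i] for i in sorted(selected)]


reviews = [
    "the camera is great",
    "the camera is great",
    "battery life is poor",
    "shipping was fast",
]
print(greedy_summary(reviews, 2))
```

The duplicated review adds no marginal gain once its first copy is chosen, so it is never selected. This diminishing-returns behavior is what makes submodular objectives attractive for summarization. For monotone submodular functions under a cardinality budget, the greedy choice is also known to achieve at least a (1 - 1/e) fraction of the optimum.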
TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit [60.31175803899285]
We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings. TAROT targets semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks to constrain the acquired semantic information at each level.
arXiv (2024-01-15T07:57:58Z)
VacancySBERT: the approach for representation of titles and skills for semantic similarity search in the recruitment domain [0.0]
The paper focuses on deep learning semantic search algorithms applied in the HR domain. The aim of the article is to develop a novel approach to training a Siamese network that links the skills mentioned in a job ad with its title.
arXiv (2023-07-31T13:21:15Z)
Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models built on pre-trained language models (PLMs) are hard to generalize to different tasks. We adopt a specialization-generalization training strategy and refer to it as Match-Prompt. In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens. In the generalization stage, the text matching model explores the essential matching signals by being trained on multiple diverse matching tasks.
arXiv (2022-04-06T11:01:08Z)
Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition [60.36540008537054]
In this work, we exploit an implicit task within traditional text recognition, character counting, which requires no additional annotation labor. We design a two-branch reciprocal feature learning framework to make full use of the features from both tasks. Experiments on 7 benchmarks show the advantages of the proposed methods in both text recognition and the newly built character counting task.
arXiv (2021-05-13T12:27:35Z)
GroupLink: An End-to-end Multitask Method for Word Grouping and Relation Extraction in Form Understanding [25.71040852477277]
We build an end-to-end model through multitask training to combine word grouping and relation extraction to enhance performance on each task. We validate our proposed method on a real-world, fully-annotated, noisy-scanned benchmark, FUNSD.
arXiv (2021-05-10T20:15:06Z)
Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms. We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching. Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv (2020-09-25T03:09:54Z)
Learning Effective Representations for Person-Job Fit by Feature Fusion [4.884826427985207]
Person-job fit is the task of matching candidates and job posts on online recruitment platforms using machine learning algorithms. In this paper, we propose to learn comprehensive and effective representations of candidates and job posts via feature fusion. Experiments on 10 months of real data show that our solution outperforms existing methods by a large margin.
arXiv (2020-06-12T09:02:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.