Jeroen Van Hautte

Career Path Prediction using Resume Representation Learning and Skill-based Matching

Oct 24, 2023
Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing approaches to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary, as a hybrid approach achieves the strongest result with 43.01% recall@10.

* Accepted to the 3rd Workshop on Recommender Systems for Human Resources (RecSys in HR 2023) as part of RecSys 2023
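
The recall@10 figures quoted in the abstract above can be read with the following minimal sketch in mind. It assumes that each career history has exactly one gold next occupation and that a model returns a ranked list of candidate occupation labels; the function name `recall_at_k` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_labels: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of examples whose single gold label is among the top-k predictions.

    Assumption: one gold label per example (hypothetical setup, not the paper's code).
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_labels)
        if gold in preds[:k]
    )
    return hits / len(gold_labels)


# Toy example with made-up occupation labels.
toy_predictions = [
    ["data analyst", "data scientist", "software developer"],
    ["nurse", "midwife", "physiotherapist"],
]
toy_gold = ["data scientist", "pharmacist"]
toy_score = recall_at_k(toy_predictions, toy_gold, k=2)
```

With the toy data, only the first example has its gold label in the top 2, so `toy_score` is 0.5.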


Extreme Multi-Label Skill Extraction Training using Large Language Models

Jul 20, 2023
Jens-Joris Decorte, Severine Verlinden, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of 15 to 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.

* Accepted to the International workshop on AI for Human Resources and Public Employment Services (AI4HR&PES) as part of ECML-PKDD 2023
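
For readers unfamiliar with the metric, the sketch below shows one common definition of R-Precision@K from the extreme multi-label classification literature: the number of gold labels found in the top K predictions, divided by min(K, number of gold labels). Whether the paper uses exactly this variant is an assumption here, and the function names and toy skill labels are hypothetical.

```python
from typing import Sequence, Set


def r_precision_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Gold labels found in the top-k, normalized by min(k, number of gold labels).

    Assumptions: `ranked` contains no duplicate labels and `gold` is non-empty.
    """
    if not gold:
        raise ValueError("each example needs at least one gold label")
    hits = sum(1 for label in ranked[:k] if label in gold)
    return hits / min(k, len(gold))


def mean_r_precision_at_k(
    all_ranked: Sequence[Sequence[str]],
    all_gold: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Average the per-example score over a benchmark (macro average)."""
    if len(all_ranked) != len(all_gold):
        raise ValueError("predictions and gold labels must have the same length")
    scores = [r_precision_at_k(r, g, k) for r, g in zip(all_ranked, all_gold)]
    return sum(scores) / len(scores)


# Toy example with made-up skill labels.
toy_ranked = ["python", "sql", "teamwork", "excel", "java", "leadership"]
toy_gold = {"sql", "negotiation"}
toy_rp5 = r_precision_at_k(toy_ranked, toy_gold, k=5)
```

In the toy example one of the two gold skills appears in the top 5, so `toy_rp5` is 0.5. The normalization by min(K, number of gold labels) keeps a perfect score reachable for sentences that have fewer than K gold skills.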


Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction

Sep 13, 2022
Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester

Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels, mentioned either explicitly or merely described implicitly, and the lack of finely annotated training corpora. Previous work on skill extraction either oversimplifies the task to an explicit entity detection task or builds on manually annotated training data that would be infeasible to obtain for a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.

* Accepted to the 2nd Workshop on Recommender Systems for Human Resources (RecSys in HR 2022) as part of RecSys 2022


JobBERT: Understanding Job Titles through Skills

Sep 20, 2021
Jens-Joris Decorte, Jeroen Van Hautte, Thomas Demeester, Chris Develder

Job titles form a cornerstone of today's human resources (HR) processes. Within online recruitment, they allow candidates to understand the contents of a vacancy at a glance, while internal HR departments use them to organize and structure many of their processes. As job titles are a compact, convenient, and readily available data source, modeling them with high accuracy can greatly benefit many HR tech applications. In this paper, we propose a neural representation model for job titles, by augmenting a pre-trained language model with co-occurrence information from skill labels extracted from vacancies. Our JobBERT method leads to considerable improvements compared to using generic sentence encoders, for the task of job title normalization, for which we release a new evaluation benchmark.

* Accepted to the International workshop on Fair, Effective And Sustainable Talent management using data science (FEAST) as part of ECML-PKDD 2021


Leveraging the Inherent Hierarchy of Vacancy Titles for Automated Job Ontology Expansion

Apr 06, 2020
Jeroen Van Hautte, Vincent Schelstraete, Mikaël Wornoo

Machine learning plays an ever-bigger part in online recruitment, powering intelligent matchmaking and job recommendations across many of the world's largest job platforms. However, the main text is rarely enough to fully understand a job posting: more often than not, much of the required information is condensed into the job title. Several organised efforts have been made to map job titles onto a hand-made knowledge base so as to provide this information, but these only cover around 60% of online vacancies. We introduce a novel, purely data-driven approach towards the detection of new job titles. Our method is conceptually simple, extremely efficient and competitive with traditional NER-based approaches. Although the standalone application of our method does not outperform a finetuned BERT model, it can be applied as a preprocessing step as well, substantially boosting accuracy across several architectures.

* Accepted to the Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020)


Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

Oct 01, 2019
Jeroen Van Hautte, Guy Emerson, Marek Rei

Word embeddings are an essential component in a wide range of natural language processing applications. However, distributional semantic models are known to struggle when only a small number of context sentences are available. Several methods have been proposed to obtain higher-quality vectors for these words, leveraging both this context information and sometimes the word forms themselves through a hybrid approach. We show that the current tasks do not suffice to evaluate models that use word-form information, as such models can easily leverage word forms in the training data that are related to word forms in the test data. We introduce 3 new tasks, allowing for a more balanced comparison between models. Furthermore, we show that hyperparameters that have largely been ignored in previous work can consistently improve the performance of both baseline and advanced models, achieving a new state of the art on 4 out of 6 tasks.

* Accepted to the Proceedings of the Second Workshop on Deep Learning for Low-Resource NLP (DeepLo 2019)
