Abstract:
A major problem in the Sri Lankan labour market is the lack of demand-side information, which has created a gap between the supply of and demand for labour. Job advertisements provide a wide range of real-time information on what is in demand, such as skills and qualifications, but this information is largely unstructured and exists in many different formats. The objective of this research is to create a structured dataset of job vacancies in Sri Lanka from publicly available job advertisements. A total of 3500 images of job advertisements were scraped from Sri Lankan English newspapers and job websites and converted to text using Optical Character Recognition (OCR). A structured dataset was then created by extracting information with a rule-based Natural Language Processing (NLP) approach, after which some basic insights on the labour market were derived. A dataset of this kind could be of considerable value to employers, job seekers and policymakers by providing up-to-date information on the skills and qualifications required in the job market.
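
To illustrate what rule-based extraction from OCR text can look like, the following is a minimal sketch. The field names, regular expressions, keyword lists and sample advertisement are all assumptions made for this example; they are not the rules or data used in this research.

```python
import re

# Hypothetical rules, chosen only for illustration.
# A title line is assumed to look like "Vacancy: <title>".
TITLE_RE = re.compile(
    r"^\s*(?:position|vacancy|post)\s*:\s*(.+?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)
# Experience is assumed to be phrased as "<n> years of experience".
EXPERIENCE_RE = re.compile(
    r"(\d+)\s*\+?\s*years?\s+(?:of\s+)?experience",
    re.IGNORECASE,
)
# Small keyword lists standing in for a real vocabulary.
QUALIFICATIONS = ("degree", "diploma", "mba")
SKILLS = ("excel", "python", "accounting", "communication")


def find_keywords(text, keywords):
    """Return the keywords that occur as whole words in the text."""
    lowered = text.lower()
    return [k for k in keywords if re.search(rf"\b{re.escape(k)}\b", lowered)]


def extract_fields(ad_text):
    """Turn the raw text of one advertisement into a structured record."""
    title = TITLE_RE.search(ad_text)
    years = EXPERIENCE_RE.search(ad_text)
    return {
        "job_title": title.group(1) if title else None,
        "min_experience_years": int(years.group(1)) if years else None,
        "qualifications": find_keywords(ad_text, QUALIFICATIONS),
        "skills": find_keywords(ad_text, SKILLS),
    }


# Made-up text standing in for the OCR output of one advertisement image.
sample_ad = """Vacancy: Accounts Executive
Applicants should hold a Degree or Diploma in Accounting.
Minimum 2 years of experience. Proficiency in Excel is required."""

record = extract_fields(sample_ad)
print(record)
```

Applying such a function to the text of each advertisement would yield one row per vacancy. Real OCR output is noisier than this sample, so practical rules would likely need to tolerate misspellings and layout variation.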