Information extraction from Sri Lankan job advertisements via rule-based approach

dc.contributor.author: Bandara, RMHD
dc.contributor.author: Gunasekara, HASS
dc.contributor.author: Peiris, WADS
dc.contributor.author: Wijekoon, WMHC
dc.contributor.author: De Silva, TS
dc.contributor.author: Hewawalpita, SGS
dc.contributor.author: Rathnayake, HMSC
dc.date.accessioned: 2021-12-07T06:24:30Z
dc.date.available: 2021-12-07T06:24:30Z
dc.date.issued: 2021-12-03
dc.description.abstract: One of the major problems in the Sri Lankan labour market is the lack of availability of demand side information. This lack of information has created a gap between supply and demand of labour. Job advertisements provide a wide range of real-time information about aspects, such as skills and qualifications, that are in demand, though this information is largely unstructured and exists in many different formats. The objective of this research is to create a structured dataset of job vacancies in Sri Lanka using publicly available job advertisements. A total of 3500 images of job advertisements were scraped from Sri Lankan English newspapers and job websites and converted into text form using Optical Character Recognition (OCR). Next, a structured dataset was created by extracting information, applying a rule-based approach in the Natural Language Processing (NLP) domain, after which some basic insights on the labour market were derived. The creation of this kind of dataset could provide huge value to employers, job seekers and policymakers, providing up-to-date information on the skills and qualifications required in the job market. [en_US]
dc.identifier.conference: International Conference on Business Research [en_US]
dc.identifier.email: harini.17@business.mrt.ac.lk [en_US]
dc.identifier.email: suwani.17@business.mrt.ac.lk [en_US]
dc.identifier.email: diluni.17@business.mrt.ac.lk [en_US]
dc.identifier.email: himali.17@business.mrt.ac.lk [en_US]
dc.identifier.email: tilokad@uom.lk [en_US]
dc.identifier.email: supungs@uom.lk [en_US]
dc.identifier.email: samadhic@uom.lk [en_US]
dc.identifier.faculty: Business [en_US]
dc.identifier.pgnos: pp. 143-152 [en_US]
dc.identifier.place: Moratuwa [en_US]
dc.identifier.proceeding: 4th International Conference on Business Research - ICBR 2021 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/16859
dc.identifier.year: 2021 [en_US]
dc.language.iso: en [en_US]
dc.publisher: Business Research Unit (BRU)
dc.subject: NLP [en_US]
dc.subject: OCR [en_US]
dc.subject: Information Extraction [en_US]
dc.subject: Job advertisements [en_US]
dc.subject: Labour market intelligence [en_US]
dc.title: Information extraction from Sri Lankan job advertisements via rule-based approach [en_US]
dc.type: Conference-Full-text [en_US]
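
Illustrative note: the abstract describes turning OCR text from job advertisements into structured records with a rule-based approach, but this record does not list the rules themselves. The following is only a minimal sketch of what such rules might look like, assuming the OCR step has already produced plain text. The keyword lists, regular expressions, field names and sample advertisement are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical keyword lists; the paper's actual rules are not given in this record.
QUALIFICATION_TERMS = ["degree", "diploma", "G.C.E. A/L", "G.C.E. O/L"]
SKILL_TERMS = ["communication", "ms office", "accounting", "leadership"]

# Hypothetical patterns for a contact e-mail and a stated number of years of experience.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
EXPERIENCE_RE = re.compile(r"(\d+)\s*\+?\s*years?", re.IGNORECASE)


def find_terms(text, terms):
    """Return the terms that occur in the text, ignoring case, in list order."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


def extract_fields(ad_text):
    """Apply the illustrative rules to one advertisement's OCR text."""
    email = EMAIL_RE.search(ad_text)
    experience = EXPERIENCE_RE.search(ad_text)
    return {
        "email": email.group(0) if email else None,
        "min_experience_years": int(experience.group(1)) if experience else None,
        "qualifications": find_terms(ad_text, QUALIFICATION_TERMS),
        "skills": find_terms(ad_text, SKILL_TERMS),
    }


# Made-up advertisement text standing in for OCR output.
sample_ad = (
    "ACCOUNTS EXECUTIVE\n"
    "Degree or Diploma in Accounting. Minimum 2 years experience.\n"
    "Good communication skills. Apply: hr@example.com"
)
record = extract_fields(sample_ad)
print(record)
```

Plain substring and regular-expression rules like these are easy to inspect and adjust, which suits noisy OCR output, though a real rule set would likely need many more patterns and some handling of OCR misreads.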

Files

Original bundle

Name: 15. 2021-01-131.pdf
Size: 842.79 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
