Siamese networks for multilingual classified ad matching
Date: 2025
Publisher: IEEE
Abstract
This paper presents a novel approach to semantically matching “Resource Wanted” and “Resource Offering” classified ads in Sri Lanka’s complex multilingual digital marketplace. We introduce a Siamese neural network architecture designed to process both the textual content and the categorical metadata of ads written in English and Sinhala. The model uses multilingual transformer encoders to produce semantically rich embeddings, and the LaBSE-based implementation performs best, reaching a Recall@1 of 0.5813 and a Recall@10 of 0.9151. Combining categorical features with the text embeddings gives the strongest results: a 1.5% improvement in Recall@1 over the text-only approach. The methodology addresses the challenge of matching ads across linguistic boundaries in a low-resource setting and offers a way to improve transaction efficiency in Sri Lanka’s diverse digital marketplace.
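To make the evaluation metric and the text-plus-category fusion concrete, here is a minimal sketch in plain Python. It assumes each ad has already been encoded into a text embedding (for example by a LaBSE-style encoder, not shown) and carries a single integer category ID. The fusion rule (concatenating the normalized text embedding with a scaled one-hot category vector) and the names `fuse`, `cat_weight`, and `recall_at_k` are hypothetical illustrations, not the paper's actual architecture. Recall@k is computed as the fraction of “Resource Wanted” queries whose correct “Resource Offering” ad appears among the k most cosine-similar candidates.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text_emb: Sequence[float], category: int,
         num_categories: int, cat_weight: float = 0.5) -> List[float]:
    """Hypothetical fusion: normalized text embedding + scaled one-hot category.

    The same function is applied to both ad types, mirroring the shared
    weights of the two Siamese branches.
    """
    one_hot = [0.0] * num_categories
    one_hot[category] = cat_weight
    return l2_normalize(l2_normalize(text_emb) + one_hot)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(queries: Sequence[Sequence[float]],
                candidates: Sequence[Sequence[float]],
                gold: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold candidate index is in the top-k by cosine."""
    hits = 0
    for q, g in zip(queries, gold):
        scores = [cosine(q, c) for c in candidates]
        ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        if g in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy data: 2-dimensional "text embeddings" and 2 categories (illustrative only).
wanted = [fuse([1.0, 0.0], 0, 2), fuse([0.0, 1.0], 1, 2)]
offers = [fuse([0.9, 0.1], 0, 2), fuse([0.1, 0.9], 1, 2), fuse([0.7, 0.7], 0, 2)]
gold = [0, 1]  # wanted[i] should match offers[gold[i]]

print(recall_at_k(wanted, offers, gold, k=1))  # 1.0
```

Because both ad types pass through the same `fuse` function, a wanted ad and an offering ad in the same category share the category component of their vectors, which raises their cosine similarity. A real model would learn the projection of the categorical features rather than using a fixed `cat_weight`.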
