Multimodal Search Exploration for E-Commerce

Date

2025

Publisher

Department of Computer Science and Engineering

Abstract

E-commerce search is changing rapidly as the number of products, the volume of user-generated content, and the complexity of consumer behaviour continue to grow [2]. These factors make it difficult to return accurate, relevant, and personalized search results. In multimodal search systems in particular, misalignment between the visual and textual modalities can lead to poor search experiences, especially when users submit detailed, ambiguous, or natural-language queries [2]. This research addresses these issues with an integrated approach that fuses text and image data in a unified space, using the ColPali mechanism with late interaction for multimodal alignment. Existing search systems are moving towards conversational search and chatbots, which accept natural queries such as “Black Jacket” or “Black Jacket with fleets”. The proposed multimodal system targets a more specific and compelling use case for a vision-language model (VLM): a user uploads an image of a particular piece of clothing and asks, “Find me some similar jackets in black with fleets and a waterproof lining”. Traditional systems still struggle to interpret such queries holistically.
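
To make the late-interaction idea concrete, the following is a minimal sketch of ColBERT-style MaxSim scoring, the scoring scheme that ColPali applies to multi-vector embeddings. It is not the implementation used in this work. The function names (`normalize`, `maxsim_score`, `rank`), the toy two-dimensional vectors, and the product identifiers are all hypothetical, and the sketch assumes that the query (text tokens, optionally with image patches) and each product have already been embedded as lists of vectors in the same space.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: each query vector keeps only its best-matching
    document vector, and those best matches are summed."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rank(query_vecs, catalog):
    """Return product ids ordered from highest to lowest MaxSim score."""
    scores = {pid: maxsim_score(query_vecs, vecs) for pid, vecs in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed): two query vectors and two products with two vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
catalog = {
    "jacket_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query vectors
    "jacket_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first
}
ranking = rank(query, catalog)
```

Because every query vector is matched independently, a product that satisfies only part of a detailed query (for example the colour but not the lining) scores lower than one that satisfies all of it, which is the property that makes late interaction attractive for fine-grained multimodal queries.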
