Abstract:
The bag of visual words model has seen immense success in addressing the problem of image classification. Algorithms using this model generate a codebook of visual
words by vector quantizing features (such as SIFT) extracted from the images to be classified. However, a codebook formed in this way tends to be biased by the nature of the dataset. In this paper we propose an alternative method for creating the codebook for the dataset of images to be classified. Instead of using the dataset directly, we first create a visual word dictionary by studying the SIFT features of a universal set of images. The codebook
for the images to be classified is then derived from this dictionary. To assess the effectiveness of the codebook thus derived, we classify the images using Probabilistic Latent Semantic Analysis in an unsupervised setting and Naive Bayes classification in
a supervised setting. The dictionary-based codebook achieves results comparable to those obtained with a codebook formed from the dataset itself, while requiring much less computational time. We also use the dictionary to demonstrate how analogies can be drawn between
visual words and linguistic words, and present an analysis of one such analogy, that of polysemy.
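For context, the following is a minimal sketch of the conventional bag-of-visual-words pipeline that the abstract contrasts against, in which the codebook is built by vector quantizing SIFT descriptors drawn from the dataset itself; it is not the dictionary-based method proposed in this paper. The function names, codebook size, and the use of OpenCV and scikit-learn are illustrative assumptions.

```python
# Sketch of the standard bag-of-visual-words pipeline (dataset-derived codebook),
# shown for context only; the paper instead derives the codebook from a
# dictionary built over a universal set of images.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift(image_paths):
    """Return a list of SIFT descriptor arrays, one per image."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, des = sift.detectAndCompute(img, None)
        descriptors.append(des if des is not None else np.empty((0, 128), np.float32))
    return descriptors

def build_codebook(descriptors, n_words=500):
    """Vector quantize all descriptors into n_words cluster centers (visual words)."""
    stacked = np.vstack([d for d in descriptors if len(d) > 0])
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(stacked)

def bow_histogram(descriptors, codebook):
    """Represent one image as a normalized histogram over the visual words."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Because k-means is run over descriptors of the images to be classified, the resulting visual words reflect that particular dataset; this is the bias that motivates deriving the codebook from a precomputed dictionary instead.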