Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space

Bandara AMRR

dc.contributor.advisor	Ranathunga L
dc.contributor.advisor	Abdullah N A
dc.contributor.author	Bandara AMRR
dc.date.accessioned	2021
dc.date.available	2021
dc.date.issued	2021
dc.identifier.citation	Bandara, A.M.R.R. (2021). Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space [Doctoral dissertation, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21175
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/21175
dc.description.abstract	The retrieval of temporal digital visual data, by either a text or a visual query, requires automatic interpretation. This includes high-level annotation by object detection and recognition for text query-based retrieval, and low-level abstraction for visual query-based retrieval. Both the accuracy and the speed of the interpretation are crucial in real-world applications, owing to the high density of visual data. This study focused on efficiently reducing the complexity of visual data through dimensionality reduction techniques, for the detection and recognition of objects in videos, supporting both textual annotation and visual query-based video frame retrieval. The contribution of the study comprises three approaches: a novel visual feature descriptor based on colour dithering, namely Salient Dither Pattern Feature (SDPF); a novel object segmentation method based on the proposed feature descriptor, namely Refining Superpixel and Histogram of oriented optical flow Clustering (RSHC); and a novel self-supervised local descriptor, namely Network-in-Network with Restricted Boltzmann Machine (NIN-RBM). The experimental results show that SDPF is rotation and scale invariant and computationally efficient, yet achieves object recognition accuracy similar to that of state-of-the-art methods with minimum supervision. The results further revealed that RSHC successfully utilizes SDPF to accurately segment individual objects using a very shallow history of motion. Furthermore, according to the results, NIN-RBM achieves state-of-the-art correspondence matching performance compared with existing deep-learned self-supervised binary descriptors, while keeping computation time to a minimum. The overall results support the conclusions that RSHC is capable of accurately segmenting objects in a video, and that SDPF can then be successfully used for recognizing the segmented objects. 
Moreover, NIN-RBM can be used to reliably and rapidly retrieve video frames related to any visual query. Since NIN-RBM is a local descriptor, it can further be used for locating high-level objects and estimating their poses precisely, to improve the detail of the semantics retrieved from video data.	en_US
dc.language.iso	en	en_US
dc.subject	DIMENSIONALITY REDUCTION	en_US
dc.subject	BINARY DESCRIPTOR	en_US
dc.subject	CORRESPONDENCE MATCHING	en_US
dc.subject	OBJECT RECOGNITION	en_US
dc.subject	VIDEO SEGMENTATION	en_US
dc.subject	COLOUR DITHERING	en_US
dc.subject	DEEP LEARNING	en_US
dc.subject	INFORMATION TECHNOLOGY -Dissertation	en_US
dc.subject	COMPUTER SCIENCE -Dissertation	en_US
dc.title	Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	IT	en_US
dc.identifier.degree	Doctor of Philosophy	en_US
dc.identifier.department	Department of Information Technology	en_US
dc.date.accept	2021
dc.identifier.accno	TH5063	en_US