Gene function prediction using evolutionary k-nearest neighbor algorithm

De Silva, HM

dc.contributor.advisor	Perera, AS
dc.contributor.author	De Silva, HM
dc.date.accept	2017-12
dc.date.accessioned	2019-02-07T01:05:30Z
dc.date.available	2019-02-07T01:05:30Z
dc.description.abstract	High-throughput gene annotation data are available in many popular model organism databases and repositories. These data are often incomplete and still evolving, and the functions of many genes remain unknown or only partially known. Because manual curation is costly and time-consuming, in-silico methods for predicting gene function have become an important requirement in bioinformatics. Our approach uses gene expression data from existing data repositories, rather than sequence data, to predict gene functions. In this paper, we propose a supervised machine learning algorithm combined with a genetic algorithm for function prediction. The k-Nearest Neighbor algorithm is optimized using the genetic algorithm to find the optimum k for a dataset. The genetic algorithm also produces a weight vector for the attributes in the dataset, which improves the performance of the k-Nearest Neighbor algorithm. GAKNN is a software solution for gene function prediction that analyzes gene annotation data from different repositories and predicts gene functions using the genetic-algorithm-optimized k-Nearest Neighbor classification algorithm. GAKNN provides a workspace for data pre-processing, including data cleaning, feature selection, and missing data imputation, followed by data analysis and data visualization. The software was tested on two gene expression datasets from different sources to evaluate its accuracy. The datasets use two different functional annotation schemes: Gene Ontology and FunCat. The data pre-processing methods available in GAKNN, such as missing data imputation, were also tested with two gene expression datasets, and the results show that the Evolutionary k-Nearest Neighbor Imputation Algorithm gives better results than mean imputation and the standard k-Nearest Neighbor algorithm. The function prediction accuracy of GAKNN ranges from 60% to 88%.
GAKNN also presents the attribute weights and the optimum k found by the genetic algorithm graphically.	en_US
dc.identifier.accno	TH3642	en_US
dc.identifier.degree	MSc (Major Component Research)	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/13896
dc.language.iso	en	en_US
dc.subject	K-Nearest Neighbor	en_US
dc.subject	Genetic Algorithm	en_US
dc.subject	Gene Functions	en_US
dc.subject	Gene Annotation	en_US
dc.subject	Gene Expression Data	en_US
dc.title	Gene function prediction using evolutionary k-nearest neighbor algorithm	en_US
dc.type	Thesis-Full-text	en_US
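
The abstract describes a genetic algorithm that searches for both the neighbor count k and a per-attribute weight vector for a k-Nearest Neighbor classifier. The following is a minimal Python sketch of that general idea, not the GAKNN implementation. This record does not specify the thesis's chromosome encoding, genetic operators, fitness function, or parameter values, so everything here is an assumption made for illustration: the function names, the (k, weights) tuple encoding, the elitist selection, the uniform crossover, the mutation rates, and the use of validation accuracy as fitness.

```python
import math
import random


def weighted_knn_predict(train_X, train_y, x, k, weights):
    """Majority vote among the k nearest rows under a weighted Euclidean distance."""
    dists = sorted(
        (
            (math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x))), label)
            for row, label in zip(train_X, train_y)
        ),
        key=lambda pair: pair[0],
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Break vote ties deterministically by taking the smallest label.
    return max(sorted(votes), key=votes.get)


def fitness(chrom, train_X, train_y, val_X, val_y):
    """Assumed fitness: classification accuracy on a held-out validation set."""
    k, weights = chrom
    hits = sum(
        weighted_knn_predict(train_X, train_y, x, k, weights) == y
        for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


def evolve(train_X, train_y, val_X, val_y, max_k=9, pop_size=20, generations=15, seed=0):
    """Evolve (k, weights) chromosomes; returns the best chromosome and its fitness."""
    rng = random.Random(seed)
    n_features = len(train_X[0])

    def random_chrom():
        return (rng.randint(1, max_k), [rng.random() for _ in range(n_features)])

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one parent at random.
        return (rng.choice([a[0], b[0]]), [rng.choice(pair) for pair in zip(a[1], b[1])])

    def mutate(chrom):
        k, weights = chrom
        if rng.random() < 0.3:
            k = min(max_k, max(1, k + rng.choice([-1, 1])))
        weights = [
            min(1.0, max(0.0, w + rng.gauss(0, 0.1))) if rng.random() < 0.3 else w
            for w in weights
        ]
        return (k, weights)

    def score(chrom):
        return fitness(chrom, train_X, train_y, val_X, val_y)

    population = [random_chrom() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # the fitter half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = max(population, key=score)
    return best, score(best)
```

Calling `evolve(train_X, train_y, val_X, val_y)` with lists of numeric feature rows and labels returns the best `(k, weights)` pair found together with its validation accuracy. In this sketch a weight near zero effectively removes an attribute from the distance, which is one way a weight vector can act as soft feature selection.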

Files

Original bundle

Name:: TH3642-1.pdf
Size:: 389.14 KB
Format:: Adobe Portable Document Format
Description:: Pre-text

Name:: TH3642-2.pdf
Size:: 209.83 KB
Format:: Adobe Portable Document Format
Description:: Post-text

Name:: TH3642.pdf
Size:: 2.04 MB
Format:: Adobe Portable Document Format
Description:: Full-thesis

Collections

Master of Science By Research