Institutional-Repository, University of Moratuwa.  

Gene function prediction using evolutionary k-nearest neighbor algorithm


dc.contributor.advisor Perera, AS
dc.contributor.author De Silva, HM
dc.date.accessioned 2019-02-07T01:05:30Z
dc.date.available 2019-02-07T01:05:30Z
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/13896
dc.description.abstract High-throughput gene annotation data are available in many popular model organism databases and repositories. These data are often incomplete and still evolving, and the functions of many genes remain unknown or only partially known. Because manual curation is costly and time-consuming, in-silico methods for predicting gene functions have become an important requirement in bioinformatics. Our approach uses gene expression data available in data repositories, rather than sequence data, to predict gene functions. In this paper, we propose a supervised machine learning algorithm combined with a genetic algorithm for function prediction. The k-Nearest Neighbor algorithm is optimized using the genetic algorithm to find the optimum k for a dataset. The genetic algorithm also produces a weight vector for the attributes in the dataset, which improves the performance of the k-Nearest Neighbor algorithm. GAKNN is a software solution for gene function prediction that analyzes gene annotation data from different repositories and predicts gene functions using the genetic-algorithm-optimized k-Nearest Neighbor classification algorithm. GAKNN provides a workspace for data pre-processing, including data cleaning, feature selection, and missing data imputation, followed by data analysis and data visualization. The software has been tested on two gene expression datasets from different sources to evaluate its accuracy. The datasets are from two different functional annotation schemes: Gene Ontology and FunCat. The data pre-processing methods available in GAKNN, such as missing data imputation, were also tested with the two gene expression datasets, and the results show that the Evolutionary k-Nearest Neighbor Imputation Algorithm gives better results than mean imputation and the standard k-Nearest Neighbor algorithm. Function prediction accuracies in GAKNN range from 60% to 88%. The weights given to each attribute in the dataset and the optimum k found by the genetic algorithm are also represented graphically in GAKNN. en_US
dc.language.iso en en_US
dc.subject K-Nearest Neighbor en_US
dc.subject Genetic Algorithm en_US
dc.subject Gene Functions en_US
dc.subject Gene Annotation en_US
dc.subject Gene Expression Data en_US
dc.title Gene function prediction using evolutionary k-nearest neighbor algorithm en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc (Major Component Research) en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.date.accept 2017-12
dc.identifier.accno TH3642 en_US
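
The abstract describes a genetic algorithm that searches for both the optimum k and a per-attribute weight vector for a k-Nearest Neighbor classifier. The following is a minimal sketch of that general idea, not the GAKNN implementation itself. Everything in it is an assumption made for illustration: the function names, the chromosome encoding as a (k, weights) pair, the weighted Euclidean distance, validation accuracy as the fitness function, and the selection, crossover, and mutation settings.

```python
import math
import random
from collections import Counter


def weighted_knn_predict(train_X, train_y, x, k, weights):
    """Majority vote among the k nearest rows under a weighted Euclidean distance."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def fitness(chrom, train_X, train_y, val_X, val_y):
    """Fitness (assumed here): classification accuracy on a validation split."""
    k, weights = chrom
    hits = sum(
        weighted_knn_predict(train_X, train_y, x, k, weights) == y
        for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    k = rng.choice([a[0], b[0]])
    weights = [rng.choice(pair) for pair in zip(a[1], b[1])]
    return (k, weights)


def mutate(chrom, max_k, rng, rate=0.2):
    """Perturb k and individual weights with probability `rate`, then clamp."""
    k, weights = chrom
    if rng.random() < rate:
        k = min(max_k, max(1, k + rng.choice([-2, -1, 1, 2])))
    weights = [
        min(1.0, max(0.0, w + rng.gauss(0, 0.2))) if rng.random() < rate else w
        for w in weights
    ]
    return (k, weights)


def evolve(train_X, train_y, val_X, val_y,
           max_k=9, pop_size=20, generations=15, seed=0):
    """Return the best (k, weights) chromosome found and its fitness."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    pop = [
        (rng.randint(1, max_k), [rng.random() for _ in range(n_features)])
        for _ in range(pop_size)
    ]

    def score(c):
        return fitness(c, train_X, train_y, val_X, val_y)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # truncation selection; elite kept unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), max_k, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Synthetic data (not from the thesis): attribute 0 determines the class,
    # attribute 1 is large-scale noise that an unweighted kNN would be misled by.
    rng = random.Random(1)
    X = [[rng.random(), rng.uniform(-10, 10)] for _ in range(60)]
    y = [int(row[0] > 0.5) for row in X]
    (best_k, best_w), acc = evolve(X[:40], y[:40], X[40:], y[40:])
    print("k =", best_k, "weights =", best_w, "validation accuracy =", acc)
```

In this sketch, a weight near zero effectively removes an attribute from the distance computation, which is one way a learned weight vector can help a k-Nearest Neighbor classifier on data with uninformative attributes.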

