Gene function prediction using evolutionary k-nearest neighbor algorithm

dc.contributor.advisor: Perera, AS
dc.contributor.author: De Silva, HM
dc.date.accept: 2017-12
dc.date.accessioned: 2019-02-07T01:05:30Z
dc.date.available: 2019-02-07T01:05:30Z
dc.description.abstract: High-throughput gene annotation data are available in many popular model organism databases and repositories. These data are often incomplete and still evolving, and the functions of the genes are unknown or only partially known. Because manual curation is costly and time-consuming, in-silico methods for predicting gene functions have become an important requirement in bioinformatics. Our approach uses the gene expression data that exist in data repositories, rather than sequence data, to predict gene functions. In this paper, we propose a supervised machine learning algorithm combined with a genetic algorithm for function prediction. The k-Nearest Neighbor algorithm is optimized using the genetic algorithm to find the optimum k for a dataset. The genetic algorithm also produces a weight vector for the attributes in the dataset, which improves the performance of the k-Nearest Neighbor algorithm. GAKNN is a solution for gene function prediction that analyzes gene annotation data from different repositories and predicts gene functions using the genetic-algorithm-optimized k-Nearest Neighbor classification algorithm. GAKNN provides a workspace for data pre-processing, including data cleaning, feature selection, and missing data imputation, followed by data analysis and data visualization. The software has been tested on two gene expression datasets from different sources to evaluate its accuracy. The datasets use two different functional annotation schemes: Gene Ontology and FunCat. The data pre-processing methods available in GAKNN, such as missing data imputation, were also tested with two gene expression datasets, and the results show that the Evolutionary k-Nearest Neighbor Imputation Algorithm gives better results than mean imputation and the standard k-Nearest Neighbor algorithm. Function prediction accuracies in GAKNN range from 60% to 88%.
GAKNN also displays graphically the weight that the genetic algorithm assigns to each attribute in the dataset and the optimum k that it finds. [en_US]
dc.identifier.accno: TH3642 [en_US]
dc.identifier.degree: MSc (Major Component Research) [en_US]
dc.identifier.department: Department of Computer Science and Engineering [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.uri: http://dl.lib.mrt.ac.lk/handle/123/13896
dc.language.iso: en [en_US]
dc.subject: K-Nearest Neighbor [en_US]
dc.subject: Genetic Algorithm [en_US]
dc.subject: Gene Functions [en_US]
dc.subject: Gene Annotation [en_US]
dc.subject: Gene Expression Data [en_US]
dc.title: Gene function prediction using evolutionary k-nearest neighbor algorithm [en_US]
dc.type: Thesis-Full-text [en_US]
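
The central idea in the abstract, a genetic algorithm that searches for both the optimum k and a per-attribute weight vector for k-Nearest Neighbor, can be illustrated with a minimal sketch. This is not the GAKNN implementation: the abstract does not describe the chromosome encoding, the genetic operators, or the fitness function, so everything below (a chromosome of `(k, weights)`, uniform crossover, Gaussian mutation, validation accuracy as fitness, and the synthetic two-attribute data) is an assumption chosen for illustration.

```python
import math
import random
from collections import Counter

def weighted_knn_predict(train_X, train_y, x, k, weights):
    """Majority vote among the k nearest rows under a weighted Euclidean distance."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def fitness(chrom, train_X, train_y, val_X, val_y):
    """Assumed fitness: validation accuracy of weighted k-NN for one chromosome."""
    k, weights = chrom
    hits = sum(weighted_knn_predict(train_X, train_y, x, k, weights) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)

def crossover(a, b, rng):
    """Uniform crossover: k and each weight come from either parent."""
    k = a[0] if rng.random() < 0.5 else b[0]
    weights = [wa if rng.random() < 0.5 else wb for wa, wb in zip(a[1], b[1])]
    return (k, weights)

def mutate(chrom, max_k, rng, rate=0.2):
    """Nudge k by a small step and jitter weights, clamped to their valid ranges."""
    k, weights = chrom
    if rng.random() < rate:
        k = min(max_k, max(1, k + rng.choice([-2, -1, 1, 2])))
    weights = [min(1.0, max(0.0, w + rng.gauss(0, 0.2))) if rng.random() < rate else w
               for w in weights]
    return (k, weights)

def evolve(train_X, train_y, val_X, val_y, max_k=7, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    n_features = len(train_X[0])
    # Seed the population with plain unweighted 3-NN; because the better half
    # survives unchanged, the result is never worse than this on the validation set.
    pop = [(min(3, max_k), [1.0] * n_features)]
    pop += [(rng.randint(1, max_k), [rng.random() for _ in range(n_features)])
            for _ in range(pop_size - 1)]

    def score(c):
        return fitness(c, train_X, train_y, val_X, val_y)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: keep the better half as-is
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), max_k, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)

def make_toy_data(n, rng):
    """Synthetic data: attribute 0 carries the class signal, attribute 1 is noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label + rng.gauss(0, 0.3), rng.uniform(-5, 5)])
        y.append(label)
    return X, y

data_rng = random.Random(42)
train_X, train_y = make_toy_data(60, data_rng)
val_X, val_y = make_toy_data(30, data_rng)

(best_k, best_weights), best_acc = evolve(train_X, train_y, val_X, val_y)
baseline_acc = fitness((3, [1.0, 1.0]), train_X, train_y, val_X, val_y)
print("best k:", best_k, "weights:", best_weights)
print("validation accuracy:", best_acc, "vs unweighted 3-NN:", baseline_acc)
```

In this toy setup the second attribute is pure noise, so a search of this kind has room to lower its weight relative to the informative attribute. Accuracy measured on the same validation set that drives the search is optimistic; a separate held-out test set would be needed for a fair estimate.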

Files

Original bundle

Name: TH3642-1.pdf
Size: 389.14 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH3642-2.pdf
Size: 209.83 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH3642.pdf
Size: 2.04 MB
Format: Adobe Portable Document Format
Description: Full-thesis