Institutional-Repository, University of Moratuwa.  

A Heterogeneous data ensemble approach for protein function prediction under mitochondrion organization


dc.contributor.advisor Perera, AS
dc.contributor.author Sumanaweera, DN
dc.date.accessioned 2017-02-15T06:10:32Z
dc.date.available 2017-02-15T06:10:32Z
dc.description.abstract A heterogeneous data ensemble approach for the classification of Saccharomyces cerevisiae proteins under ‘mitochondrion organization’. Proteins play the central role in keeping a cell healthy and functioning well. An important group of proteins is the subset of mitochondrial proteins that engage in the assembly, arrangement and disassembly of the mitochondrion. Several of them have been identified as causes of human disease. Hence, annotating proteins under the ‘mitochondrion organization’ biological process is vital for identifying disease-causing factors and for designing therapeutics. Because manual annotation requires costly and laborious in vitro methods, in silico function prediction is now preferred. Recent studies highlight the importance of incorporating data from multiple biological aspects to build a strong functional context for classification. In addition, many approaches in the literature employ ensemble classifiers to attain higher prediction accuracy. However, an approach that combines accurate classification, effective use of biological data, and determination of the significance of each biological data type is still needed. This study presents an assessment of a heterogeneous data ensemble to classify Saccharomyces cerevisiae proteins under ‘mitochondrion organization’. The ensemble consists of nine Euclidean-distance-based nearest neighbour models and three affinity-based neighbourhood models; it utilizes sequences, protein domains, peptide chain properties, gene expression, secondary structure and interactions. The base models were trained on annotations from the Gene Ontology, as well as from a publicly available benchmark gold dataset. The base models show a substantial level of disagreement, which suggests they can be effective in collective decision making. Six combination schemes were evaluated for fusing the base model outputs.
A genetic-algorithm-weighted ensemble gives the largest improvement over the best-performing base classifier, with an average area under the Receiver Operating Characteristic curve of 92.52%. Moreover, it can determine the biological importance of each data type. Overall, the proposed heterogeneous data ensemble identifies eight disease-related proteins in a strong sense and one disease-related protein in a moderate sense. en_US
dc.language.iso en en_US
dc.subject yeast en_US
dc.subject proteins
dc.subject mitochondrion
dc.subject weighted ensemble
dc.subject data heterogeneity
dc.title A Heterogeneous data ensemble approach for protein function prediction under mitochondrion organization en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US MSc (Major Component Research) en_US
dc.identifier.department Department of Computer Science & Engineering en_US 2016
dc.identifier.accno TH3257 en_US
