A Model based approach for cluster traditional rice varieties of Sri Lanka

dc.contributor.advisorWickramarachchi, N
dc.contributor.authorSilva, MDRL
dc.date.accept2015
dc.date.accessioned2016-10-24T15:10:07Z
dc.date.available2016-10-24T15:10:07Z
dc.description.abstractHighly developed modern techniques produce enormous volumes of data, and biologists have shown great interest in clustering biological data to detect underlying patterns, since biological experiments alone have failed to correctly identify the hidden information and divergence patterns in the data. This study aims to (1) assist in clustering biologically similar sequences to detect divergence patterns in rice genomic data, by developing a program that uses a model-based clustering algorithm based on the Chinese restaurant process, which was originally proposed for clustering gene expression data, and (2) evaluate how well a pairwise distance matrix of rice genome sequences, calculated from the 12-dimensional natural vector of each DNA sequence, performs as the similarity measure in cluster analysis. The developed program, based on the proposed model-based clustering method, was executed on an AFLP profile data set containing features of 53 Sri Lankan traditional and wild rice varieties in order to identify the genetic divergence among them. Both a statistical and a biological cluster evaluation were carried out to validate the results. The statistical evaluation used the Bayes ratio to measure the tightness of the clusters formed. The biological evaluation was conducted with the help of domain experts and research work done by the institute of rice of Sri Lanka. The results showed that the proposed algorithm is capable of identifying highly similar rice varieties and showing their divergence patterns. To assess how well the natural vector method captures the information encoded in rice genome sequences, 10 rice disease resistance genes belonging to three different protein families from the Rice Genome Annotation Project database were used.
The results showed that the pairwise distance matrix calculated with the 12-dimensional natural vector method gives efficient results compared to traditional proximity matrices. They also revealed that fixed-length subsequences, no longer than the shortest of the selected sequences, are highly capable of capturing the information encoded in the full-length sequences, regardless of the subsequence length.en_US
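
The abstract does not spell out how the 12-dimensional natural vector or the pairwise distance matrix is computed. The following is a minimal sketch under stated assumptions, not the thesis implementation: it uses the commonly published definition of the natural vector (for each of A, C, G, T: the count, the mean position, and the normalized second central moment of the positions), assumes 1-based positions, and assumes Euclidean distance between vectors. The function names `natural_vector` and `pairwise_distances` are hypothetical.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return a 12-dimensional natural vector for a DNA sequence.

    Layout (an assumption of this sketch): four counts, then four mean
    positions, then four normalized second central moments, each in the
    order A, C, G, T. Positions are 1-based.
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            # A nucleotide that never occurs contributes zeros.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        # Second central moment, normalized by n_k * n.
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def pairwise_distances(seqs):
    """Return a symmetric matrix of Euclidean distances between natural vectors."""
    vectors = [natural_vector(s) for s in seqs]
    size = len(vectors)
    dist = [[0.0] * size for _ in range(size)]
    for i, j in combinations(range(size), 2):
        d = math.dist(vectors[i], vectors[j])
        dist[i][j] = dist[j][i] = d
    return dist
```

Because every sequence maps to a vector of the same length, sequences of different lengths can be compared directly without alignment; the resulting matrix could then serve as the proximity input to a clustering step.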
dc.identifier.accno109892en_US
dc.identifier.degreeMPhilen_US
dc.identifier.departmentDepartment of Computer Science & Engineeringen_US
dc.identifier.facultyEngineeringen_US
dc.identifier.urihttp://dl.lib.mrt.ac.lk/handle/123/12095
dc.language.isoenen_US
dc.subjectModel-Based clustering, Genetic Diversityen_US
dc.titleA Model based approach for cluster traditional rice varieties of Sri Lankaen_US
dc.typeThesis-Abstracten_US

Files

Original bundle

Name:
109892-1.pdf
Size:
1.46 MB
Format:
Adobe Portable Document Format
Description:
Pre Text
Name:
109892-2.pdf
Size:
2.24 MB
Format:
Adobe Portable Document Format
Description:
Post Text
Name:
109892.pdf
Size:
14.7 MB
Format:
Adobe Portable Document Format
Description:
Full Thesis

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission