Abstract:
Chromosomes and plasmids are the major carriers of genetic material in microorganisms such as bacteria. Separating chromosomal DNA from plasmid DNA in large datasets is important because both affect microbial functions and environmental adaptation. With advances in sequencing technologies, bioinformatics methods have been developed for plasmid classification. Normalized short k-mer counts combined with machine learning models are a popular way to characterize plasmids and chromosomes. Biomarkers derived from DNA sequences have also been studied as features for plasmid classification. However, both approaches suffer from a trade-off between precision and recall. MetaPCbin is a plasmid detection tool that combines the computational and genetic approaches into a hybrid method of plasmid prediction. It uses an artificial neural network with k-mer counts as features and a random forest model with biomarkers as features. MetaPCbin is evaluated for precision and recall on real-world DNA sequences from the RefSeq database and on simulated sequences. The results show that it performs plasmid classification while maintaining high precision and recall compared to the state of the art. MetaPCbin is freely available at: https://github.com/MetaGSC/MetaPCbin
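To illustrate what a normalized short k-mer count feature can look like, here is a minimal Python sketch. It is not taken from MetaPCbin: the function name `normalized_kmer_counts`, the default `k=3`, the choice to normalize by the total number of valid k-mers, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def normalized_kmer_counts(sequence, k=3):
    """Return k-mer frequencies for `sequence`, in lexicographic k-mer order.

    Hypothetical helper for illustration; not MetaPCbin's actual feature code.
    """
    alphabet = "ACGT"
    # Enumerate all 4**k possible k-mers so every vector has the same length.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if j is not None:
            counts[j] += 1

    total = sum(counts)
    if total == 0:
        # Sequence shorter than k, or no valid k-mers at all.
        return [0.0] * len(kmers)
    # Normalize so the vector sums to 1, independent of sequence length.
    return [c / total for c in counts]
```

A vector like this, one per contig, is the kind of fixed-length input a neural network classifier can consume; dividing by the total keeps contigs of different lengths comparable.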
Citation:
C. Nandasiri, S. Alahakoon, G. Dassanayake, A. Wickramarachchi and I. Perera, "MetaPCbin: Plasmid/Chromosome Classification for Metagenomic Contigs using Machine Learning Techniques," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906214.