A data driven binning method to recover more nucleotide sequences of species in a metagenome

Date

2020-07

Publisher

IEEE

Abstract

Metagenomics has accelerated the study of different species and their dynamics in multiple environments. A key step in a metagenomic study is to group nucleotide sequences belonging to an individual species or to closely related species, a process often termed binning. Multiple machine learning techniques have been adopted for binning metagenomic sequences; in particular, most recent binning methods use unsupervised learning. This work considers data-driven methods for binning metagenomic sequences and discusses such approaches in detail. Furthermore, it explores increasing the number of metagenomic sequences binned while maintaining reasonable binning accuracy. Consequently, a dissimilarity-based approach is proposed to increase the number of contigs binned by an existing binning method. It is shown to result in a 10% increase in the number of contigs binned compared to the original approach. Accordingly, this work suggests that the effective use of observed data, which might otherwise be discarded as outliers, may improve binning performance.
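
The abstract does not spell out the dissimilarity-based procedure, so the following is only a minimal sketch of one way such a step could look, assuming (from the keywords) that the dissimilarity is a Mahalanobis distance between an unbinned contig and each existing bin. The function name `assign_unbinned_contigs`, the feature vectors, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np


def assign_unbinned_contigs(bin_features, unbinned, threshold):
    """Assign each unbinned contig to its nearest bin by Mahalanobis distance.

    bin_features: dict mapping a bin name to an (n_i, d) array of feature
        vectors for contigs already placed in that bin (hypothetical features,
        e.g. composition or coverage profiles).
    unbinned: (m, d) array of feature vectors for contigs left unbinned.
    threshold: hypothetical cut-off; contigs farther than this from every
        bin stay unassigned (None).
    """
    # Estimate a mean and an inverse covariance for every existing bin.
    stats = {}
    for name, features in bin_features.items():
        mean = features.mean(axis=0)
        cov = np.cov(features, rowvar=False)
        # The pseudo-inverse tolerates singular covariance matrices.
        stats[name] = (mean, np.linalg.pinv(np.atleast_2d(cov)))

    assignments = []
    for x in unbinned:
        best_name, best_dist = None, np.inf
        for name, (mean, inv_cov) in stats.items():
            diff = x - mean
            # Mahalanobis distance: sqrt(diff^T * inv_cov * diff).
            dist = float(np.sqrt(max(diff @ inv_cov @ diff, 0.0)))
            if dist < best_dist:
                best_name, best_dist = name, dist
        assignments.append(best_name if best_dist <= threshold else None)
    return assignments
```

In this sketch, contigs that a first-pass binner discarded as outliers are reconsidered against the bins it did produce, and the threshold controls the trade-off between recovering more contigs and keeping bins accurate.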

Keywords

Metagenomics, binning, data-driven, Mahalanobis distance measure

Citation

K. Vimukthi, G. Wimalasiri, P. Bandara and D. Herath, "A Data Driven Binning Method to Recover More Nucleotide Sequences of Species in a Metagenome," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 307-312, doi: 10.1109/MERCon50084.2020.9185388.
