Abstract:
Metagenomics has accelerated the study of
different species and their dynamics in multiple environments. A
key step in a metagenomic study is to group nucleotide sequences
belonging to an individual species or to closely related species, a
process often termed binning. Multiple machine learning techniques have
been adopted for binning metagenomic sequences. In particular,
unsupervised learning is used in most of the recent
binning methods. This work considers data-driven methods for
binning metagenomic sequences and discusses such approaches
in detail. Furthermore, it explores how to increase the number of
metagenomic sequences binned while maintaining reasonable
binning accuracy. To this end, a dissimilarity-based approach
is proposed to improve the number of contigs binned by an
existing binning method. It is shown to result in a 10% increase in
the number of contigs binned compared to the original approach.
Accordingly, this work suggests that the effective use of observed
data that might otherwise be discarded as outliers may result
in improved binning performance.
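
As an illustration only, the idea of recovering contigs that an existing binner left unbinned can be sketched as follows. The abstract does not specify the dissimilarity measure, the features, or the threshold, so everything here is an assumption: contigs are represented by hypothetical composition feature vectors, dissimilarity is taken to be Euclidean distance to a bin centroid, and `assign_unbinned` and `max_dissimilarity` are illustrative names rather than the authors' method.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance, used here as an assumed dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_unbinned(bins, unbinned, max_dissimilarity):
    """Assign each unbinned contig to its nearest bin centroid.

    A contig is assigned only if its dissimilarity to that centroid does not
    exceed max_dissimilarity (an assumed, user-chosen threshold); otherwise
    it stays unbinned and is mapped to None.
    """
    centroids = {name: centroid(members) for name, members in bins.items()}
    assignments = {}
    for contig_id, features in unbinned.items():
        best_bin, best_distance = min(
            ((name, euclidean(features, c)) for name, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        assignments[contig_id] = (
            best_bin if best_distance <= max_dissimilarity else None
        )
    return assignments


# Toy data: two existing bins and three contigs the original binner discarded.
bins = {
    "bin_A": [[0.1, 0.9], [0.2, 0.8]],
    "bin_B": [[0.9, 0.1], [0.8, 0.2]],
}
unbinned = {
    "contig_1": [0.25, 0.75],
    "contig_2": [0.5, 0.5],
    "contig_3": [0.85, 0.2],
}
result = assign_unbinned(bins, unbinned, max_dissimilarity=0.2)
print(result)
```

In this toy setting the threshold governs the trade-off the abstract describes: a larger value recovers more contigs, at the risk of lower binning accuracy.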
Citation:
K. Vimukthi, G. Wimalasiri, P. Bandara and D. Herath, "A Data Driven Binning Method to Recover More Nucleotide Sequences of Species in a Metagenome," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 307-312, doi: 10.1109/MERCon50084.2020.9185388.