A Deep learning based approach for simultaneous host extraction and multi-class classification
Date
2024
Publisher
IEEE
Abstract
Understanding the behavioral patterns and functions of microorganisms in different environments is crucial for clarifying their impact on human health, environmental sustainability, and related areas. Metagenomics, facilitated by advances in computational techniques, analyzes genetic material directly from environmental samples, removing the need to culture individual organisms. However, it poses computational challenges because the datasets are heterogeneous. One major problem is that host DNA overshadows microbial DNA, which degrades the quality of downstream analysis. In addition, existing tools are often optimized for specific microorganisms, which creates a need for multi-class classification tools. The proposed tool is a CNN-based approach that addresses these challenges by separating host sequences and classifying microbial samples into five classes: bacteria, fungi, archaea, protozoa, and viruses. It also allows users to fine-tune the model with a new host, if needed, and optimize host extraction. In our evaluation, the proposed tool outperformed previously published approaches.
