Abstract:
Bioinformatics research continues to grow in scale, driven by techniques such as next-generation sequencing and by tools that automate bioinformatics processes. With this growth, biological data is accumulating at an unprecedented rate, demanding high-performance and high-throughput computing technologies to process such datasets. Hardware accelerators, such as graphics processing units (GPUs), and distributed computing speed up the processing of big data in high-performance computing environments. They enable higher degrees of parallelism and thereby increase throughput. In this paper, we introduce BioWorkflow, an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. We describe a case study carried out to evaluate the performance of a complex, branching workflow executed by BioWorkflow. The results indicate a speed-up of 2.89× from utilizing GPUs and an average speed-up of 2.832× (over n = 5 scenarios) from parallel execution of graph nodes during multiple sequence alignment calculations. A combined speed-up of 1.71× is achieved for complex workflows. This confirms the expectation that parallelism through GPU acceleration and concurrent execution of workflow nodes yields higher speed-ups than mainstream sequential workflow execution. The tool also provides a comprehensive user interface with improved interactivity for managing complex workflows; a System Usability Scale score of 82.9 confirms the high usability of the system.
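To illustrate what concurrent execution of workflow nodes means in a branching workflow, the following is a minimal Python sketch, not BioWorkflow's actual scheduler. It assumes the workflow is a directed acyclic graph in which each node lists its predecessors, and it runs every node as soon as all of its predecessors have finished. The function name `run_workflow`, the node names, and the placeholder tasks are all hypothetical; GPU offloading and distributed execution are not modelled here.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(graph, tasks, max_workers=4):
    """Run each node once all of its predecessors have finished.

    graph: maps node name -> set of predecessor node names.
    tasks: maps node name -> callable taking a dict of predecessor results.
    Returns a dict mapping node name -> result.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> node name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every node whose predecessors are all done.
            for node in sorter.get_ready():
                inputs = {dep: results[dep] for dep in graph.get(node, ())}
                running[pool.submit(tasks[node], inputs)] = node
            # Block until at least one running node finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                node = running.pop(future)
                results[node] = future.result()
                sorter.done(node)  # may unlock successor nodes
    return results


# Hypothetical branching workflow: two independent branches after "load",
# joined again at "merge". The lambdas stand in for real analysis steps.
EXAMPLE_GRAPH = {
    "load": set(),
    "branch_a": {"load"},
    "branch_b": {"load"},
    "merge": {"branch_a", "branch_b"},
}
EXAMPLE_TASKS = {
    "load": lambda inputs: ["ACGT", "ACGA"],
    "branch_a": lambda inputs: len(inputs["load"]),
    "branch_b": lambda inputs: sorted(inputs["load"]),
    "merge": lambda inputs: (inputs["branch_a"], inputs["branch_b"]),
}
```

In this sketch, `branch_a` and `branch_b` become ready at the same time and can run concurrently, whereas a sequential executor would run them one after the other. Threads are used only for simplicity; CPU-bound steps would more plausibly be dispatched to separate processes, GPUs, or remote workers.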
Citation:
Welivita, A., Perera, I., Meedeniya, D., Wickramarachchi, A., & Mallawaarachchi, V. (2018). Managing complex workflows in bioinformatics: An interactive toolkit with GPU acceleration. IEEE Transactions on NanoBioscience, 17(3), 199–208. https://doi.org/10.1109/TNB.2018.2837122