Accelerating K-MER counting for genomic analysis
dc.contributor.advisor | Jayasena S | |
dc.contributor.author | Brihadiswaran G | |
dc.date.accept | 2021 | |
dc.date.accessioned | 2021T03:23:16Z | |
dc.date.available | 2021T03:23:16Z | |
dc.date.issued | 2021 | |
dc.description.abstract | K-mer counting is the process of counting the substrings of length k in a sequence. It is an important step in many bioinformatics applications, including genome assembly, sequence error correction, and sequence alignment. Even though generating k-mer histograms seems simple and straightforward, processing large datasets efficiently with limited resources, especially memory, is very challenging. As advances in next-generation sequencing technologies have resulted in a tremendous growth of genomic data, it is imperative for k-mer counters to become faster and more efficient. A lot of work has been done in the past decade to optimize k-mer counting. Frigate, a fast and efficient tool capable of counting and querying k-mers, is presented. Its in-memory design utilizes multithreaded, lock-free data structures to improve performance. Thread synchronization is handled using the compare-and-swap technique. The parallel processing pipeline of Frigate is the result of careful performance engineering and design. Frigate was developed with an emphasis on values of k less than 20, aiming to maximize performance by employing different algorithms for different ranges of k values. The performance of Frigate was compared with six state-of-the-art k-mer counters: Jellyfish, DSK, Gerbil, CHTKC, KMC2, and KMC3, using two real-world datasets. The experiments were carried out for k values of 10, 15, and 17, with thread counts in the range [1, 32]. The results show that Frigate achieves comparable performance or up to 2-3x speedup compared to its competitors, especially for large datasets. The k-mer counters were analyzed based on running time, amount of memory used, and scalability. The correctness of Frigate was evaluated by comparing its k-mer frequency histogram with those of the other k-mer counters. Frigate is written in C and freely available at https://github.com/Gunavaran/frigate under the MIT license. | en_US |
dc.identifier.accno | TH5106 | en_US |
dc.identifier.citation | Brihadiswaran G. (2021). Accelerating K-MER counting for genomic analysis [Master's theses, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22662 | |
dc.identifier.degree | Master of Science (Major Component of Research) | en_US |
dc.identifier.department | Department of Computer Science & Engineering | en_US |
dc.identifier.faculty | Engineering | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/22662 | |
dc.language.iso | en | en_US |
dc.subject | K-MER COUNTING | |
dc.subject | GENOME ANALYSIS | |
dc.subject | PERFORMANCE ENGINEERING | |
dc.subject | PARALLEL COMPUTING | |
dc.subject | COMPUTER SCIENCE - Dissertation | |
dc.subject | COMPUTER SCIENCE AND ENGINEERING - Dissertation | |
dc.subject | MSc (Major Component Research) | |
dc.title | Accelerating K-MER counting for genomic analysis | en_US |
dc.type | Thesis-Abstract | en_US |
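The abstract describes k-mer counting with lock-free counters updated through compare-and-swap. The following is a minimal sketch of that general idea in C, not Frigate's actual implementation. It assumes a small k, so that a direct-indexed table of 4^k counters fits in memory, and a 2-bit encoding of A, C, G, and T. The function names `encode_base`, `cas_increment`, `count_kmers`, and `kmer_index` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Map a nucleotide to a 2-bit code; return -1 for any other symbol (e.g. N). */
static int encode_base(char c) {
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Lock-free increment: retry the compare-and-swap until it succeeds. */
static void cas_increment(_Atomic uint32_t *slot) {
    uint32_t seen = atomic_load(slot);
    while (!atomic_compare_exchange_weak(slot, &seen, seen + 1)) {
        /* On failure, `seen` now holds the current value; try again. */
    }
}

/*
 * Count every k-mer of `seq` into `table`, which must hold 4^k
 * zero-initialised counters. Assumption for this sketch: k is small
 * enough (here at most 15) for a direct-indexed table.
 */
static void count_kmers(const char *seq, unsigned k, _Atomic uint32_t *table) {
    assert(k >= 1 && k <= 15);
    const uint64_t mask = (1ULL << (2 * k)) - 1; /* keep the last k bases */
    uint64_t kmer = 0;
    unsigned valid = 0; /* consecutive valid bases seen, capped at k */

    for (size_t i = 0; seq[i] != '\0'; i++) {
        int code = encode_base(seq[i]);
        if (code < 0) {          /* a non-ACGT symbol breaks the window */
            valid = 0;
            kmer = 0;
            continue;
        }
        kmer = ((kmer << 2) | (uint64_t)code) & mask;
        if (valid < k) {
            valid++;
        }
        if (valid == k) {
            cas_increment(&table[kmer]);
        }
    }
}

/* Convert a k-mer string into its table index, for querying a count. */
static uint64_t kmer_index(const char *kmer) {
    uint64_t idx = 0;
    for (size_t i = 0; kmer[i] != '\0'; i++) {
        int code = encode_base(kmer[i]);
        assert(code >= 0);
        idx = (idx << 2) | (uint64_t)code;
    }
    return idx;
}
```

For the sequence `ACGTACGT` and k = 2, a 16-entry table ends up with counts of 2 for `AC`, `CG`, and `GT`, and 1 for `TA`. Because each update goes through `cas_increment`, several threads could call `count_kmers` on different reads against the same table without a lock. A direct-indexed table only suits small k, and larger values would need a different structure such as a hash table.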
Files
Original bundle
- Name:
- TH5106-1.pdf
- Size:
- 346.29 KB
- Format:
- Adobe Portable Document Format
- Description:
- Pre-text
- Name:
- TH5106-2.pdf
- Size:
- 204.58 KB
- Format:
- Adobe Portable Document Format
- Description:
- Post-text
- Name:
- TH5106.pdf
- Size:
- 2.56 MB
- Format:
- Adobe Portable Document Format
- Description:
- Full-thesis