Efficiently run a deep learning model on a distributed CPU system

dc.contributor.advisor: Fernando, S
dc.contributor.author: Gnanasena, DDKK
dc.date.accept: 2024
dc.date.accessioned: 2025-09-26T09:49:19Z
dc.date.issued: 2024
dc.description.abstract: In distributed deep learning, data transfer between machines can become a bottleneck that prevents a system from scaling to its full capacity. Techniques such as quantization and fine-tuning address this, but they work well only on GPUs and TPUs, which are costly infrastructure. If deep learning models could instead run adequately on a CPU-based distributed system, the cost savings would be a significant advantage. In this research, such a system was built using TensorFlow's MultiWorkerMirroredStrategy and evaluated to determine whether it can compete with GPUs. Three workloads were run: the MNIST dataset on a linear model, the CIFAR-10 dataset on a Convolutional Neural Network (CNN), and a quantized Llama 2 model used to summarize and translate an essay. Each was evaluated on accuracy and training time against the same workload on a Google Colab T4 GPU; for the Llama model, BERTScore and ROUGE were used for summarization, and BERTScore and BLEU for translation. The MNIST results showed that the system is inefficient for small models: training on two machines took 200 times as long as on a single machine. For the CIFAR-10 CNN, accuracy decreased as the number of nodes and the batch size increased, while training time fell with larger batch sizes but rose again beyond a certain batch size, exposing the communication bottleneck. For the Llama model, the gap in time taken relative to the GPU narrowed as the dataset size grew. None of the models beat the GPU's training time in every case, but some Llama accuracy scores surpassed the GPU's. As future work, model parallelism should be tested to develop this CPU-based distributed system further.
dc.identifier.accno: TH5765
dc.identifier.citation: Gnanasena, D.D.K.K. (2024). Efficiently run a deep learning model on a distributed CPU system [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24230
dc.identifier.degree: MSc in Artificial Intelligence
dc.identifier.department: Department of Computational Mathematics
dc.identifier.faculty: IT
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/24230
dc.language.iso: en
dc.subject: MULTI WORKER MIRRORED STRATEGY
dc.subject: DISTRIBUTED DEEP LEARNING
dc.subject: GPU MEMORY
dc.subject: ARTIFICIAL INTELLIGENCE-Dissertation
dc.subject: COMPUTATIONAL MATHEMATICS-Dissertation
dc.subject: MSc in Artificial Intelligence
dc.title: Efficiently run a deep learning model on a distributed CPU system
dc.type: Thesis-Full-text
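
The abstract reports results from TensorFlow's MultiWorkerMirroredStrategy running on a cluster of CPU machines. Below is a minimal sketch of how such a run is typically configured; the host addresses, port, and hyperparameters are illustrative assumptions, not the thesis's actual setup.

```python
import json
import os

import tensorflow as tf

# Hypothetical two-worker cluster; the addresses and port are placeholders.
# Every machine sets the same "cluster" but its own "task" index.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["10.0.0.1:12345", "10.0.0.2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# The strategy reads TF_CONFIG and synchronizes gradients across workers
# over the network, which is the communication cost the abstract identifies.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

with strategy.scope():
    # A linear classifier on MNIST, matching the smallest workload studied.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# The batch size is a global batch split across workers, so each training
# step incurs an all-reduce over the network.
model.fit(x_train, y_train, epochs=5, batch_size=128)
```

The same script runs on every machine with only the task index changed; each added worker adds all-reduce traffic per step, which is consistent with the training-time blow-up the abstract reports for small models.

For the Llama 2 experiments, the abstract names BERTScore and ROUGE for summarization and BERTScore and BLEU for translation. The following sketch computes those metrics assuming the bert-score, rouge-score, and sacrebleu Python packages; the thesis may have used other implementations, and the example texts are invented.

```python
import sacrebleu                            # pip install sacrebleu
from bert_score import score as bert_score  # pip install bert-score
from rouge_score import rouge_scorer        # pip install rouge-score

# Invented model output and reference for the summarization task.
summary_hyp = ["Distributed CPU training can reduce infrastructure cost."]
summary_ref = ["Training on distributed CPUs lowers infrastructure cost."]

# BERTScore returns per-sentence precision, recall, and F1 tensors.
P, R, F1 = bert_score(summary_hyp, summary_ref, lang="en")
print("BERTScore F1:", float(F1.mean()))

# ROUGE-1 and ROUGE-L between a reference and a candidate summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print("ROUGE:", scorer.score(summary_ref[0], summary_hyp[0]))

# Corpus-level BLEU for the translation task (one reference per hypothesis).
translation_hyp = ["This is a sample sentence."]
translation_ref = [["This is an example sentence."]]
print("BLEU:", sacrebleu.corpus_bleu(translation_hyp, translation_ref).score)
```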

Files

Original bundle (3 files)

Name: TH5765-1.pdf
Size: 175.98 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH5765-2.pdf
Size: 83.25 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH5765.pdf
Size: 2.1 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle (1 file)

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission